CDN integration module 6.x-1.x

published on February 16, 2010

In this article, I explain the rationale behind the CDN integration module for Drupal 6, which was written as part of my bachelor thesis. It supports integration with both Origin Pull CDNs (out-of-the-box) and Push CDNs (by using File Conveyor).
Note that development of version 2 of this module has already begun! Version 2 will also be ported to Drupal 7.

Previously in this series:

It should be obvious by now that we still need a module to integrate Drupal with a CDN, as Drupal does not provide such functionality on its own; if it did, this bachelor thesis would be titled differently. This is the end of the long journey towards supporting both the simplest and the most complex CDN or static file server setups one can make. Fortunately, this part is fairly trivial, except perhaps for the necessary Drupal core patch.

1. Goals

The File Conveyor daemon I wrote is not necessary for Origin Pull CDNs, so this module should support those through a simple UI. At the same time, it must be easy to use the daemon with a Drupal web site. The former is called basic mode and the latter advanced mode, indicating that the latter is more complex to set up (i.e. it requires you to set up the daemon). Here are the goals again, this time in more detail:

  • shared functionality
    • ability to show per-page statistics: number of files on the page, number of files served from the CDN
    • status report shows whether CDN integration is active and displays a warning if it is disabled or in debug mode (to stress the importance of having it enabled)
  • basic mode
    • enter the CDN URL and it will be used in file URLs automatically
    • ability to only use the CDN for files with certain extensions
  • advanced mode
    • enter the absolute path to the synced files database and then file URLs will be looked up from there automatically
    • status report: check whether the daemon is running; if not, display the report as an error
    • status report: number of synced files, number of files in the pipeline, number of files waiting to enter the pipeline
    • per-page statistics: show from which destination the file is being served
    • per-page statistics: show the total and average time spent on querying the synced files database
    • ability to decide from which destination a file will be served (if multiple destinations for a file are available) based on user properties (user role, language, location) or any other property
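To make the two basic-mode goals concrete, here is a minimal sketch of how a CDN URL and an extension whitelist could be applied to a relative file path. The function example_cdn_basic_rewrite() and its parameters are hypothetical illustrations, not the module's actual code; returning FALSE signals that the file should not be served from the CDN.

```php
<?php
/**
 * Hypothetical basic-mode rewrite: prefix a relative file path with the CDN
 * URL, but only if the file's extension is whitelisted.
 *
 * @param string $path        Relative file path, e.g. 'misc/jquery.js'.
 * @param string $cdn_url     CDN base URL entered in the settings form.
 * @param array  $extensions  Lowercase extensions to serve from the CDN.
 *
 * @return string|false The CDN URL, or FALSE if the CDN should not serve it.
 */
function example_cdn_basic_rewrite($path, $cdn_url, array $extensions) {
  $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
  // An empty whitelist means: serve files of every type from the CDN.
  if (!empty($extensions) && !in_array($extension, $extensions, TRUE)) {
    return FALSE;
  }
  return rtrim($cdn_url, '/') . '/' . ltrim($path, '/');
}

$extensions = array('css', 'js', 'png', 'jpg', 'gif');
echo example_cdn_basic_rewrite('misc/jquery.js', 'http://cdn.example.com', $extensions), "\n";
// Prints: http://cdn.example.com/misc/jquery.js
```

Returning FALSE instead of the unmodified path keeps the decision "do not use the CDN" explicit, so the caller can apply its own default URL generation.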

2. Drupal core patch

I had the chance to speak to Andrew “drewish” Morton at DrupalCon DC about the Drupal core patch that the CDN integration module requires. He managed to get his proposed Drupal File API patches committed to the current development version of Drupal (which will become Drupal 7), so he is definitely the person to go to for all things concerning files in Drupal right now. I explained to him the need for a unified file URL generation/alteration mechanism, and he immediately understood and agreed.

Drupal already has one function to generate file URLs: file_create_url($path). Unfortunately, this function is only designed to work for files that have been uploaded by users or generated by modules (e.g. transformations of images). And now the bad news: there is no function that generates the URLs for the other files (the ones that are not uploaded but shipped with Drupal core, modules and themes). To be honest, the current method for generating these URLs is very ugly, although very simple: prepend the base path to the relative file path. So if you want to serve the file misc/jquery.js (which is part of Drupal core), you would write the following code to generate a URL for it:

$url = base_path() . 'misc/jquery.js';

Andrew and I agreed that since both kinds of files are typically served from the same server(s) in the end, it only makes sense to generate their URLs through one function. So the sensible thing to do was to also route the non-uploaded files through file_create_url(). A module could then implement a function, custom_file_url_rewrite($path), to alter file URLs.

So, I wrote a Drupal core patch exactly according to these specifications, and it works great. However, we must fall back to the old mechanisms in case custom_file_url_rewrite() returns FALSE (meaning that the CDN cannot or should not serve the file). Since there is a distinction between uploaded/generated files and shipped files, we must first determine which kind of file it is. This can be done by looking at the path that was given to file_create_url(): if it begins with the path of the directory that the Drupal administrator chose for uploaded and generated files, then it is an uploaded/generated file. Once this distinction has been made, the original procedure for that kind of file is applied.
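The dispatch-and-fallback logic described above can be sketched roughly as follows. This is a simplified, standalone illustration, not the actual patch: the EXAMPLE_* constants are hypothetical stand-ins for Drupal 6's file_directory_path(), $base_url and base_path(), the $rewrite callback plays the role of custom_file_url_rewrite(), and the private-download branch of the real file_create_url() is omitted.

```php
<?php
// Hypothetical stand-ins for Drupal 6's file_directory_path(), $base_url
// and base_path(); in Drupal these come from the site configuration.
const EXAMPLE_FILE_DIRECTORY = 'sites/default/files';
const EXAMPLE_BASE_URL = 'http://example.com';
const EXAMPLE_BASE_PATH = '/';

/**
 * Simplified sketch of file_create_url() after the patch: first give the
 * rewrite callback a chance, then fall back to the original behavior for
 * uploaded/generated files or for shipped files.
 */
function example_file_create_url($path, callable $rewrite) {
  // Let a module (e.g. a CDN integration module) rewrite the URL.
  $url = $rewrite($path);
  if ($url !== FALSE) {
    return $url;
  }
  // Uploaded/generated files live below the configured files directory
  // (public download method only in this sketch).
  if (strpos($path, EXAMPLE_FILE_DIRECTORY . '/') === 0) {
    return EXAMPLE_BASE_URL . '/' . $path;
  }
  // Shipped files (core, modules, themes): the old base_path() approach.
  return EXAMPLE_BASE_PATH . $path;
}

// A rewrite callback that never uses the CDN, to exercise both fallbacks.
$no_cdn = function ($path) { return FALSE; };
echo example_file_create_url('misc/jquery.js', $no_cdn), "\n";
// Prints: /misc/jquery.js
echo example_file_create_url('sites/default/files/logo.png', $no_cdn), "\n";
// Prints: http://example.com/sites/default/files/logo.png
```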

This patch was also ported to Drupal 7 (which will be the next version of Drupal) and submitted for review, with unit tests added (this is a requirement). The reviews have been very positive so far (Dries Buytaert, the Drupal founder, simply commented “Awesome.” and added it to his list of favorite patches), but it was submitted too late to be committed before this thesis text had to be finalized. Still, the positive reviews suggest that the patch is very likely to be committed.

Remark: because this is a verbatim copy of the bachelor thesis text, the above is no longer accurate. The Drupal core patch no longer uses the proposed custom_file_url_rewrite() function; instead, a new hook, hook_file_url_alter(), has been introduced.
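The main practical difference is the calling convention: hook_file_url_alter() receives the file URI by reference, so an implementation changes it in place (or leaves it untouched) instead of returning FALSE to opt out. A minimal sketch for a hypothetical module named "example", assuming a hard-coded CDN URL and handling only shipped files (relative paths without a stream wrapper scheme such as public://):

```php
<?php
/**
 * Sketch of a hook_file_url_alter() implementation for a hypothetical
 * module named "example". Leaving $uri unchanged means "do not use the CDN".
 */
function example_file_url_alter(&$uri) {
  $cdn_url = 'http://cdn.example.com'; // Hypothetical CDN URL.
  // Simplified check: only rewrite relative paths that have no scheme
  // (e.g. not public://...) and are not already root-relative.
  if (strpos($uri, '://') === FALSE && strpos($uri, '/') !== 0) {
    $uri = $cdn_url . '/' . $uri;
  }
}

$uri = 'misc/jquery.js';
example_file_url_alter($uri);
echo $uri, "\n";
// Prints: http://cdn.example.com/misc/jquery.js
```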

3. Implementation

  • A simple configuration UI was created using the Forms API. Advanced mode cannot be enabled until the daemon has been configured properly, which is verified by checking that the synced files database exists.
  • The per-page statistics are rendered through Drupal’s hook_exit(), which is called just before the end of each page request. It can therefore render after the rest of the page has been rendered, which implies that all file URLs have been created, so it is safe to calculate the statistics.
  • A hook_requirements() implementation was created, which allows me to add information about the CDN integration module to Drupal’s status report page.
  • The aforementioned custom_file_url_rewrite() function was implemented; it rewrites the URL based on the mode. In basic mode, the CDN URL is automatically inserted into file URLs; in advanced mode, the synced files database is queried. This is an SQLite database, which the Drupal 6 database abstraction layer does not support. Drupal 7’s database abstraction layer does support SQLite, but it is still in development (and will be for at least 6 more months). Fortunately, PHP’s PDO extension can also access SQLite databases, which makes this sufficiently easy.
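For advanced mode, the lookup against the synced files database could look roughly like the sketch below, which uses PDO's SQLite driver (pdo_sqlite must be enabled). The table and column names (synced_files, input_file, url) and the class ExampleSyncedFilesLookup are assumptions made for illustration, not File Conveyor's documented schema or the module's code. The demo creates an in-memory database so it runs on its own; a real setup would open the configured file with a DSN like 'sqlite:' . $path. It also times each query and counts hits, which is the raw data the per-page statistics need.

```php
<?php
/**
 * Hypothetical advanced-mode lookup: find the URL of a synced file and
 * record how long each query took, for the per-page statistics.
 */
class ExampleSyncedFilesLookup {
  private $statement;
  private $timings = array();
  private $hits = 0;

  public function __construct(PDO $db) {
    // Assumed schema: synced_files(input_file TEXT, url TEXT).
    $this->statement = $db->prepare('SELECT url FROM synced_files WHERE input_file = :file');
  }

  /** @return string|false The synced URL, or FALSE if not (yet) synced. */
  public function lookup($absolute_path) {
    $start = microtime(TRUE);
    $this->statement->execute(array(':file' => $absolute_path));
    $url = $this->statement->fetchColumn(); // FALSE when there is no row.
    $this->statement->closeCursor();        // Allow the statement to be reused.
    $this->timings[] = microtime(TRUE) - $start;
    if ($url !== FALSE) {
      $this->hits++;
    }
    return $url;
  }

  /** Lookups, files served from the CDN, total and average time (seconds). */
  public function stats() {
    $count = count($this->timings);
    $total = array_sum($this->timings);
    return array(
      'lookups' => $count,
      'served_from_cdn' => $this->hits,
      'total_time' => $total,
      'average_time' => $count ? $total / $count : 0,
    );
  }
}

// Demo: an in-memory database standing in for the synced files database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE synced_files (input_file TEXT, url TEXT)');
$db->exec("INSERT INTO synced_files VALUES ('/var/www/misc/jquery.js', 'http://cdn.example.com/misc/jquery.js')");

$lookup = new ExampleSyncedFilesLookup($db);
var_dump($lookup->lookup('/var/www/misc/jquery.js')); // string(37) "http://cdn.example.com/misc/jquery.js"
var_dump($lookup->lookup('/var/www/misc/drupal.js')); // bool(false)
print_r($lookup->stats());
```

Preparing the statement once and reusing it keeps the per-file cost low, which matters because a single page can reference dozens of files.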

That is all there is to tell about this module. It is very simple: all complexity is now embedded in the daemon, as it should be.

4. Comparison with the old CDN integration module

In January 2008, I wrote the initial version of the CDN integration module. It was written for Drupal 5 rather than Drupal 6, consisted of pure PHP code, and was thus limited by PHP’s constraints. It did not support Origin Pull CDNs; instead, it only supported Push CDNs that were accessible over FTP. The synchronization happened from within Drupal, on each cron run. This means that it relied on manual file system scanning (i.e. polling) to detect changes and that it could not, by design, perform concurrent syncs, since PHP cannot do that. To top it off, it did not store anything in the database, but in a serialized array, which had to be unserialized on every page to retrieve the URLs. Clearly, this was significantly slower, did not scale at all, and was unusable on any real-world web site.

It had its algorithms right though. You could consider it a very faint preview of what the end result looks like right now.

5. Screenshots

See the attached figures:

  • The configuration UI
  • The status report
  • The per-page statistics

This is a republished part of my bachelor thesis text, with thanks to Hasselt University for allowing me to republish it. This is section ten in the full text, in which it was called “Improving Drupal: CDN integration” instead of “CDN integration module 6.x-1.x”.


Comment by sime:

I haven’t looked into your core patch in much detail, though have you considered incorporating this into Pressflow?