page loading performance

Drupal 6 CDN integration: a test case

In this final article in my bachelor thesis series, I explain how I proved that the work I had done for my bachelor thesis (which includes the Episodes module, the Episodes Server module, the CDN integration module and File Conveyor) actually had a positive impact on page loading performance. To do that, I converted a fairly high-traffic web site to Drupal, installed File Conveyor to optimize & sync files to both a static file server and an FTP Push CDN, and used the CDN integration module to serve files from either the static file server or the FTP Push CDN (the choice between the two is based on the visitor's location, i.e. the IP address). I then measured the results using Episodes and proved the positive impact using Episodes Server's charts.

Previously in this series:


As a back-up plan in case there was not much feedback from companies (as turned out to be the case), I wanted to have a web site under my own control to use as a test case. That web site is driverpacks.net (see the first figure below for a screenshot of its homepage). It is the web site of an open source project, with more than 100,000 visits per month and more than 700,000 pageviews per month, with traffic coming from all around the world. These fairly large numbers and the geographical spread of its visitors make it a good test case for measuring the effect of a CDN. See the second figure below for a map and details.
Visitors come from 196 different countries, although the top three countries represent more than a quarter of the visitors and the top ten countries represent more than half of the visitors. Nevertheless, this is still a very geographically dispersed audience.

driverpacks.net homepage.

Google Analytics' Map Overlay view for driverpacks.net.

The goal was obviously to port this site to Drupal (which is of course not a part of this thesis) and to install the Episodes module. For about a week, statistics would be collected while no CDN was being used. Then, I would install the daemon on the server to sync the files to a CDN. Next, I would install the Drupal CDN integration module. Then, again for about a week, statistics would be collected while the CDN was being used. Hopefully, visualizing the collected episode measurements would confirm that using the CDN had indeed had a positive effect.

CDN integration module 6.x-1.x

In this article, I explain the rationale behind the CDN integration module for Drupal 6, which was written as part of my bachelor thesis. It supports integration with both Origin Pull CDNs (out-of-the-box) and Push CDNs (by using File Conveyor).
Note that development of version 2 of this module has already begun! Version two will also be ported to Drupal 7.


It should be obvious by now that we still need a module to integrate Drupal with a CDN, as Drupal does not provide such functionality on its own – if it did, then this bachelor thesis would be titled differently. This is the end of the long journey towards supporting the simplest and the most complex CDN or static file server setups one can make. Fortunately, this is all fairly trivial, except for maybe the necessary Drupal core patch.

1. Goals

The File Conveyor daemon I wrote is not necessary for Origin Pull CDNs, so this module should support those through a simple UI. On the other hand, it must also be easy to use the daemon for a Drupal web site. The former is called basic mode and the latter is called advanced mode, thereby indicating that the latter is more complex to set up (i.e. it requires you to set up the daemon). Here are the goals again, this time in more detail:

CDN integration module 5.x-1.x

In this article, which was in fact written in January-February 2008 (well over two years ago), I explain what the benefit is of using a CDN and how the then-new CDN integration module 1.x for Drupal 5 could help you do that for a cheap FTP Push CDN.
This was in fact more of a proof of concept module and therefore this Drupal 5 version of the CDN integration module is no longer supported. This article has been published because it would otherwise only have been gathering dust. It will give you a better view of Drupal's history of supporting CDNs, i.e. how hacky this solution is in comparison with its successor, the CDN integration module for Drupal 6.

A CDN (short for Content Delivery Network) is basically a load-balanced, globally distributed static file server. Why do CDNs matter to Drupal? Because they can drastically improve its page loading performance. The page loading performance of a web site is the time it takes for the end user's browser to download all files and then render the page.
If you'd like to learn more on how to improve Drupal's page loading performance, see the article I wrote about it.

How it works

When a file is uploaded to a CDN, it is sent to servers all over the planet (hence "globally distributed"): in North and South America, in Europe and in South-East Asia. These servers are superfast. More importantly, because you will be downloading files from a server that's typically much closer to you, the latency will be much lower. The exact technique a CDN uses to pick a server may also affect the result: some pick a server based on its available capacity, others by proximity. The latter is what matters most if you want fast-loading web sites; the former is more useful in case of large downloads (videos in particular). In the rest of this article, I'm going to assume you've opted for a CDN that picks servers based on proximity to the client.
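
To make the difference between the two selection strategies concrete, here is a minimal sketch. The server names, latencies and capacity figures are made up for illustration, and real CDNs use far more elaborate routing than this.

```python
# Hypothetical edge servers as seen from one particular visitor.
# All names and numbers are invented for this example.
SERVERS = [
    {"name": "us-east", "latency_ms": 120, "free_capacity": 0.80},
    {"name": "eu-west", "latency_ms": 25, "free_capacity": 0.30},
    {"name": "asia-se", "latency_ms": 210, "free_capacity": 0.55},
]


def pick_by_proximity(servers):
    """Pick the server with the lowest latency to the visitor."""
    return min(servers, key=lambda s: s["latency_ms"])


def pick_by_capacity(servers):
    """Pick the server with the most spare capacity."""
    return max(servers, key=lambda s: s["free_capacity"])


print(pick_by_proximity(SERVERS)["name"])  # eu-west
print(pick_by_capacity(SERVERS)["name"])   # us-east
```

For a page with many small files, the proximity-based choice is the one that keeps latency low; the capacity-based choice only pays off when a single large download dominates.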

Web sites with many images will especially benefit from this: if you have 8 images to download and 100 ms latency, that's only 800 ms combined latency. That might go unnoticed. However, imagine there are 40 images on a web page. That would account for 4000 ms in latency, which would definitely not go unnoticed.
In reality, a browser can perform parallel downloads, so you can't just add up the latencies linearly. While the example I just gave is scientifically worthless, it's sufficiently correct to illustrate that latency matters: it adds up in the end.
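
The same back-of-the-envelope arithmetic can be written down as a small sketch. The function name and the figure of six parallel connections are assumptions for illustration; the model counts one round trip per file and ignores bandwidth, connection setup and caching, so it is no more scientific than the example above.

```python
import math


def combined_latency_ms(num_files, latency_ms, parallel_connections=1):
    """Rough estimate of the latency spent downloading num_files files.

    Assumes every file costs one round trip of latency_ms and that the
    browser fetches files in batches of parallel_connections at a time.
    """
    if num_files < 0 or latency_ms < 0 or parallel_connections < 1:
        raise ValueError("invalid input")
    batches = math.ceil(num_files / parallel_connections)
    return batches * latency_ms


print(combined_latency_ms(8, 100))      # 800: the 8-image example
print(combined_latency_ms(40, 100))     # 4000: the 40-image example
print(combined_latency_ms(40, 100, 6))  # 700: 7 batches of up to 6 files
```

Even with parallel downloads, the total still scales with the per-request latency, which is exactly what a nearby server reduces.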

It's also worth noting that most web sites are served from the U.S.A., with the inevitable consequence that latencies quickly grow beyond reasonable proportions for visitors elsewhere. The result is that web sites without CDNs, or without static file servers in Europe, become sadly slow on the other side of the Atlantic. Digg.com for example, has terrible latencies and terrifying page loading times (6 seconds until I can see everything properly, >12 seconds until everything has finished loading).

CDN integration module: Drupal + CDN made simple!

I won't discuss any implementation challenges here: this blog post is supposed to be a quick and easy introduction and how-to. I'd like to refer you to my aforementioned article, which has a separate section explaining the challenges faced when integrating this module with Drupal core, which is of course a necessity.

File Conveyor: design

In this extensive article, I explain the architecture of the “File Conveyor” daemon that I wrote to detect files immediately (through the file system event monitors on each OS, i.e. inotify on Linux), process them (e.g. recompress images, compress CSS/JS files, transcode videos …) and finally, sync them (FTP, Amazon S3, Amazon CloudFront and Rackspace CloudFiles are supported).


So now that we have the tools to accurately (or at least representatively) measure the effects of using a CDN, we still have to start using a CDN. Next, we will examine how a web site can take advantage of a CDN.

As explained in “Key Properties of a CDN”, there are two very different methods for populating CDNs. Supporting pull is easy, supporting push is a lot of work. But if we want to avoid vendor lock-in, it is necessary to be able to transparently switch between pull and any of the transfer protocols for push. Suppose that you are using CDN A, which only supports FTP. When you want to switch to a cheaper, yet better CDN B, that would be a costly operation, because CDN B only supports a custom protocol.

To further reduce costs, it is necessary that we can do the preprocessing ourselves (be that video transcoding, image optimization or anything else). Also note that many CDNs do not support processing of files, even though processing can significantly reduce the amount of bandwidth consumed, and thereby the bill received every month.

That is why the meat of this thesis is about a daemon that makes it just as easy to use either push or pull CDNs and that gives you full flexibility in what kind of preprocessing you would like to perform. All you will have to do to integrate your web site with a CDN is:

  1. install the daemon
  2. tell it what to do by filling out a simple configuration file
  3. start the daemon
  4. retrieve the URLs of the synced files from an SQLite database (so you can alter the existing URLs to files to the ones for the CDN)
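
Step 4 could look roughly like the sketch below. The table name `synced_files` and its columns `input_file`, `url` and `server` are assumptions made for illustration, not a description of the daemon's actual schema, and the paths and URLs are made up. The example builds an in-memory database so that it can run on its own.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE synced_files (input_file TEXT, url TEXT, server TEXT)"
    )
    conn.executemany(
        "INSERT INTO synced_files VALUES (?, ?, ?)",
        [
            ("/var/www/files/logo.png",
             "http://cdn.example.com/files/logo.png", "example-cdn"),
            ("/var/www/files/style.css",
             "http://cdn.example.com/files/style.css", "example-cdn"),
        ],
    )
    conn.commit()
    return conn


def cdn_url(conn, local_path, fallback_url):
    """Return the CDN URL for a synced file, or fallback_url if the
    file has not been synced (yet)."""
    row = conn.execute(
        "SELECT url FROM synced_files WHERE input_file = ? LIMIT 1",
        (local_path,),
    ).fetchone()
    return row[0] if row else fallback_url


conn = build_demo_db()
print(cdn_url(conn, "/var/www/files/logo.png", "/files/logo.png"))
print(cdn_url(conn, "/var/www/files/missing.png", "/files/missing.png"))
```

Falling back to the original URL when no row is found keeps the web site working for files that have not been synced yet.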

FOSDEM 2010

This weekend on Sunday, February 7, we'll have a full day of Drupal talks at the 10th edition of FOSDEM, Europe's biggest, free-est and open-est software conference.

FOSDEM is a free and non-commercial event organized by the community, for the community. Its goal is to provide Free and Open Source developers a place to meet. The Drupal project was granted a developer room at FOSDEM to do exactly that: to share knowledge about Drupal.

The presentation schedule for the Drupal devroom features interesting speakers such as Robert Douglass, Károly Négyesi, Roel de Meester and Kristof van Tomme, and even more interesting subjects such as mobile device design, AHAH, eID and Views 3. Everyone is invited to attend the presentations.

I will be talking about page loading performance once again. My presentation will be similar to the one I gave at DrupalCon Paris 2009, but extended with the goals for CDN integration module 2.0 and a look ahead at what I'll work on for my master thesis.
Last but most definitely not least, Joeri Poesen will show off the set-up he uses for a powerful integration with a CDN, built on File Conveyor (which was written as part of my bachelor thesis).
