In this final article in my bachelor thesis series, I explain how I proved that the work I had done for my bachelor thesis (which includes the Episodes module, the Episodes Server module, the CDN integration module and File Conveyor) actually had a positive impact on page loading performance. For that, I converted a fairly high-traffic web site to Drupal, installed File Conveyor to optimize & sync files to both a static file server and an FTP Push CDN, used the CDN integration module to serve files from either the static file server or the FTP Push CDN (the choice between the two is based on the visitor's location, i.e. the IP address), measured the results using Episodes and proved the positive impact using Episodes Server's charts.
Previously in this series:
As a back-up plan in case there would not be much feedback from companies (as turned out to be the case), I wanted to have a web site under my own control to use as a test case. That web site is driverpacks.net (see the first figure below for a screenshot of its homepage). It is the web site of an open source project, with more than 100,000 visits per month and more than 700,000 pageviews per month, with traffic coming from all around the world. These fairly large numbers and the geographical spread of its visitors make it a good test case for measuring the effect of a CDN. See the second figure below for a map and details.
Visitors come from 196 different countries, although the top three countries represent more than a quarter of the visitors and the top ten countries represent more than half of the visitors. Nevertheless, this is still a very geographically dispersed audience.
The goal was obviously to port this site to Drupal (which is not a part of this thesis of course) and to install the Episodes module. For about a week, statistics would be collected while no CDN was being used. Then, I would install the daemon on the server to sync the files to a CDN. Next, I would install the Drupal CDN integration module. Then, again for about a week, statistics would be collected while the CDN was being used. Hopefully, visualizing the collected episode measurements would confirm that this had indeed had a positive effect.
In this article, I explain the rationale behind the CDN integration module for Drupal 6, which was written as part of my bachelor thesis. It supports integration with both Origin Pull CDNs (out-of-the-box) and Push CDNs (by using File Conveyor).
Note that development of version 2 of this module has already begun! Version two will also be ported to Drupal 7.
Previously in this series:
It should be obvious by now that we still need a module to integrate Drupal with a CDN, as Drupal does not provide such functionality on its own – if it did, then this bachelor thesis would be titled differently. This is the end of the long journey towards supporting the simplest and the most complex CDN or static file server setups one can make. Fortunately, this is all fairly trivial, except for maybe the necessary Drupal core patch.
The File Conveyor daemon I wrote is not necessary for Origin Pull CDNs. So this module should support those through a simple UI. On the other hand, it must also be easy to use the daemon for a Drupal web site. The former is called basic mode and the latter is called advanced mode, thereby indicating that the latter is more complex to set up (i.e. it requires you to set up the daemon). Here are the goals again, this time in more detail:
In this extensive article, I explain the architecture of the “File Conveyor” daemon that I wrote to detect files immediately (through the file system event monitors on each OS, i.e. inotify on Linux), process them (e.g. recompress images, compress CSS/JS files, transcode videos …) and finally, sync them (FTP, Amazon S3, Amazon CloudFront and Rackspace CloudFiles are supported).
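The detect, process, sync flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not File Conveyor's actual code: the `Pipeline` class, the `on_file_event` method, the two `fake_*` functions and the `cdn.example.com` host are all hypothetical names, and the file system monitor that would call `on_file_event` is left out.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict, List

# Hypothetical types for illustration; these are not File Conveyor's real classes.
Processor = Callable[[str], str]    # takes a file path, returns the processed file's path
Transporter = Callable[[str], str]  # takes a file path, returns the URL it was synced to


@dataclass
class Pipeline:
    processors: Dict[str, List[Processor]]  # processor chain per file extension
    transporters: List[Transporter]         # every destination the file is synced to
    synced: Dict[str, List[str]] = field(default_factory=dict)

    def on_file_event(self, path: str) -> None:
        """Would be called by a file system monitor for each new or changed file."""
        processed = path
        # Step 2: run the processor chain that matches the file extension (may be empty).
        for process in self.processors.get(PurePosixPath(path).suffix, []):
            processed = process(processed)
        # Step 3: sync the processed file to every configured destination.
        self.synced[path] = [send(processed) for send in self.transporters]


def fake_css_minifier(path: str) -> str:
    # Stand-in for a real processor: it only renames the file, nothing is compressed.
    return path.replace(".css", ".min.css")


def fake_ftp_transporter(path: str) -> str:
    # Stand-in for a real transporter: it only builds the URL, nothing is uploaded.
    return "ftp://cdn.example.com/" + path.lstrip("/")


pipeline = Pipeline(
    processors={".css": [fake_css_minifier]},
    transporters=[fake_ftp_transporter],
)
pipeline.on_file_event("/sites/default/files/style.css")
print(pipeline.synced)
```

Keeping processors and transporters as plain callables means a new file type or a new destination is added through configuration, without touching the detection step.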
Previously in this series:
So now that we have the tools to accurately (or at least representatively) measure the effects of using a CDN, we still have to start using a CDN. Next, we will examine how a web site can take advantage of a CDN.
As explained in “Key Properties of a CDN”, there are two very different methods for populating CDNs. Supporting pull is easy; supporting push is a lot of work. But if we want to avoid vendor lock-in, it is necessary to be able to transparently switch between pull and any of the transfer protocols for push. Suppose that you are using CDN A, which only supports FTP. When you want to switch to a cheaper, yet better CDN B, that would be a costly operation, because CDN B only supports a custom protocol.
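One way to picture this kind of transparent switching is a common interface that every push protocol implements, so that the syncing code never knows which CDN it is talking to. The sketch below is a hypothetical illustration, not the daemon's real design: `Transporter`, `FakeFtpTransporter`, `FakeCustomTransporter`, `sync` and the example host names are assumed names, and the two fake classes keep files in memory instead of opening a connection.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Transporter(ABC):
    """Hypothetical common interface that every push protocol implements."""

    @abstractmethod
    def upload(self, local_path: str, data: bytes) -> str:
        """Store the file and return the URL it will be served from."""


class FakeFtpTransporter(Transporter):
    # Stand-in for CDN A (FTP only); stores files in memory, no network access.
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.files: Dict[str, bytes] = {}

    def upload(self, local_path: str, data: bytes) -> str:
        self.files[local_path] = data
        return f"{self.base_url}/{local_path.lstrip('/')}"


class FakeCustomTransporter(Transporter):
    # Stand-in for CDN B (custom protocol); only the internals differ from CDN A.
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.objects: Dict[str, bytes] = {}

    def upload(self, local_path: str, data: bytes) -> str:
        key = local_path.lstrip("/")
        self.objects[key] = data
        return f"{self.base_url}/{key}"


def sync(transporter: Transporter, files: Dict[str, bytes]) -> Dict[str, str]:
    """Upload every file; this code is identical no matter which CDN is used."""
    return {path: transporter.upload(path, data) for path, data in files.items()}


files = {"/img/logo.png": b"fake image bytes"}
print(sync(FakeFtpTransporter("http://cdn-a.example.com"), files))
print(sync(FakeCustomTransporter("http://cdn-b.example.com"), files))
```

With such an interface, moving from CDN A to CDN B comes down to passing a different transporter object to `sync`.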
To further reduce costs, it is necessary that we can do the preprocessing ourselves (be that video transcoding, image optimization or anything else). Also note that many CDNs do not support processing of files, even though it can significantly reduce the amount of bandwidth consumed, and thereby the bill received every month.
That is why the meat of this thesis is about a daemon that makes it just as easy to use either push or pull CDNs and that gives you full flexibility in what kind of preprocessing you would like to perform. All you will have to do to integrate your web site with a CDN is:
This weekend on Sunday, February 7, we'll have a full day of Drupal talks at the 10th edition of FOSDEM, Europe's biggest, free-est and open-est software conference.
FOSDEM is a free and non-commercial event organized by the community, for the community. Its goal is to provide Free and Open Source developers a place to meet. The Drupal project was granted a developer room at FOSDEM to do exactly that: to share knowledge about Drupal.
The presentation schedule for the Drupal devroom features interesting speakers such as Robert Douglass, Károly Négyesi, Roel de Meester and Kristof van Tomme and even more interesting subjects such as mobile device design, AHAH, eID and Views 3. Everyone is invited to attend the presentations.
I will be talking about page loading performance once again. My presentation will be similar to the one I gave at DrupalCon Paris 2009, but extended with the goals for CDN integration module 2.0 and a look ahead at what I'll work on for my master thesis.
Last but most definitely not least, Joeri Poesen will show off the set-up he uses for a powerful integration with a CDN, built on File Conveyor (which was written as part of my bachelor thesis).
This is the brief version of my actual master thesis proposal, which is attached in PDF format.
My bachelor thesis was about making Drupal web sites load faster. 80 to 90% of the response time (as observed by the end user) is spent on downloading the components of a web page. Therefore, this is also the part where optimizations have the largest effect.
To be able to prove the positive impact of optimizing the loading of the components of a web site (and thereby prove that the work I was going to do had a positive impact), I researched existing page loading profiling tools. Episodes (which refers to the various episodes in the page loading sequence) came out as the clear winner.
Also as part of my bachelor thesis, I wrote a simple Drupal module, the Episodes Server module, that could create simple charts to compare the average page loading time per day per geographic region.
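The aggregation behind such a chart can be sketched as follows. This is a minimal illustration, not the module's actual code (which is a Drupal module, not Python): the record layout, the function name `average_per_day_per_region` and the sample values are all assumptions, and the numbers are made up for the example rather than taken from real measurements.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (day, region, page loading time in milliseconds).
Measurement = Tuple[str, str, float]


def average_per_day_per_region(
    measurements: Iterable[Measurement],
) -> Dict[Tuple[str, str], float]:
    """Group measurements by (day, region) and average the loading times."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for day, region, duration_ms in measurements:
        grouped[(day, region)].append(duration_ms)
    return {key: mean(values) for key, values in grouped.items()}


# Made-up sample values, for illustration only.
sample = [
    ("2009-05-01", "Europe", 1000.0),
    ("2009-05-01", "Europe", 2000.0),
    ("2009-05-01", "Asia", 3000.0),
    ("2009-05-02", "Europe", 1200.0),
]

for (day, region), average_ms in sorted(average_per_day_per_region(sample).items()):
    print(day, region, average_ms)
```

Each resulting (day, region) pair corresponds to one data point in a chart, so two periods (without and with a CDN) can be compared region by region.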