Bachelor thesis on Drupal's page loading performance

Published on 18 October, 2008

I’ve alluded to it before, but now it’s also been officially approved: I’ll be doing my bachelor thesis on Drupal! I will focus on integrating Drupal with CDNs. Yay! :)

Don’t know what a CDN is? It’s short for Content Delivery Network: a network of (static file or streaming media) servers located around the globe. These servers all mirror each other’s files. When a user requests a certain file from the CDN, the server closest to the user serves it.
By using a CDN to serve the static components on your web site (CSS, JS, images, fonts), your web site will load much faster: the latency will be lower and the throughput will be greater.
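To make “the server closest to the user” concrete, here is a minimal Python sketch. It assumes we already know the round-trip time from one user to each edge server. The server names and timings are made up, and `closest_edge()` is a hypothetical helper. Real CDNs usually make this choice on the user’s behalf through DNS or anycast routing rather than having the client measure anything.

```python
# Hypothetical round-trip times (in milliseconds) from one user in
# Belgium to three edge servers of an imaginary CDN.
EDGE_RTT_MS = {
    "edge-brussels": 12.0,
    "edge-new-york": 95.0,
    "edge-tokyo": 260.0,
}


def closest_edge(rtt_ms: dict[str, float]) -> str:
    """Return the name of the edge server with the lowest round-trip time."""
    if not rtt_ms:
        raise ValueError("no edge servers to choose from")
    return min(rtt_ms, key=rtt_ms.__getitem__)


if __name__ == "__main__":
    best = closest_edge(EDGE_RTT_MS)
    print(f"serve from {best} ({EDGE_RTT_MS[best]:.0f} ms round trip)")
```

Every request for a CSS, JS or image file pays at least one round trip. For a page with dozens of static components, serving them from a nearby edge instead of a distant origin is where the lower latency comes from.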

You’d think that the tools necessary to synchronize files to a CDN are already available. Unfortunately, that’s not the case. The most important reason is that CDNs have only recently become cheaper. Most CDNs assume that they will be used to distribute big files, and therefore that manually uploading each file (typically via FTP or SFTP) is acceptable. That’s not acceptable for dynamic (Drupal) web sites, though!

I will take the CDN integration module that I wrote, extract the useful pieces and then rewrite it properly (i.e. actually scalable) for Drupal 6. The goal is not only to write a scalable Drupal module (i.e. synchronization through PHP), but also to create a daemon. This daemon (in C++/Qt) will allow for far more efficient synchronization: instead of recursively scanning directories for new and changed files, it can take advantage of file system event monitors (inotify on Linux, FSEvents on Mac OS X and WMI on Windows) to track files with minimal overhead.
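To show what the daemon avoids, here is a rough Python sketch of the polling approach a PHP-only synchronizer is stuck with: walk the whole tree, record each file’s modification time, and diff that against the previous run. The function names are hypothetical, and this is not code from the CDN integration module. Note that `snapshot()` calls `stat()` on every file on every run, so its cost grows with the total number of files even when nothing changed, whereas an event monitor such as inotify reports only the paths that actually changed.

```python
import os


def snapshot(root: str) -> dict[str, float]:
    """Recursively walk root and record the mtime of every file."""
    state: dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                # The file was deleted between listing and stat(); skip it.
                continue
    return state


def diff(old: dict[str, float], new: dict[str, float]) -> tuple[set[str], set[str]]:
    """Compare two snapshots.

    Returns (changed, removed): files that are new or whose mtime differs,
    and files that existed in the old snapshot but are gone now.
    """
    changed = {path for path, mtime in new.items() if old.get(path) != mtime}
    removed = set(old) - set(new)
    return changed, removed
```

A synchronizer built this way would take a fresh snapshot on each run, upload everything in `changed` to the CDN and delete everything in `removed` there.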

Interested? For all details (the above really is just an extract), see my proposal (English, 5 pages) at the bottom of this blog post.

My promotor will be Prof. dr. Wim Lamotte and I will be guided by Stijn Agten and Maarten Wijnants.

Officially, I’ll be able to start working on this in February, all the way through July, for about 50% of my time. However, I may start doing some parts of the necessary research, development and potential Drupal core patches sooner, as time permits. Yes, core patches are possible, and even encouraged, because my promotor and guides don’t know Drupal themselves. They’ll exploit Drupal’s peer review process to evaluate my patches.
I’d like to avoid duplicate work: competition is not very useful in this area, but peer review and collaboration are. If you were planning to work on this or would like to give your input, let me know. I’m sure we’ll be able to arrange something.

Finally, I’d like to thank Dries for doing a review of my proposal. It resulted in several clarifications.

Note: this daemon would of course be independent from Drupal, and therefore it could be reused for other CMSes (WordPress, Joomla, Plone, you name it) or even custom sites.

Hey Wim,

Sounds like a really fun project. Anything that makes Drupal faster is of course very welcome. Will you keep us posted on your progress through your blog? I’d be happy to help with testing and the like. ;)

I like the idea, but I think for most sites it would already be great to solve the “files problem”. Once you need more than one PHP front-end server for a website, you need a way to share the files directory. Most sites currently use NFS for this, and that can cause a lot of problems. If your NFS server becomes slow (or NFS breaks for some reason), more and more Apache threads get locked while end users are downloading images and other files. Since you want to keep the maximum number of Apache threads on PHP servers reasonably low (because CPU is probably the limit there), the servers will hit MaxClients and your site goes down.
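The failure mode described above follows from Little’s law: on average, the number of busy workers equals the request rate multiplied by how long each request holds a worker. A back-of-the-envelope sketch in Python, where the MaxClients value, the request rate and the NFS latencies are all hypothetical numbers rather than measurements:

```python
MAX_CLIENTS = 150             # hypothetical Apache MaxClients setting
FILE_REQUESTS_PER_SEC = 60.0  # hypothetical rate of image/file downloads


def busy_workers(requests_per_sec: float, seconds_per_request: float) -> float:
    """Little's law: requests in flight = arrival rate x time spent per request."""
    return requests_per_sec * seconds_per_request


def saturated(requests_per_sec: float, seconds_per_request: float,
              max_clients: int = MAX_CLIENTS) -> bool:
    """True if file downloads alone would occupy every Apache worker."""
    return busy_workers(requests_per_sec, seconds_per_request) >= max_clients


if __name__ == "__main__":
    for nfs_latency in (0.05, 1.0, 3.0):  # seconds per file read over NFS
        busy = busy_workers(FILE_REQUESTS_PER_SEC, nfs_latency)
        state = "MaxClients reached" if saturated(FILE_REQUESTS_PER_SEC, nfs_latency) else "ok"
        print(f"NFS latency {nfs_latency:.2f}s: ~{busy:.0f} busy workers ({state})")
```

With these numbers a healthy NFS server keeps only a handful of workers busy, but once reads take 3 seconds, file downloads alone need about 180 workers, more than MaxClients allows, so PHP requests start queueing behind them.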

Now if you could tell Drupal “all files are served from files.mysite.com/” instead of mysite.com/files, that would already be great progress. A web server that only serves static files can be configured completely differently from one that has to process PHP, and your whole web farm would benefit from this, even if you don’t use a CDN and just install a few servers to serve the static files.
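The change being asked for boils down to rewriting file URLs. A minimal Python sketch, assuming Drupal’s files live under `files/` and a static host at `http://files.mysite.com`; `file_url()` and both settings are hypothetical and do not reflect the API of any Drupal patch:

```python
# Hypothetical settings: the local files directory and the dedicated static
# file server that should serve its contents instead of the PHP servers.
FILES_DIRECTORY = "files"
STATIC_BASE_URL = "http://files.mysite.com"


def file_url(path: str, base_url: str = STATIC_BASE_URL) -> str:
    """Map a path like 'files/images/logo.png' to the static file server."""
    prefix = FILES_DIRECTORY + "/"
    path = path.lstrip("/")  # accept both 'files/...' and '/files/...'
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not inside {FILES_DIRECTORY}/")
    return f"{base_url.rstrip('/')}/{path[len(prefix):]}"
```

Here `file_url("files/images/logo.png")` yields `http://files.mysite.com/images/logo.png`, so browsers fetch the file from the static server and the PHP servers never see that request.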

It would be great if you could also include support for that in your thesis. Basically it’s the same problem, and I think the module might even get more users for this use case than for full-blown CDN integration. (VRT, for example, would certainly be interested, IMO.)

You know where to find me if you want to discuss this further. ;-)

I agree. The word ‘CDN’ can be replaced by ‘any kind of file server’ throughout the proposal. So you could have a separate server farm with your static file servers.

The ‘mysite.com/files’ vs. ‘files.mysite.com’ problem is already in the process of being resolved: see the hook_file_server() core patch (http://drupal.org/node/214934). I’d love your feedback there. Because of time constraints, Jakub Suchy (meba) has volunteered to help get that core patch in. Once that’s in, supporting static file servers becomes a no-brainer. It’s only a tiny subset of what I’d like to support, because supporting static file servers isn’t complex enough to justify a thesis.

I will of course compare static file servers with CDNs in my thesis.

Static file servers make sense when your entire audience is in the same geographical area. CDNs make more sense when you’ve got an international web site. So for the VRT (for the non-Belgians amongst us: it’s the Belgian national television, which uses Drupal), it would indeed make more sense to have static file servers.

akahn

16 years 3 months ago

Drupal development for college credit: pretty sweet deal if you ask me. A question: if you need to patch core for this module to work, do you expect those patches to be accepted? If that’s your intention, maybe this CDN integration should be done for Drupal 7? Another note: if this is a module you intend to release for the community to use, I hope you factor documentation and support into the thesis project, since they are an important part of maintainership. Good luck!

I do expect those core patches to be accepted. But they’ll go through the normal Drupal peer review process (my promotor has already confirmed that this won’t be an issue, and is even a good thing), so they won’t require special treatment.

I will focus on Drupal 6, because Drupal 7 could easily not be out until 2009, while my thesis will (have to) be finished by July. However, the core patch will be against Drupal 7, and I’ll maintain a backport of the same patch for Drupal 6.

I have yet to discuss the documentation aspect with my promotor. Support will only happen after I’ve completed my thesis, since that’s when I’ll release it publicly.

akahn

16 years 3 months ago

Cool, your Drupal 6 and 7 strategy makes sense. I guess this wouldn’t really work within the academic framework, but it would be neat to make a release earlier and potentially have people collaborating and submitting patches to your project along the way. I can imagine a thesis advisor (even one who ‘gets’ open source) not going for this, though. ;)

That’s YSlow, not some load testing tool! :)

And I doubt the proposal qualifies as a paper… or does it? I’m inexperienced in that area.

Thanks for the encouragement! :)

zanoman

15 years 8 months ago

Hi Wim, just wondering whether things have evolved, because the module hasn’t been updated as promised (February). :)

More seriously, I’m considering installing it, but with the frequent Drupal updates nowadays, this module’s old 5.5 patches seem to be a pain.

I hope you’ll find a way to avoid hacking core, so that the module keeps up smoothly with Drupal updates.

Keep me informed.

gdtechindia

15 years 8 months ago

Hi, I am interested in using your code on a live website. Let me know if you have something ready for Drupal 5 and Drupal 6 as well.

Regards GD