I’ve alluded to it before, but now it’s also been officially approved: I’ll be doing my bachelor thesis on Drupal! I will focus on integrating Drupal with CDNs. Yay! :)
Don’t know what a CDN is? It’s short for Content Delivery Network; a network of (static file or streaming media) servers that are located around the globe. These servers all mirror each others’ files. When a user requests a certain file from the CDN, the server that is the closest to the user will serve the file.
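To make the "closest server" idea concrete, here is a toy sketch in Python. It assumes that "closest" means the shortest great-circle distance, and the edge locations and function names are made up for illustration. Real CDNs typically route requests via DNS or anycast and measure network distance, so treat this as an illustration of the principle only.

```python
import math

# Hypothetical edge locations: name -> (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "amsterdam": (52.37, 4.90),
    "new-york": (40.71, -74.01),
    "tokyo": (35.68, 139.69),
}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user_location, servers=EDGE_SERVERS):
    """Return the name of the edge server closest to the user."""
    return min(servers, key=lambda name: great_circle_km(user_location, servers[name]))
```

Under these assumptions, a visitor in Brussels, `nearest_edge((50.85, 4.35))`, would be served from the Amsterdam entry rather than from New York or Tokyo.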
By using a CDN to serve the static components on your web site (CSS, JS, images, fonts), your web site will load much faster: the latency will be lower and the throughput will be greater.
You’d think that the tools necessary to synchronize files to a CDN are already available. Unfortunately, that’s not the case. The most important reason is that CDNs have only become cheaper very recently. Most CDNs assume that they are going to be used for the distribution of big files, and that a manual upload of each file, typically via (S)FTP, is therefore acceptable. That’s not acceptable for dynamic (Drupal) web sites though!
I will take the CDN integration module that I wrote, extract the useful pieces and then rewrite it properly (i.e. actually scalable) for Drupal 6. The goal is not only to write a scalable Drupal module (i.e. synchronization through PHP), but also to create a daemon. This daemon (in C++/Qt) will allow for far more efficient synchronization: instead of scanning directories recursively for new and changed files, it can take advantage of file system event monitors (inotify on Linux, FSEvents on Mac OS X and WMI on Windows) to track files with minimal overhead.
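For comparison, the recursive scan that the daemon is meant to avoid can be sketched in a few lines of Python. This is only an illustration of the polling approach, not code from the module or the planned daemon; the function names `snapshot` and `files_to_sync` are made up for this example, and a file counts as changed when its modification time or size differs.

```python
import os


def snapshot(root):
    """Walk root recursively and record (mtime, size) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_mtime_ns, info.st_size)
    return state


def files_to_sync(previous, current):
    """Return the sorted relative paths that are new or changed since `previous`."""
    return sorted(path for path, meta in current.items() if previous.get(path) != meta)
```

Every call to `snapshot` touches every file under `root`, so the cost grows with the size of the tree even when nothing has changed. An event monitor such as inotify reports only the files that actually changed, which is where the efficiency gain would come from.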
Interested? For all details (the above really is just an extract), see my proposal (English, 5 pages) at the bottom of this blog post.
My promotor will be Prof. dr. Wim Lamotte and I will be guided by Stijn Agten and Maarten Wijnants.
Officially, I’ll be able to start working on this in February, all the way through July, for about 50% of my time. However, I may start doing some parts of the necessary research, development and potential Drupal core patches sooner, as time permits. Yes, core patches are possible, and even encouraged, because my promotor and guides don’t know Drupal themselves. They’ll exploit Drupal’s peer review process to evaluate my patches.
I’d like to avoid duplicate work: competition is not very useful in this area, but peer review and collaboration are. If you were planning to work on this or would like to give your input, let me know. I’m sure we’ll be able to arrange something.
Finally, I’d like to thank Dries for doing a review of my proposal. It resulted in several clarifications.
Note: this daemon would of course be independent from Drupal, and therefore it could be reused for other CMSes (WordPress, Joomla, Plone, you name it) or even custom sites.
cool
hey Wim.
Sounds like a really fun project. Anything that makes Drupal faster is of course very welcome. Will you keep us posted on the progress via your blog? I’d be happy to help with testing and such. ;)
Yep, just subscribe to the feed of the “bachelor thesis” tag :)
To CDN or not to CDN
I like the idea, but I think for most sites it would already be great to solve the “files problem”. Once you need more than one PHP frontend server for a website, you need a way to share the files directory. Most sites use NFS for this now and that can cause a lot of problems. If your NFS server becomes slow (or NFS breaks for some reason), more and more Apache threads get locked while end users are downloading images and other files. Since you want to keep the maximum number of Apache threads on PHP servers reasonably low (because CPU is probably the limit there), the servers will hit MaxClients and your site goes down.
Now if you were able to tell Drupal “all files are served on files.mysite.com/” instead of mysite.com/files, that would already be great progress. A web server that only serves static files can be configured completely differently from one that has to process PHP, and your complete web farm would benefit from this, even if you don’t work with a CDN and just install a few servers to serve the static files.
It would be great if you could also include support for that in your thesis. Basically it’s the same problem, and I think the module might even get more users for this use case than for full-blown CDN integration. (VRT, for example, would certainly be interested IMO)
You know where to find me if you want to discuss this further. ;-)
Static file server support is a byproduct
I agree. The word “CDN” can be replaced by “any kind of file server” throughout the proposal. So, you could have a separate server farm with your static file servers.
The “mysite.com/files” vs “files.mysite.com” problem is already in the process of being resolved, see the hook_file_server() core patch (http://drupal.org/node/214934). I’d love your feedback there. Because of time constraints, Jakub Suchy (meba) has volunteered to help get that core patch in. Once that’s in, supporting static file servers becomes a no-brainer. It’s a tiny subset of what I’d like to support, because the complexity of supporting static file servers isn’t high enough to justify a thesis.
I will of course compare static file servers with CDNs in my thesis.
Static file servers make sense when your entire audience is in the same geographical area. CDNs make more sense when you’ve got an international web site. So for the VRT (for the non-Belgians amongst us: it’s the Belgian national television, which uses Drupal), it would indeed make more sense to have static file servers.
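The “mysite.com/files” vs “files.mysite.com” rewrite discussed above boils down to a small URL mapping. The sketch below is in Python for brevity and is not the hook_file_server() patch itself (which is PHP); the function name `file_url`, the `http://` scheme and the `files/` directory name are assumptions made for illustration.

```python
def file_url(path, file_server="http://files.mysite.com/", files_dir="files/"):
    """Map a site-relative file path to its URL on a static file server.

    Paths outside files_dir are returned unchanged, so only static files
    are redirected to the (hypothetical) file server.
    """
    relative = path.lstrip("/")
    if not relative.startswith(files_dir):
        return path
    return file_server.rstrip("/") + "/" + relative[len(files_dir):]
```

With these defaults, `file_url("/files/images/logo.png")` would give `http://files.mysite.com/images/logo.png`, while a page path such as `/node/1` is left alone. Pointing `file_server` at a CDN host instead of a static file server would use the same mapping.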
Drupal development for college credit, pretty sweet deal if you ask me. A question: if you will need to patch core for this module to work, do you expect those patches to be accepted? If that’s your intention, maybe this CDN integration should be done on Drupal 7? Another note: if this is a module you intend to release for use by the community, I hope you factor documentation and support into the thesis project, since that is an important part of maintainership. Good luck!
Core patch will be for Drupal 6 & 7
I indeed expect those core patches to be accepted. But they’ll go through the normal Drupal peer review process (my promotor has already confirmed that that won’t be an issue, and is even a good thing), so they won’t require special treatment.
I will focus on Drupal 6, because it could easily be 2009 when Drupal 7 will be out, while my thesis will (have to) be finished by July. However, the core patch will be against Drupal 7, while I’ll maintain a backport of the same patch for Drupal 6.
I have yet to discuss the documentation aspect with my promotor. Support is something that will only happen after I’ve completed my thesis, since that’s when I’ll release it publicly.
Ah
Cool, makes sense about your Drupal 6 and 7 strategy. I guess this wouldn’t really work within the academic framework, but it would be neat to make a release earlier and potentially have people collaborating and submitting patches to your project along the way. I can imagine a thesis advisor (even one who “gets” open source) not going for this, though. ;)
delivery testing
You mention load testing tools in your paper.
Is this image a result of such a tool? http://drupal.org/node/207490
If so, which one?
I look forward to following your journey on this project.
That's YSlow
That’s YSlow, not some load testing tool! :)
And I doubt the proposal qualifies as a paper … or does it? I’m inexperienced in that area.
Thanks for the encouragement! :)
Have you considered posting your module?
Hi,
Have you considered posting a link to your module http://drupal.org/project/cloudfront here http://developer.amazonwebservices.com/connect/entryCreate!default.jspa… ?
That would make it easier for users of CloudFront to find your work.
Regards,
Tal CloudFront GM
The module is still in development!
Thanks for letting me know! :)
However, the module is still in development and not yet ready for prime time. I’ll definitely post it over there when it’s finished!
Hi Wim, just wondering if things have evolved, because the module has not been updated as promised (February) :)
More seriously, I’m considering installing it, but with the frequent Drupal updates these days, this module’s old 5.5 patches seem to be a pain.
I hope you’ll find a way to avoid hacking core so it will scale smoothly with Drupal updates.
Keep me informed.
Hi, I am interested in using your code on a live website. Let me know if you have something ready for Drupal 5 and Drupal 6 as well.
Regards GD