In this very brief article, I highlight the key properties of CDNs: what differentiates them and which technical implications you should keep in mind.
A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity.
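To make "a measure of network proximity" a bit more concrete, here is a minimal sketch that picks the PoP with the lowest measured round-trip time. The PoP names and latency figures are made up for illustration. Real CDNs typically make this decision through DNS or anycast routing rather than in application code.

```python
def pick_nearest_pop(latencies_ms):
    """Return the name of the PoP with the lowest measured latency.

    latencies_ms maps a (hypothetical) PoP name to a round-trip time in ms.
    """
    if not latencies_ms:
        raise ValueError("no PoPs to choose from")
    # min() over the keys, ordered by their latency value.
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical measurements taken from one user's location.
measured = {"amsterdam": 12.5, "new-york": 88.0, "tokyo": 240.3}
nearest = pick_nearest_pop(measured)
print(nearest)
```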
It is extremely hard to decide which CDN to use. In fact, by just looking at a CDN’s performance, it is close to impossible (see “Content Owners Struggling To Compare One CDN To Another” and “How Is CDNs Network Performance For Streaming Measured?”)!
That is why CDNs achieve differentiation through their feature sets, not through performance. Depending on your audience, the geographical spread (the number of PoPs, or points of presence, around the world) may be very important to you. A 100% SLA is also nice to have: it means that the CDN guarantees that it will be online 100% of the time.
You may also choose a CDN based on the population methods it supports. There are two big categories here: push and pull. Pull requires virtually no work on your side: all you have to do is rewrite the URLs to your files, replacing your own domain name with the CDN’s domain name. The CDN will then apply the Origin Pull technique and periodically pull the files from the origin (that is, your server). How often it does so depends on how you have configured your HTTP headers (particularly the Expires header). It also depends on the software driving the CDN, since there is no standard in this field. Origin Pull may also result in redundant traffic, because files are pulled from the origin server more often than they actually change, but this is a minor drawback in most situations.
Push, on the other hand, requires a fair amount of work on your part to sync files to the CDN. But you gain flexibility, because you can decide when files are synced, how often, and whether any preprocessing should happen. That is much harder to do with Origin Pull CDNs. See this table for an overview:
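| | Pull | Push |
|---|---|---|
| Transfer protocol | none | FTP, SFTP, WebDAV, Amazon S3 … |
| Advantages | virtually no setup | flexibility: you decide when files are synced, how often, and whether they are preprocessed |
| Disadvantages | possible redundant traffic; little control over syncing and preprocessing | setup |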
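The URL rewriting that Origin Pull requires can be sketched roughly as follows. Both host names are placeholders I made up. A real site would do this in its templating or output layer.

```python
from urllib.parse import urlsplit, urlunsplit

ORIGIN_HOST = "www.example.com"  # hypothetical: your own domain name
CDN_HOST = "cdn.example.net"     # hypothetical: the CDN's domain name


def to_cdn_url(url, origin_host=ORIGIN_HOST, cdn_host=CDN_HOST):
    """Swap the origin host for the CDN host; leave other URLs untouched."""
    parts = urlsplit(url)
    # netloc includes any port, so URLs with an explicit port are left as-is.
    if parts.netloc.lower() != origin_host:
        return url
    return urlunsplit(parts._replace(netloc=cdn_host))


print(to_cdn_url("http://www.example.com/img/logo.png?v=2"))
print(to_cdn_url("http://other.example.org/style.css"))
```

Path, query string and fragment are preserved, so the CDN can request the exact same path from the origin.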
It should also be noted that some CDNs, if not most, support both Origin Pull and one or more push methods.
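How an Origin Pull CDN might use the Expires header to decide whether to pull a file again can be sketched like this. This is only an assumption about typical behavior: as noted above, there is no standard, so an actual CDN may apply different rules or look at other headers as well.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now):
    """Return True if a cached copy may still be served without re-pulling.

    expires_header is the raw Expires value sent by the origin;
    now is a timezone-aware datetime.
    """
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable date (such as "0") is treated as already expired.
        return False
    if expires.tzinfo is None:
        # No usable zone information: assume UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires


header = "Thu, 01 Dec 1994 16:00:00 GMT"
print(is_fresh(header, datetime(1994, 12, 1, 15, 0, tzinfo=timezone.utc)))
print(is_fresh(header, datetime(1994, 12, 1, 17, 0, tzinfo=timezone.utc)))
```

The further in the future you set Expires, the less often such a CDN would need to contact your server.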
The last thing to consider is vendor lock-in. Some CDNs offer highly specialized features, such as video transcoding. If you then discover another CDN that is significantly cheaper, you cannot easily move, because you depend on your current CDN’s specific features.
My aim is to support the following CDNs in this thesis:
- any CDN that supports Origin Pull
- any CDN that supports FTP
- Amazon S3 and Amazon CloudFront. Amazon S3 (or Simple Storage Service in full) is a storage service that can be accessed via the web (via REST and SOAP interfaces). It is used by many other web sites and web services. It has a pay-per-use pricing model: per GB of file transfer and per GB of storage.
Amazon S3 is designed to be a storage service and only has servers in one location in the U.S. and one location in Europe. Recently, Amazon CloudFront has been added. This is a service on top of S3 (files must be on S3 before they can be served from CloudFront), which has edge servers everywhere in the world, thereby acting as a CDN.
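For the push targets listed above (FTP, Amazon S3), the sync step has to decide which files actually need to be uploaded. Here is a minimal sketch, assuming you keep a record of the content digest of each file at its last sync. The function names and the in-memory dictionaries are hypothetical, and the upload itself is left out.

```python
import hashlib


def file_digest(data):
    """Return a hex digest that identifies the file's content."""
    return hashlib.sha256(data).hexdigest()


def files_to_push(local_files, synced_digests):
    """Return the paths whose content is new or changed since the last sync.

    local_files maps path -> file content (bytes);
    synced_digests maps path -> digest recorded at the last sync.
    """
    changed = []
    for path, data in sorted(local_files.items()):
        if synced_digests.get(path) != file_digest(data):
            changed.append(path)
    return changed


local = {"css/site.css": b"body { margin: 0 }", "js/app.js": b"alert(1);"}
last_sync = {"css/site.css": file_digest(b"body { margin: 0 }")}
print(files_to_push(local, last_sync))
```

Only changed files are transferred, which is how push avoids the redundant traffic that Origin Pull can cause.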
This is a republished part of my bachelor thesis text, with thanks to Hasselt University for allowing me to republish it. This is section five in the full text.
pull refresh
You can control how often a pull CDN refreshes with some URL query trickery:
```php
<?php
/**
 * We use this so that we don't have to bust the CDN every time someone
 * updates content.
 *
 * @param string $granularity
 *   One of 'hour', 'minute'. Defaults to 'hour'.
 * @return string
 *   URL query string, including '?'.
 */
function _tck_flash_query_string($granularity = 'hour') {
  // First character of Drupal's CSS/JS cache-busting string.
  $query_string = '?' . substr(variable_get('css_js_query_string', '0'), 0, 1);
  if ($granularity == 'minute') {
    // 'H' is the 24-hour clock; 'h' would repeat the same value every 12 hours.
    return $query_string . date('YmdHi', $_SERVER['REQUEST_TIME']);
  }
  return $query_string . date('YmdH', $_SERVER['REQUEST_TIME']);
}
?>
```
CDN-specific
This is CDN-specific: not every CDN supports this, because there is no standard for it. E.g. SimpleCDN doesn’t support the sample code you posted.
For which CDN is this?
Google and CDN
A lot of people do not know it, but what you wrote regarding CDNs, “The server selected for delivering content to a specific user is typically based on a measure of network proximity,” is actually key to Google’s success. Reportedly, they struggled a lot with delivering their services quickly. Video delivery in particular was a huge problem, which they solved by connecting users to servers in their immediate proximity. A problem with this particular part of the technology was also the reason for one of the rare Gmail outages a while ago. If net neutrality is defeated, this area could become really important in the future. Best regards.
Pull vs. Push
Which do you personally think delivers content to clients faster, pull or push?
My thinking is that pull has to fetch the content from another source the first time it is requested (if your CDN and hosting are from separate vendors), so it would take longer to deliver than push, because push already has the content in its network and just has to deliver it to the edge.
Your thoughts?