In this article, seven distinctly different page loading profiling tools are compared: UA Profiler, Cuzillion, YSlow, Hammerhead, Apache JMeter, Gomez/Keynote/WebMetrics/Pingdom and Jiffy/Episodes. “Profiling” must be interpreted rather broadly: some of the tools cannot measure actual performance but are useful to gain insight into page loading performance characteristics.
If you cannot measure it, you cannot improve it.
— Lord Kelvin
The same applies to page loading performance: if you cannot measure it, you cannot know which parts have the biggest effect and thus deserve your focus. So before doing any real work, we will have to figure out which tools can help us analyze page loading performance. “Profiling” turns out to be a more accurate description than “analyzing”:
In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program’s behavior using information gathered as the program executes. The usual goal of performance analysis is to determine which sections of a program to optimize — usually either to increase its speed or decrease its memory requirement (or sometimes both).
So a list of tools will be evaluated: UA Profiler, Cuzillion, YSlow, Hammerhead, Apache JMeter, Gomez/Keynote/WebMetrics/Pingdom and Jiffy/Episodes. From this fairly long list, the tools that will be used while improving Drupal’s page loading performance will be picked, based on two factors:
- How the tool could help improve Drupal core’s page loading performance.
- How the tool could help Drupal site owners to profile their site’s page loading performance.
1. UA Profiler
UA Profiler is a crowd-sourced project for gathering browser performance characteristics (the number of parallel connections, downloading scripts without blocking, caching, et cetera). The tests run automatically when you navigate to the test page from any browser — this is why it is powered by crowd sourcing.
It is a handy reference to find out which browser supports which features related to page loading performance.
2. Cuzillion

Cuzillion was introduced on April 25, 2008, so it is a relatively new tool. Its tag line, “‘cuz there are a zillion pages to check”, indicates what it is about: there are a lot of possible combinations of stylesheets, scripts and images. Plus, they can be external or inline. And each combination has different effects. Finally, to further complicate the situation, all these combinations depend on the browser being used. It should be obvious that without Cuzillion, it is an insane job to figure out how each browser behaves:
Before I would open an editor and build some test pages. Firing up a packet sniffer I would load these pages in different browsers to diagnose what was going on. I was starting my research on advanced techniques for loading scripts without blocking and realized the number of test pages needed to cover all the permutations was in the hundreds. That was the birth of Cuzillion.
Cuzillion is not a tool that helps you analyze any existing web page. Instead, it allows you to analyze any combination of components. That means it is a learning tool. You could also look at it as a browser profiling tool, in contrast with all the other tools listed here, which are page loading profiling tools.
Here is a simple example to achieve a better understanding. How does the following combination of components (in the <body> tag) behave in different browsers?
- an image on domain 1 with a 2 second delay
- an inline script with a 2 second execution time
- an image on domain 1 with a 2 second delay
First you create this setup in Cuzillion (see the attached figure: “The example situation created in Cuzillion”). This generates a unique URL. You can then copy this URL to all browsers you would like to test.
As you can see, Safari and Firefox behave very differently. In Safari (see the attached figure: “The example situation in Safari 3”), the loading of the first image seems to be deferred until the inline script has been executed (the images are displayed when the light purple bars become dark purple). In Firefox (see the attached figure: “The example situation in Firefox 3”), the first image is immediately rendered and after a delay of 2 seconds — indeed the execution time of the inline script — the second image is rendered (the images are displayed when the gray bars stop). Without going into details about this, it should be clear that Cuzillion is a simple, yet powerful tool to learn about browser behavior, which can in turn help to improve the page loading performance.
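As an aside, a hand-written test page equivalent to this Cuzillion setup could look roughly like the sketch below. The image URLs are hypothetical (a real test server would delay each image response by 2 seconds), and the inline script busy-waits to simulate 2 seconds of execution time:

```html
<!-- Rough hand-written equivalent of the Cuzillion example above.
     The image URLs are made up; a real test server would delay each
     response by 2 seconds. -->
<body>
  <img src="http://domain1.example.com/slow-image-1.png" alt="">
  <script>
    // Simulate an inline script with a 2 second execution time.
    var start = Date.now();
    while (Date.now() - start < 2000) { /* busy-wait */ }
  </script>
  <img src="http://domain1.example.com/slow-image-2.png" alt="">
</body>
```

Building even this one page by hand, then repeating it for every permutation of component types, positions and domains, is exactly the tedium Cuzillion automates.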
3. YSlow

YSlow is a Firebug extension (see the attached figure: “YSlow applied to drupal.org”) that can be used to analyze page loading performance through thirteen rules. These were part of the original fourteen rules — of which there are now thirty-four — of “Exceptional Performance”, as developed by the Yahoo! performance team.
YSlow 1.0 can only evaluate these thirteen rules and has a hardcoded grading algorithm. You should also remember that YSlow just checks how well a web page implements these rules: it analyzes the content of your web page (and the headers that were sent with it). For example, it does not test the latency or speed of a CDN; it just checks whether you are using one. Because you have to tell YSlow (via Firefox’s about:config) what the domain name of your CDN is, you can even fool YSlow into thinking any site is using a CDN — tricking YSlow into thinking drupal.org is using a CDN is easy (see the attached figures: “The original YSlow analysis” and “The resulting YSlow analysis”).
That possibility of fooling YSlow, plus the fact that some of the rules it analyzes are only relevant to very big web sites, means its grades should be interpreted with care. For example, one of the rules (#13, “Configure ETags”) is only relevant if you are using a cluster of web servers. For a more in-depth article on how to deal with YSlow’s evaluation of your web sites, see Jeff Atwood’s “YSlow: Yahoo’s Problems Are Not Your Problems”. YSlow 2.0 aims to be more extensible and customizable: it will allow for community contributions, or even web-site-specific rules.
Since only YSlow 1.0 is available at the time of writing, I will stick with that. It is a very powerful and helpful tool as it stands, and it will only get better. But remember the two caveats: it only verifies rules (it does not measure real-world performance) and some of the rules may not be relevant for your web site.
4. Hammerhead

Hammerhead (see the attached figure: “A sample Hammerhead run”), announced in September 2008, is a Firebug extension that should be used while developing. It measures how long a page takes to load and it can load a page multiple times, to calculate the average and median page load times. Of course, this is a lot less precise than real-world profiling, but it allows you to profile while you are working. It is far more effective at preventing page loading performance problems caused by changes in code, because you have the test results within seconds or minutes after you have made these changes!
Of course, you could also use YSlow (see the YSlow section) or FasterFox, but then you have to load the page multiple times yourself (i.e., hammer the server; this is where the name comes from). And you would still have to set up the separate testing conditions for each page load that Hammerhead already sets up for you: empty cache and primed cache, with two possible situations for the latter: disk cache and memory cache, or just disk cache. Memory cache is of course faster than disk cache; that is also why that distinction is important. Finally, it supports exporting the resulting data into CSV format, so you could even create some tools to roughly track page loading performance over time.
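That CSV export makes it easy to post-process results yourself. As a sketch (the field layout below is a made-up example, not Hammerhead's actual export format), computing the average and median of a series of load times could look like this:

```javascript
// Compute average and median page load times from exported measurements.
// The CSV layout below is a hypothetical example, not Hammerhead's
// actual export format.
const csv = 'url,loadTimeMs\n' +
            '/node/1,1200\n/node/1,1500\n/node/1,900\n' +
            '/node/1,3000\n/node/1,1100\n';

function loadTimes(csvText) {
  return csvText.trim().split('\n').slice(1)        // skip the header row
                .map(line => Number(line.split(',')[1]));
}

function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid]
                           : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(average(loadTimes(csv))); // 1540
console.log(median(loadTimes(csv)));  // 1200
```

Note how the median (1200 ms) is far less sensitive to the single 3000 ms outlier than the average (1540 ms), which is why having both statistics is useful.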
5. Apache JMeter
Apache JMeter is an application designed to load-test functional behavior and measure performance. From the perspective of profiling page loading performance, the relevant features are: loading web pages with or without their components, and measuring the response time of just the HTML, or of the HTML and all the components it references.
However, it has several severe limitations:
- Because it only measures from one location (the location from where it is run), it does not give a good big picture.
- It is not an actual browser, so it does not download components referenced from CSS or JS files.
- Also because it is not an actual browser, it does not behave the same as browsers when it comes to parallel downloads.
- It requires more setup than Hammerhead (see the Hammerhead section), so it is less likely that a developer will make JMeter part of his workflow.
It can be very useful if you are doing performance testing (how long does the back-end need to generate certain pages?), load testing (how many concurrent users can the back-end/server setup handle?) and stress testing (how many concurrent users can it handle until errors ensue?). To learn more about load testing Drupal with Apache JMeter, see John Quinn’s “Load test your Drupal application scalability with Apache JMeter” article and part two of that article.
6. Gomez/Keynote/WebMetrics/Pingdom

Gomez, Keynote, WebMetrics and Pingdom are third party, paid performance monitoring services. They share four major disadvantages:

- limited number of measurement points
- no real-world browsers are used
- unsuited for Web 2.0
- paid & closed source
6.1 Limited number of measurement points
These services poll your site at regular or irregular intervals. This poses analysis problems: for example, if one of your servers is very slow at just the moment one of these services requests a page, you will be told that there is a major issue with your site. But that is not necessarily true: it might be a fluke.
6.2 No real-world browsers
Most, if not all, of these services use their own custom clients (as mentioned in Scott Ruthfield’s Jiffy presentation at Velocity 2008). That implies their results are not representative of the real-world situation, which means you cannot rely upon these metrics for making decisions: what if a commonly used real-world browser behaves completely differently? Even if these services all used real-world browsers, they would still never reflect real-world performance, because each site has different visitors and therefore also a different mix of browsers.
6.3 Unsuited for Web 2.0
These services use the onload event as the “end time” for response time measurements. In Web 1.0, that was fine. But as the adoption of AJAX has grown, the onload event has become less and less representative of when the page is ready (i.e. has completely loaded), because the page can continue to load additional components. For some web sites, the “above the fold” section of a web page has been optimized, thereby loading “heavier” content later, below the fold. Thus the “page ready” point in time is shifted from its default.
In both of these cases, the onload event is too optimistic, as explained in Steve Souders’ Episodes white paper.
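A minimal page illustrating the problem (the delayed content here stands in for an AJAX request; the 2000 ms delay is arbitrary):

```html
<!-- The onload event fires as soon as the initial resources have loaded,
     even though the "real" content only appears 2 seconds later. -->
<body>
  <div id="content">Loading…</div>
  <script>
    window.addEventListener('load', function () {
      // A monitoring service using onload would stop the clock here...
      console.log('onload fired at', Date.now());
    });
    // ...but the page only becomes useful once this completes (standing
    // in for an AJAX call that fetches the actual content).
    setTimeout(function () {
      document.getElementById('content').textContent = 'Actual content';
      console.log('page actually ready at', Date.now());
    }, 2000);
  </script>
</body>
```

Any measurement that stops at onload undercounts the real load time of such a page by those 2 seconds.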
There are two ways to measure Web 2.0 web sites (covered by the Episodes presentation):
- manual scripting: identify timing points using scripting tools (Selenium, Keynote’s KITE, et cetera). This approach has a long list of disadvantages: low accuracy, high switching costs, high maintenance costs, synthetic (no real-world measurements).
- programmatic scripting: mark timing points from JavaScript embedded in the page itself. This is the approach taken by Jiffy and Episodes, and it makes real-world measurements possible.
If we were to work on a shared implementation of this second approach, we would not have to reinvent the wheel every time, and switching costs would be much lower. See the Jiffy/Episodes section later on.
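The programmatic approach boils down to surprisingly little code. The sketch below shows the general idea of mark/measure instrumentation; the function names are illustrative, not the actual Jiffy or Episodes API:

```javascript
// Minimal sketch of programmatic timing instrumentation: the page records
// named timestamps ("marks") and derives durations from pairs of marks.
// Function names are illustrative, not the real Jiffy/Episodes API.
const marks = {};

function mark(name, timestamp) {
  // In a browser this would default to Date.now(); an explicit timestamp
  // is accepted here so the sketch is deterministic.
  marks[name] = timestamp === undefined ? Date.now() : timestamp;
}

function measure(startMark, endMark) {
  return marks[endMark] - marks[startMark];
}

// Simulated page load: back-end took 180 ms, front-end another 770 ms.
mark('starttime', 1000);
mark('firstbyte', 1180);
mark('ready', 1950);

console.log(measure('starttime', 'firstbyte')); // 180  (back-end time)
console.log(measure('firstbyte', 'ready'));     // 770  (front-end time)
```

Because the marks are set by the page itself, this works for any “page ready” definition, including AJAX-driven ones that onload misses.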
6.4 Paid & closed source
The end user is dependent upon the third party service to implement new instrumentations and analyses. It is typical for closed source applications to only implement the most commonly requested features, so the end user may be left out in the cold. There is a high cost for the initial implementation and also a very high cost when switching to a different third party service.
7. Jiffy/Episodes

7.1 Jiffy

Jiffy (presented at Velocity 2008 by Scott Ruthfield — alternatively, you can view the video of that presentation) is designed to give you real-world information on what is actually happening within the browsers of users that are visiting your site. It shows you how long pages really take to load and how long events that happen while or after your page is loading really take. Especially when you do not control all the components of your web site (e.g. widgets of photo and music web sites, contextual ads or web analytics services), it is important that you can monitor their performance. It overcomes four major disadvantages that were listed previously:
- it can measure every page load if desired
- its measurements come from real-world browsers used by real visitors
- it is well-suited for Web 2.0, because you can configure it to measure anything
- it is open source
Jiffy consists of several components:
- Jiffy.js: a library for measuring your pages and reporting measurements
- Apache configuration: to receive and log measurements via a specific query string syntax
- Ingestor: parses the logs and stores them in a database (currently only Oracle XE is supported)
- Reporting toolset
- Jiffy Firebug extension (see the attached figure: “The Jiffy Firebug extension”)
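The division of labor between Jiffy.js and the Apache configuration hinges on that query string syntax: measurements are serialized into the query string of a request for a tiny resource, so the web server can log them without any server-side application code. A sketch of the serialization step (the endpoint path and parameter names below are made up for illustration; Jiffy defines its own specific syntax):

```javascript
// Serialize measurements into a beacon URL. The endpoint path and
// parameter names are hypothetical; Jiffy defines its own specific
// query string syntax.
function beaconUrl(endpoint, measurements) {
  const pairs = Object.keys(measurements).map(
    name => encodeURIComponent(name) + '=' + encodeURIComponent(measurements[name])
  );
  return endpoint + '?' + pairs.join('&');
}

const url = beaconUrl('/jiffy-beacon.gif', { backend: 180, frontend: 770 });
console.log(url); // /jiffy-beacon.gif?backend=180&frontend=770

// In a browser, requesting this URL (e.g. via an Image object) makes the
// measurements appear in the web server's access log, where an ingestor
// can later pick them up.
```

Logging through the access log is what lets the measurement pipeline scale: the web server does nothing beyond serving a static file.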
Jiffy was built for the WhitePages web site and has been running on that site. At more than 10 million page views per day, it should be clear that Jiffy can scale quite well. It has been released as an open source project but, at the time of writing, the last commit was on July 25, 2008, so the project appears to be inactive.
7.2 Episodes

Episodes is in the same vein as Jiffy, but differs from it in two important ways:
- Episodes’ goal is to become an industry standard. This would imply that the aforementioned third party services (Gomez/Keynote/WebMetrics/Pingdom) would take advantage of the instrumentations implemented through Episodes in their analyses.
- Most of the implementation is intended to be built into browsers (window.postMessage() plays a central role), which would mean less code has to be downloaded.
Steve Souders outlines the goals and vision for Episodes succinctly in these two paragraphs:
The goal is to make Episodes the industrywide solution for measuring web page load times. This is possible because Episodes has benefits for all the stakeholders. Web developers only need to learn and deploy a single framework. Tool developers and web metrics service providers get more accurate timing information by relying on instrumentation inserted by the developer of the web page. Browser developers gain insight into what is happening in the web page by relying on the context relayed by Episodes.
Most importantly, users benefit by the adoption of Episodes. They get a browser that can better inform them of the web page’s status for Web 2.0 apps. Since Episodes is a lighter weight design than other instrumentation frameworks, users get faster pages. As Episodes makes it easier for web developers to shine a light on performance issues, the end result is an Internet experience that is faster for everyone.
A couple of things can be said about the current codebase of Episodes:
- It is split over two files: episodes.js and episodes-compat.js. The latter is loaded on-the-fly when an older browser is being used that does not support window.postMessage(). These files are operational but have not had wide testing yet.
- It uses the same query string syntax as Jiffy uses to perform logging, which means Jiffy’s Apache configuration, ingestor and reporting toolset can be reused, at least partially.
- It has its own Firebug extension (see the attached figure: “The Episodes Firebug extension”).
8. Conclusion

There is not a single “do-it-all” tool that you should use. Instead, you should wisely combine all of the above tools. Use the tool that fits the task at hand.
However, for the scope of this thesis, there is one tool that jumps out: YSlow. It allows you to carefully analyze which things Drupal could be doing better. It is not necessarily meaningful in real-world situations, because, for example, it only checks whether you are using a CDN, not how fast that CDN is. But the fact that it tests whether a CDN is being used (or Expires headers, or gzipped components, and so on) is enough to find out what can be improved, to maximize the potential performance.
This kind of analysis is exactly what I will perform in the next section.
There is one more tool that jumps out for real, practical use: Episodes. This tool, if properly integrated with Drupal, would be a key asset to Drupal, because it would enable web site owners to track real-world page loading performance. It would also allow module developers to add support for Episodes. This, in turn, would be a good indicator of a module’s quality and would allow the web site owner/administrator/developer to carefully analyze each aspect of their Drupal web site.
I have created this integration as part of my bachelor thesis, the Episodes module. More on this in a follow-up article.