Master thesis proposal: "Web Performance Optimization: Analytics"

This is the brief version of my actual master thesis proposal, which is attached in PDF format.


Introduction

My bachelor thesis was about making Drupal web sites load faster. 80 to 90% of the response time (as observed by the end user) is spent on downloading the components of a web page. Therefore, this is also the part where optimizations have the largest effect.

To be able to prove the positive impact of optimizing the loading of the components of a web site (and thereby prove that the work I was going to do had a positive impact), I researched existing page loading profiling tools. Episodes (which refers to the various episodes in the page loading sequence) came out as a clear winner.

Also as part of my bachelor thesis, I wrote a simple Drupal module — the Episodes module — that could create simple charts to compare the average page loading time per day per geographic region.
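As an illustration only, the aggregation behind such a chart could be sketched as follows. The record layout (day, region, load time in milliseconds), the function name and the sample values are all assumptions made for this sketch; it is not the actual code of the Episodes module.

```python
from collections import defaultdict
from statistics import mean


def average_load_time(records):
    """Average the load time per (day, region) pair.

    `records` is an iterable of (day, region, load_time_ms) tuples.
    This layout is hypothetical, not the real Episodes log format.
    """
    groups = defaultdict(list)
    for day, region, load_time_ms in records:
        groups[(day, region)].append(load_time_ms)
    return {key: mean(values) for key, values in groups.items()}


# Made-up sample measurements, for illustration only.
sample = [
    ("2009-12-01", "BE", 1200),
    ("2009-12-01", "BE", 1000),
    ("2009-12-01", "BR", 4000),
    ("2009-12-02", "BR", 3000),
]
averages = average_load_time(sample)
```

Each key of `averages` corresponds to one data point in a per-day, per-region chart.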

Despite its obvious (intended) lack of optimizations, the Episodes module was sufficient to prove that File Conveyor (the daemon that I wrote to automatically sync files to any CDN, regardless of the file transfer protocol used) had a positive impact when integrated with a Drupal web site to provide CDN integration: the test web site consistently loaded about twice as fast, especially for visitors with slower internet connections, such as visitors from Brazil. Without this proof-of-concept implementation, I would never have been able to prove the positive impact on performance.

Context

More and more companies are paying attention to page loading performance. Notable recent proposals include SPDY (a proposed new version of HTTP, with much better performance characteristics), Resource Packages (zipping several resource files together into one package, to reduce the number of requests) and Web Timing (a proposed specification to integrate parts of Episodes' functionality directly into the browser, to allow for more accurate and more complete measurements).
To top it off, Google is almost certainly going to include page loading performance ("page speed") as a ranking factor (they've already included it in Webmaster Tools and are providing a faster DNS service, Google Public DNS).

Problem

Simply implementing all known tricks is not enough: using a CDN might, for example, speed up your web site for half your visitors and slow it down for the other half (although that’s an extremely unlikely scenario). That’s why you need to be able to do Continuous Profiling (cf. Continuous Integration).

Continuous Profiling means that you are continuously monitoring your real-world page loading performance: you must track the page loading characteristics of each loaded page! That by itself is easy: all it requires is integrating Episodes with your web site. The actual problem lies in analyzing the collected data. To be able to draw meaningful conclusions from the collected data, we need to apply data mining techniques and visualize the conclusions that are found.
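To make this more concrete, here is a minimal sketch of one conclusion such an analysis could surface: a change that helps one group of visitors but hurts another. The segment names, the sample values, the use of the median and the 10% threshold are all assumptions chosen for illustration; real analysis would need proper data mining techniques rather than this simple comparison.

```python
from statistics import median


def compare_by_segment(before, after):
    """Relative change in median load time per visitor segment.

    `before` and `after` map a segment name to a list of load times
    in milliseconds. Negative values mean the site got faster.
    """
    changes = {}
    for segment in before.keys() & after.keys():
        old = median(before[segment])
        new = median(after[segment])
        changes[segment] = (new - old) / old
    return changes


def regressions(changes, threshold=0.10):
    """Segments whose median load time grew by more than `threshold`."""
    return sorted(s for s, change in changes.items() if change > threshold)


# Made-up measurements before and after a hypothetical change.
before = {"EU": [1000, 1200, 1100], "BR": [4000, 4200, 3800]}
after = {"EU": [1300, 1400, 1500], "BR": [2000, 2100, 1900]}

changes = compare_by_segment(before, after)
slower_segments = regressions(changes)
```

In this made-up data set, the overall picture looks positive, yet one segment became slower. That is exactly the kind of effect that only per-segment analysis reveals.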

So what I think is needed is something like Google Analytics, but for page loading performance instead of just page loads.

Proposal

So that is exactly what my proposal is: an analytics suite for tracking page loading performance. An application that can automatically extract conclusions out of Episodes logs and visualize them.

Update — accepted!

Great news, my master thesis proposal has been accepted! My promotor will be professor Jan Van den Bussche, who is a well-known researcher and an excellent speaker. He has many, many publications on query optimization, data mining and related fields in theoretical computer science.

After this semester's exams (which will be in January), we will discuss the details. If you're

For more details, see the full version in PDF format. (The LyX file is also available.)

Attachment: proposal.pdf (288.25 KB)

Comments

Master Thesis

Wim, I wish you the best of luck in getting this approved as a master thesis subject. I hope the final result will be something that a lot of people can benefit from. I'm sure speed and page loading time are already a huge factor in the eyes of consumers, so everything that improves them benefits everybody directly and indirectly!

I'm expecting nothing less than great work from this and I'm sure you will be able to do it!

Good luck

How's Barcelona?

Thanks Nick! :)

How's Barcelona?

Best wishes on your graduate studies.

Accepted!

My proposal got accepted! :)

Feed & git repository

Congratulations!

Glad to hear your proposal has been accepted. This looks like it's going to be a really interesting thesis.

Thanks John! I have no doubt it will be challenging, but hopefully it'll be actually interesting and useful in practice as well :)

terrific exploration

what fun it is to read your travels to the far edges of front end performance land. you make academia sound fun again!

I like exploring!

Thanks moshe! :) Coming from your mouth, that means a lot!

Nice... I will be following this closely :).

Not only web design?

I thought you were only interested in web design — apparently not :)

Nah, I'm interested in a whole bunch of disciplines :-). Maybe I'm not able to dig as deep into some of them as I would like to, but learning something along the way is always a benefit, no? =].

Definitely!

Diverse skills are *always* a benefit :)

Interesting proposal Wim

Interesting proposal Wim, looking forward to seeing more practical solutions from you as well.

Proposal

This would rock when integrated with piwik, don't you think?

Thanks for mentioning that!

Thanks for mentioning that! I've been pointed at piwik before, but had totally forgotten about it. Are there more of these projects you can point me to?

Integration with "normal" web analytics would definitely be very useful. I'm not sure if it will be doable given the time constraints, but I'll definitely try! :)

Congrats on your Thesis

Congrats on getting your Master's thesis accepted! It's a very interesting subject. I work on analytics too (next-gen data mining + visualization). Page loading performance isn't a parameter we have looked into, but it's a great idea to measure its impact. Anyway, here's our stuff: http://www.data-applied.com.

web performance

Looks really interesting especially if, like Google Analytics, your 'web performance analytics' will be free and accessible to the masses.

Well, it will definitely be free since it will be open source. I hope it'll be good enough (in terms of performance, UI and requirements) so that it will be accessible to the masses.

My goal is the same as for my bachelor thesis: make something that is actually useful.

Thanks for the comment! Since you're with Aptimize: if you're interested in providing feedback along the way, let me know!

Hi Wim, thanks, we're interested in following your progress. Good luck with your research and if you think we can help in any way just get in contact.

Awesome! Thanks :)
