master thesis

18 October, 2011

Orientation at Facebook

While I obviously can’t publish the details here, the orientation was very cool. The guy who was doing orientation was very energetic and enthusiastic, and this definitely had a positive effect. He explained how the company functions (flatness for the win!) and the rationale behind some of its core technologies and products.

Badge and notebook!

What’s also amazing is that he’d only been there for 4 months!
In fact, as you talk to more and more Facebook employees, you’ll learn that most of them have actually joined in the past year or so. It’s amazing. It’s also very strange if you’re not used to the start-up culture and the optimistic atmosphere that’s seemingly inherent to Silicon Valley.

In the afternoon, we got our laptops (either MacBook Pros or Lenovo ThinkPads) and phones (iPhones, although you can request an Android device later on). Quite impressive, seeing dozens of new devices lined up in rows and waiting to be used productively.

After the orientation was wrapped up (which included a tour of the headquarters), there was a Happy Hour (i.e. beer), which I skipped to go and meet my manager, Okay Zed, and the rest of the Site Speed team.

17 October, 2011

Saying goodbye

I didn’t expect the goodbye to be easy, but I never expected it to be so hard, either. I think it was one of the hardest things I ever did, on that 23rd of September, 2011.

I was going to miss my friends and family back home, but that’s absolutely nothing in comparison with the goodbye to Anneleen (my girlfriend; she’s awesome!). It was very hard. We barely managed. I wish I could’ve taken her with me. The only way we managed was by telling ourselves that it was too big a (career) opportunity to pass up, and that the experience I’d gain at Facebook would help my career and thus us for the rest of our lives.

The flight

Facebook booked the flight with British Airways. I’m used to flying with low-cost airlines such as Ryanair, Brussels Airlines, and so on, so I expect to have to pay for everything.

Well, that simply doesn’t apply to British Airways. The flight booked for me by Facebook’s travel agent to London was in Economy class, but the one to San Francisco was in “Club World class”. More about that later.

15 August, 2011

On July 1, 2011, I successfully defended my master thesis at Hasselt University’s Expertise Centre for Digital Media. As usual, it’s very hard to compress the entire spectrum of interesting things into the short time we’re allotted (15 minutes this time). I spent a lot of time polishing my presentation to make sure it was as understandable as possible (despite the fast talking pace), but also as interesting as possible. And apparently it paid off!

Afterwards, I received a lot of very positive feedback on my presentation from those attending the defense. Fortunately, the content itself was also deemed interesting and solid: I received a score of 80% (16/20)! I’m of course very satisfied with this result :)

However, it doesn’t end here


Update August 16, 21:30 CET

Now that Steve Souders tweeted about this, I think it’s necessary to link from this post to important related information:

25 July, 2011

The last blog post I wrote about my master thesis was on June 1st. The final blog post has been long overdue. To the (very few) readers interested in the technical details, I apologize for the long delay in writing about the last part.
That last blog post was about FP-Growth. This one is about FP-Stream. Whereas FP-Growth can analyze static data sets for patterns, FP-Stream is capable of finding patterns over data streams. FP-Stream relies on FP-Growth for significant parts, but it’s considerably more advanced. So, in essence, this phase only adds the capability to mine over a stream of data. While that may not sound like much, the added complexity of achieving this turns it into a fairly large undertaking.

1 June, 2011

The previous blog post covering my master thesis was about the libraries I wrote for detecting browsers and locations: QBrowsCap and QGeoIP.
On the very day that post was published, I reached the first implementation milestone: the implementation was already finding causes of slow page loads, though not yet over exactly specified periods of time, but rather over each chunk of 4,000 lines read from an Episodes log file. To achieve this, I completed an implementation of the FP-Growth algorithm and then modified it to add support for item constraints.
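To give a feel for what "finding patterns per chunk, with an item constraint" means, here is a minimal Python sketch. It is not FP-Growth: it counts itemsets by brute force, which only works for tiny inputs. The function names, the `max_size` limit and item names such as `browser:IE` are assumptions made up for illustration; only the idea of chunking and of requiring a specific item (for example `duration:slow`) comes from the text above.

```python
from collections import Counter
from itertools import combinations, islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items (e.g. 4,000 log lines)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def frequent_itemsets(transactions, min_support, required_item, max_size=3):
    """Brute-force stand-in for FP-Growth with a single item constraint.

    Returns every itemset of up to `max_size` items that contains
    `required_item` and occurs in at least `min_support` transactions.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Transactions without the required item cannot support any
        # itemset that satisfies the constraint, so skip them.
        if required_item not in items:
            continue
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                if required_item in combo:
                    counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical, already-parsed log lines: one list of items per page view.
transactions = [
    ["duration:slow", "browser:IE", "country:BE"],
    ["duration:slow", "browser:IE", "country:US"],
    ["duration:fast", "browser:Firefox", "country:BE"],
    ["duration:slow", "browser:IE", "country:BE"],
]

# Mine each chunk separately, as in the first milestone (chunk size 2 here).
per_chunk = [
    frequent_itemsets(chunk, min_support=1, required_item="duration:slow")
    for chunk in chunks(transactions, 2)
]
```

Mining all four transactions at once with `min_support=3` would keep only `("duration:slow",)` and `("browser:IE", "duration:slow")`, hinting that the made-up browser is associated with slow page loads.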

FP-Growth {#FP-Growth}

Thoroughly explaining the FP-Growth algorithm would take us too far afield, so I’ll include only a brief explanation below. For details, I refer to the original paper, “Mining frequent patterns without candidate generation” by J. Han, J. Pei, Y. Yin and R. Mao, which is easy to find through Google Scholar.

22 April, 2011

I’m thrilled to announce that I’ll be joining Facebook’s Site Speed team in Palo Alto, California on September 26, 2011 for a 12-week internship!

After almost two months of being in contact with Facebook, I finally got the liberating call with the verdict yesterday evening: I’ve been accepted!

Backstory {#backstory}

For those of you who want to read it, here’s the full backstory.

Excitement {#excitement}

On February 24, I was contacted via the contact form on my website by Jason Sobel of Facebook. He’s a member of the Site Speed team and mentioned their article about BigPipe (which is the technology they developed to make Facebook load twice as fast). Apparently he had come across my master thesis and my website (i.e. this website) and was interested in my work on making websites faster. Jason asked if I was up for a chat some time to find out what I’ve been working on and so he could give a sense of what the Facebook Site Speed team does. There even was a mention of possibly joining Facebook: “maybe our team would be an interesting opportunity for you?”.

1 March, 2011

In December and January, I continued working on my master thesis, while simultaneously preparing for my exams in January (which I passed without problems).
In a previous blog post, I had indicated that I ran into problems while parsing dates: Qt uses the system locale for this, but on Mac OS X there turned out to be a severe performance problem with that functionality. I solved that by developing QCachingLocale, which is a class that introduces a caching layer to prevent said performance degradations.

Further parsing {#further-parsing}

Now, parsing the date was of course only one tiny part of the problem: I also had to parse the episodes information embedded in each Episodes log file line (which is trivial), as well as map the IP address to a physical location and an ISP and map the user-agent string to a platform and actual browser.
Finally, we also want to map the episode duration to either duration:slow, duration:acceptable or duration:fast. This is called ‘discretization’: continuous values (in our case: durations) are mapped to discrete values.
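As a small illustration of discretization, the Python sketch below maps a duration to one of the three labels. The thresholds (1,000 ms and 4,000 ms) and the function name are assumptions chosen purely for the example; they are not the values used in the thesis.

```python
def discretize_duration(duration_ms, fast_below_ms=1000, slow_from_ms=4000):
    """Map a continuous episode duration (in ms) to a discrete label.

    The two thresholds are hypothetical defaults, not values from the thesis.
    """
    if duration_ms < 0:
        raise ValueError("duration must be non-negative")
    if duration_ms < fast_below_ms:
        return "duration:fast"
    if duration_ms < slow_from_ms:
        return "duration:acceptable"
    return "duration:slow"


labels = [discretize_duration(ms) for ms in (250, 1800, 9500)]
```

Once every duration is reduced to one of three labels, it can be treated like any other item (browser, location, ISP) when looking for patterns.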

31 December, 2010

This year, Performance Planet did an advent calendar again, just like last year. I was also invited to write an article, and gladly accepted the invitation. I wrote about WPO Analytics, which is what my master thesis is about. It’s quite strange to see your name appear among the big names of Yahoo, Facebook and Google, but at the same time it’s reassuring that my efforts have not been in vain.
The following article is a 1:1 copy of my “WPO Analytics” article for the 2010 Performance Calendar.

Introduction

Web performance monitoring services such as Gomez, Keynote, Webmetrics, Pingdom, Webpagetest (which was also featured in last year’s web performance advent calendar) and recent newcomers such as Yottaa are all examples of synthetic performance monitoring (SPM) tools.

21 November, 2010

QCachingLocale speeds up Qt’s slow QSystemLocale::query() calls by caching the answers. This seems to be particularly necessary on Mac OS X 10.6.
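The caching idea itself is simple, and can be sketched in a few lines of Python. This is an analogy only, not the actual QCachingLocale code or Qt's API: the class name, the `slow_query` callable and the query names are all hypothetical.

```python
class CachingLocale:
    """Wrap a slow locale lookup and remember every answer it gives."""

    def __init__(self, slow_query):
        self._slow_query = slow_query  # hypothetical expensive lookup
        self._cache = {}

    def query(self, query_type, argument=None):
        key = (query_type, argument)
        if key not in self._cache:
            # Only the first request for a given key pays the full price.
            self._cache[key] = self._slow_query(query_type, argument)
        return self._cache[key]


calls = []


def fake_system_query(query_type, argument):
    """Stand-in for the slow system call; records how often it is hit."""
    calls.append((query_type, argument))
    return f"{query_type}:{argument}"


locale = CachingLocale(fake_system_query)
for _ in range(1000):
    locale.query("month_name", 11)
```

After the loop, `calls` holds a single entry: the other 999 lookups were answered from the cache. This trade-off assumes the locale settings don't change while the program is running.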

The other day I was working on my master thesis, on the parser that is going to parse Episodes log files. I had finished a rough version that parses all fields on an Episodes log line. Unfortunately, performance turned out to be extremely poor: 4.8 seconds for parsing 1000 lines.

After a bit of research, it became clear that the call to QDateTime::fromString() was the cause of the performance issues. Unable to figure it out on my own (I tried for an hour or so), I hopped onto the #qt IRC channel and posted a simple test case that could reproduce the problem:

19 November, 2010

Almost a year after the last master thesis blog post, it’s about time to finally break the silence.

Much has happened since then.

I’ve read a lot for my literature study. It’s quite an adjustment (and a challenge!) to read virtually solely about data mining and statistics. Many of the papers were poorly written (in the typical, extremely awful, overly verbose Academic English). It’s an even larger challenge to actually write about it in a consistent manner that’s sufficiently formal, yet also understandable.
This is also the reason I haven’t blogged about the progress of my literature study: it is so technical, abstract and complex that it is extremely unlikely that it would have piqued anyone’s interest (although it actually is very cool, sometimes). To be honest, the only thing that kept me going was the anticipation of being able to build something truly useful, possibly game-changing.

Fortunately, on June 24, 2010, at 15:00, I successfully defended the literature study of my master thesis, resulting in a score of 16/20!