Project: Pattern Miner

published on April 9, 2012

Technologies: C++, Data mining, Qt
Time range: September 2011 to December 2011
Client: 
As a: student

From my “Looking for a job” blog post:

While still working on my master's thesis, I was contacted by Facebook.
I passed Facebook’s technical interviews and then flew out to Silicon Valley in September 2011. There, I was part of the Site Speed team. I worked on “regular Site Speed team stuff”, but most of my time was devoted to my intern project.
My intern project was essentially about making my master's thesis useful for Facebook: making it more generic and integrating it with existing internal Facebook tools in order to detect patterns in performance data.
It is currently running in production. It’s used by the Site Speed team for detecting performance problems and will be used by two other teams.
There are currently five pattern mining jobs that are mining data streams. The biggest job analyzes 17 million samples per day, but it splits each sample into 5 separate ones, so that's about 85 million per day, or almost 1,000 per second. Per sample, 10 to 11 attributes are analyzed, so that's about 900 million attributes analyzed per day. And that's just one of the five jobs.
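The back-of-the-envelope numbers for the biggest job can be checked with a few lines of C++. This is only a sketch of the arithmetic quoted above, not code from Pattern Miner itself; the constant and function names are hypothetical and chosen purely for illustration.

```cpp
#include <cstdint>

// Figures quoted in the post for the biggest job (names are illustrative).
constexpr std::int64_t kSamplesPerDay = 17000000;      // raw samples per day
constexpr std::int64_t kSplitFactor   = 5;             // each sample is split into 5
constexpr std::int64_t kSecondsPerDay = 24 * 60 * 60;  // 86,400 seconds

// Samples per day after every raw sample has been split.
constexpr std::int64_t splitSamplesPerDay() {
    return kSamplesPerDay * kSplitFactor;
}

// Average split samples per second (integer division, rounds down).
constexpr std::int64_t splitSamplesPerSecond() {
    return splitSamplesPerDay() / kSecondsPerDay;
}

// Attributes analyzed per day for a given number of attributes per sample.
constexpr std::int64_t attributesPerDay(std::int64_t attributesPerSample) {
    return splitSamplesPerDay() * attributesPerSample;
}
```

With these inputs, `splitSamplesPerDay()` gives 85,000,000 and `splitSamplesPerSecond()` gives 983, which is the "almost 1,000 per second" figure. `attributesPerDay(10)` and `attributesPerDay(11)` give 850 million and 935 million, which brackets the "about 900 million" estimate.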