Misc Updates
How big is Facebook data? I got this update from my collaborator Aapo Kyrola:
This morning, there are more than one billion people using Facebook actively each month….
Facebook has also shared a number of key metrics with users along with the announcement, including 1.13 trillion Likes since the Like button's 2009 launch (the true figure is probably higher, since the official press document contained an editor's note, accidentally left in, about rolling back the number because of information previously shared with Businessweek), 140.3 billion friend connections, 219 billion photos uploaded, 17 billion location-tagged posts, and 62.6 million songs played some 22 billion times.
Good practices from John:
- Separate code from data.
- Separate input data, working data and output data.
- Save everything to disk frequently.
- Separate options from parameters.
- Do not use global variables.
- Record the options used to generate each run of the algorithm.
- Make it easy to sweep options.
- Make it easy to execute only portions of the code.
- Use checkpointing.
- Write demos and tests.
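A few of the practices above (separating options from parameters, recording the options of each run, sweeping options, checkpointing, and avoiding global variables) can be sketched in Python. This is only a minimal illustration under assumed names: `Options`, `run`, `sweep`, the file names, and the toy objective are all hypothetical and are not taken from John's or Tianqi's code.

```python
import itertools
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Options:
    """User-chosen settings (hypothetical names), kept apart from learned parameters."""
    learning_rate: float = 0.1
    num_steps: int = 50


def run(options: Options, run_dir: Path) -> float:
    """Minimise the toy objective (w - 3)^2 by gradient descent, checkpointing each step."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Record the options used to generate this run.
    (run_dir / "options.json").write_text(json.dumps(asdict(options)))

    checkpoint = run_dir / "checkpoint.json"
    if checkpoint.exists():
        # Resume from the last saved working state.
        state = json.loads(checkpoint.read_text())
    else:
        # "w" is the learned parameter; it lives in the state, not in Options.
        state = {"step": 0, "w": 0.0}

    while state["step"] < options.num_steps:
        grad = 2.0 * (state["w"] - 3.0)
        state["w"] -= options.learning_rate * grad
        state["step"] += 1
        # Save working data to disk frequently.
        checkpoint.write_text(json.dumps(state))

    return state["w"]


def sweep(base_dir: Path, learning_rates, num_steps_values) -> dict:
    """Run every combination of options, one output directory per combination."""
    results = {}
    for lr, n in itertools.product(learning_rates, num_steps_values):
        opts = Options(learning_rate=lr, num_steps=n)
        results[(lr, n)] = run(opts, base_dir / f"lr{lr}_steps{n}")
    return results
```

Because everything a run needs is passed in as arguments and written under its own directory, a run that was interrupted can be restarted with the same call and will pick up from its checkpoint.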
Following John’s good practices, Tianqi applied some of these ideas when competing in KDD Cup; here is a summary of his experience. Specifically, Tianqi uses Makefiles to manage multiple, complex execution scripts.
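To illustrate why Makefiles help with executing only portions of a pipeline, here is a rough Python sketch of the rule make applies: rebuild a target only if it is missing or older than one of its inputs. The helper names `is_stale` and `build` are hypothetical, and this is not Tianqi's actual setup; make itself expresses the same idea declaratively.

```python
from pathlib import Path
from typing import Callable, Sequence


def is_stale(target: Path, sources: Sequence[Path]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build(
    target: Path,
    sources: Sequence[Path],
    action: Callable[[Sequence[Path], Path], None],
) -> bool:
    """Run action(sources, target) only when target is stale; report whether it ran."""
    if is_stale(target, sources):
        action(sources, target)
        return True
    return False
```

Chaining several `build` calls, each taking the previous step's output as a source, re-runs only the steps whose inputs have changed.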
from Large Scale Machine Learning and Other Animals: http://bickson.blogspot.com/2012/10/misc-updates.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FsYXZE+%28Large+Scale+Machine+Learning+and+Other+Animals%29