Surprising Hacker News Data Analysis | RJMetrics Blog
Obligatory Plug (let’s get this out of the way): I’m the co-founder of a company called RJMetrics. We develop hosted software that helps online businesses make smarter decisions using their data. I used RJMetrics to do all of this analysis and only scratched the surface of what our tool can do. If you’d like to see what RJMetrics can do for your business, sign up for a free 30-day trial. OK, onto the good stuff…
A few days ago, I was lamenting to my co-founder Jake about a frustrating problem: my blog content had stopped making it to the front page of Hacker News. While my posts are admittedly formulaic (I usually get my hands on some never-before-seen data and analyze it in RJMetrics), they always seemed to work their way to the top.
But lately I’ve been coming up dry. My TechCrunch guest post on how start-ups approach patents? Nah. My piece on never-before-seen Pinterest data? Fail. How about new data on the behind-the-scenes world of VC deal sharing? Another bomb.
I had some self-serving theories: Hacker News had devolved, succumbed to voter rings, or maybe just become too mainstream. Jake, as he often does, offered up an alternative theory: my content sucks.
Jake proposed that the content landscape has become more competitive as HN has grown and that my content hasn’t improved fast enough to keep up.
As with most of our arguments, we decided to let the data decide. I used ThriftDB’s HNSearch API to pull down a complete history of Hacker News submissions, comments, and scores. I then plugged the data into an RJMetrics Dashboard and went to work answering some questions about the evolution of community, content, and competition on Hacker News.
Read on to see the data behind findings like these:
- On Hacker News, the rate of new user registrations grew explosively in 2010, was flat in 2011, and is down in 2012.
- The total number of active users continues to grow because a high percentage of historical users continue to participate on HN even years after their initial registrations.
- Despite growth in the user population, the number of submissions made to Hacker News each week has held steady since 2011.
- If you want upvotes, use profanity and talk about hot startups. Steer away from big companies and sensationalist headlines.
The population of registered Hacker News users has grown considerably over time. As shown in the chart below, however, new user registrations flattened out around 10,000 new users per quarter in 2011 and appear to actually be slightly lower so far in 2012.
I was surprised to see this decline in growth rate, which goes against the argument that Hacker News has gone too mainstream. To me, the recent flatness suggests a market saturation point. If HN’s userbase is bounded by the number of new “startup tech enthusiasts” arriving on the scene each year, its base may not be changing much after all.
As we all know, registered users don’t tell the whole story. It’s the active users that make up a community. Below is the number of users who performed at least one action by quarter (note that these are limited to submissions or comments because I don’t have access to user-specific voting data).
Currently, about 30,000 users submit articles or comments per quarter, up about 7x from the levels of late 2008. However, this number is not climbing very quickly. In the last quarter, for example, despite about 7,700 new registered users, the number of users who submitted an article or comment increased by only about 600.
This suggests that there may be a large number of new registrations who are DOA, only participate as readers/voters, or that there is drop-off in user activity over time. A cohort analysis will shed some light on this for us.
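For readers who want to reproduce the active-user count outside RJMetrics, here is a minimal Python sketch. It is not the RJMetrics implementation. It assumes the scraped data has already been reduced to (username, date) pairs, one per submission or comment, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def quarter_of(d):
    """Return (year, quarter) for a date, e.g. (2012, 1) for any day in Jan-Mar."""
    return (d.year, (d.month - 1) // 3 + 1)

def active_users_by_quarter(actions):
    """Count distinct users with at least one submission or comment per quarter.

    `actions` is an iterable of (username, action_date) pairs (assumed shape).
    """
    users = defaultdict(set)
    for username, action_date in actions:
        users[quarter_of(action_date)].add(username)
    return {q: len(names) for q, names in sorted(users.items())}

# Hypothetical sample data, not real Hacker News records.
sample_actions = [
    ("alice", date(2012, 1, 5)),
    ("alice", date(2012, 2, 9)),   # same user, same quarter: counted once
    ("bob", date(2012, 3, 30)),
    ("alice", date(2012, 4, 1)),
]
active_counts = active_users_by_quarter(sample_actions)
print(active_counts)
```

Using a set per quarter is what makes this a count of users rather than a count of actions.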
Hacker News Cohort Analysis
I pulled this out of RJMetrics in about 30 seconds. The chart below shows, for the “Q1” cohort of each full calendar year since 2008, the number of registrants who submit or comment in each quarter of their lifecycle.
There are a few noteworthy trends here:
- Consistently, about 75% of the users who register will submit an article or comment in their first quarter as a user.
- In the second quarter, the number consistently drops to the 30-40% range.
- In later quarters, the participation rate stabilizes around 20-30%, but there is a clear distinction between each year’s cohorts. By about 2 years out, the active user percentage of each year’s cohort is 3-5% lower than the previous year’s cohort.
This chart is remarkable for a number of reasons. While it’s clear that the engagement of the average new user is declining over time, I think that the more unusual (and impressive) take-away is that such a consistent percentage of registered users return each quarter, even years after their original registrations.
If anyone out there suspected that the “old guard” had given up on HN, this chart proves them wrong. The number of users from my 2008 vintage that are still using the site is actually holding quite steady around the same level it was at in 2009 and 2010.
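The cohort calculation itself is simple enough to sketch in plain Python. This is an illustration under stated assumptions, not the RJMetrics implementation: registrations are a hypothetical username-to-date mapping, actions are (username, date) pairs, and lifecycle quarters are measured as calendar quarters relative to the registration quarter (offset 0 is the quarter of registration).

```python
from collections import defaultdict
from datetime import date

def quarter_index(d):
    """Map a date to a running quarter number so quarters can be subtracted."""
    return d.year * 4 + (d.month - 1) // 3

def cohort_retention(registrations, actions, cohort_year, cohort_quarter):
    """Return {quarters_since_registration: share of the cohort that was active}.

    registrations: dict of username -> registration date (assumed shape)
    actions: iterable of (username, action_date) for submissions and comments
    """
    cohort_idx = cohort_year * 4 + (cohort_quarter - 1)
    cohort = {u for u, d in registrations.items() if quarter_index(d) == cohort_idx}
    if not cohort:
        return {}
    active = defaultdict(set)
    for username, action_date in actions:
        if username in cohort:
            offset = quarter_index(action_date) - cohort_idx
            if offset >= 0:
                active[offset].add(username)
    return {offset: len(users) / len(cohort) for offset, users in sorted(active.items())}

# Hypothetical sample data, not real Hacker News records.
sample_registrations = {
    "alice": date(2008, 1, 10),
    "bob": date(2008, 2, 20),
    "carol": date(2008, 3, 5),
    "dave": date(2008, 3, 15),
    "erin": date(2008, 5, 1),   # registered in Q2, so outside the Q1 cohort
}
sample_actions = [
    ("alice", date(2008, 1, 11)),
    ("bob", date(2008, 3, 1)),
    ("carol", date(2008, 3, 6)),
    ("alice", date(2008, 4, 15)),
    ("alice", date(2010, 2, 1)),
    ("erin", date(2008, 5, 2)),
]
retention = cohort_retention(sample_registrations, sample_actions, 2008, 1)
print(retention)
```

Quarters in which nobody from the cohort was active are simply absent from the result, so a caller plotting the curve would fill those in as zero.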
Highly Active Users
Doing all of these analyses by quarter tells us about users with a minimum level of activity, but it doesn’t tell us much about “very active” users. As you might imagine, with each activity a user conducts, she is more likely to conduct another.
To see if “hyperactivity” is a trait that has increased or decreased with each new cohort of users, we looked at the percent of users by quarter who performed at least 30 actions in their first 90 days after registration.
As you can see, since 2009 the percentage of new users that meet this threshold of being “highly active” has dropped from around 4%-5% to around 2%. When combined with the cohort analysis, we start to see a picture of an average user who is active but less deeply engaged with each new cohort.
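The “highly active” threshold described above (at least 30 actions in the first 90 days) can be sketched as follows. This is a hedged illustration, not the RJMetrics query: the input shapes, function name, and sample users are hypothetical, and the 90-day window is assumed to start on the registration date.

```python
from collections import Counter
from datetime import date, timedelta

def highly_active_share(registrations, actions, min_actions=30, window_days=90):
    """Return {(year, quarter): share of that quarter's registrants who are highly active}.

    registrations: dict of username -> registration date (assumed shape)
    actions: iterable of (username, action_date) for submissions and comments
    """
    window = timedelta(days=window_days)
    early_actions = Counter()
    for username, action_date in actions:
        registered = registrations.get(username)
        # Only count actions that fall inside the user's first `window_days` days.
        if registered is not None and registered <= action_date < registered + window:
            early_actions[username] += 1

    totals = Counter()
    highly_active = Counter()
    for username, registered in registrations.items():
        quarter = (registered.year, (registered.month - 1) // 3 + 1)
        totals[quarter] += 1
        if early_actions[username] >= min_actions:
            highly_active[quarter] += 1
    return {q: highly_active[q] / totals[q] for q in sorted(totals)}

# Hypothetical sample data, not real Hacker News records.
sample_registrations = {"heavy": date(2009, 1, 1), "light": date(2009, 2, 1)}
sample_actions = [("heavy", date(2009, 1, 1) + timedelta(days=i)) for i in range(30)]
sample_actions.append(("heavy", date(2009, 9, 1)))   # outside the 90-day window
sample_actions.append(("light", date(2009, 2, 2)))
shares = highly_active_share(sample_registrations, sample_actions)
print(shares)
```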
At this point, I was growing concerned that Jake might be right. These trends were troubling but far from damning. To get the real answers, I was going to have to turn my attention to the content that was beating me to the front page.
With a fixed 30 slots available on the HN homepage, it becomes statistically less likely to make it to that coveted top-30 slot with each new submission that comes in. Amazingly, however, the number of submissions to HN in the past two years has been… well… flat.
Despite growth in the user population, the number of stories competing for the top spots each day has held steady in recent history. I think this again speaks to market saturation: there are only so many stories that are relevant to this community. (And spammers have learned that flooding the community with off-topic links doesn’t yield page views.)
Interestingly, if you look at the number of upvotes cast each day, the trend is similar. For the past two years, the same number of stories have been competing for about the same number of votes each day.
I no longer buy Jake’s argument that the volume of competition has increased. The only question left is which has gotten worse: my content or the community’s taste?
As a next step, I decided to categorize the posts in the database into buckets based on key words that appeared in their titles. I measure “popularity” by the average number of points earned by submissions with these words in their titles.
I chose to categorize content by the mention of things like Big Companies (e.g., Amazon, Google), Hot Startups (e.g., Pinterest, Instagram), Sensationalism (e.g., Best, Worst, First), Programming Languages (everything I could think of), and Profanity (which was fun). Note that not all content on HN falls into one of these categories and that the overall average score for any post is about 11 points.
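As a rough illustration of this kind of title bucketing, here is a minimal Python sketch. The keyword lists contain only the examples named above (the full lists are not published here), whole-word, case-insensitive matching is an assumption, and the function names and sample titles are hypothetical.

```python
import re
from collections import defaultdict

# Partial keyword lists built only from the examples named in the post.
BUCKETS = {
    "Big Companies": ["amazon", "google"],
    "Hot Startups": ["pinterest", "instagram"],
    "Sensationalism": ["best", "worst", "first"],
}

def buckets_for(title):
    """Return every bucket whose keywords appear as whole words in the title."""
    words = set(re.findall(r"[a-z0-9']+", title.lower()))
    return [name for name, keywords in BUCKETS.items() if words & set(keywords)]

def average_points_by_bucket(submissions):
    """Average points per bucket; a title can count toward several buckets."""
    totals = defaultdict(lambda: [0, 0])   # bucket -> [sum of points, count]
    for title, points in submissions:
        for bucket in buckets_for(title):
            totals[bucket][0] += points
            totals[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in totals.items()}

# Hypothetical sample data, not real Hacker News submissions.
sample_submissions = [
    ("Google launches new API", 10),
    ("The best Google hacks", 4),
    ("Pinterest traffic data", 6),
    ("Show HN: my weekend project", 20),   # matches no bucket
]
bucket_averages = average_points_by_bucket(sample_submissions)
print(bucket_averages)
```

Matching on whole words avoids counting a title like “Firstborn” as sensationalist, at the cost of missing plurals and other variants.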
Apparently, if you want to write a popular article you should avoid sensationalism and be sure to swear in your title.
Unfortunately for me, however, the tastes of the Hacker News community have largely held stable over the past four years. It’s hard to make an argument that there has been a cataclysmic shift toward embracing sensationalism or a deviation from the core focus on technical content.
Post Performance By Content’s Domain Name
I looked at domains that hosted at least 20 submissions in 2012 and ranked them by average number of points per submission. The top 20 are below.
I was also curious how the domain names of popular news sites performed. The results were a bit surprising.
As you can see, content from mainstream tech blogs like TechCrunch and PandoDaily performs about average, while content from blogs like Mashable and Business Insider performs extremely poorly. Also interesting is the enormous gap between the New York Times, whose content tops this list, and the Wall Street Journal, whose content performs among the worst. This speaks to the quality of the average piece of content from each of these news sources (at least in the eyes of the HN community).
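The domain ranking can be sketched in a few lines of Python. This is a hedged illustration rather than the RJMetrics query: it assumes the submissions have already been filtered to the year of interest and reduced to (url, points) pairs, and the function name and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def rank_domains(submissions, min_submissions=20, top_n=20):
    """Rank domains by average points, keeping only domains with enough submissions."""
    stats = defaultdict(lambda: [0, 0])   # domain -> [sum of points, count]
    for url, points in submissions:
        domain = urlparse(url).netloc.lower()
        if not domain:
            continue   # skip text posts or malformed URLs
        if domain.startswith("www."):
            domain = domain[4:]
        stats[domain][0] += points
        stats[domain][1] += 1
    ranked = [
        (domain, total / count)
        for domain, (total, count) in stats.items()
        if count >= min_submissions
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Hypothetical sample data; the threshold is lowered to 2 to suit the tiny sample.
sample_submissions = [
    ("http://www.example.com/a", 10),
    ("http://example.com/b", 20),
    ("https://blog.example.org/x", 3),
    ("https://blog.example.org/y", 5),
    ("http://example.net/once", 100),   # only one submission, so filtered out
]
top_domains = rank_domains(sample_submissions, min_submissions=2)
print(top_domains)
```

The minimum-submission filter matters: without it, a domain with a single lucky post would top the list.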
I think this exercise can be summed up in a few simple conclusions:
- By the numbers, Hacker News hasn’t changed much in the past two years. New members compensated for attrition, the most passionate users haven’t left, and approximately the same number of submissions and votes happen every day.
- Specific companies come and go, but the community cares about programming, startups, and controversy.
- Once you’re hooked, you’re hooked. Users still participating in the community 6 months after joining will most likely still be participating years later.
So, why has my content had such a rough time making it to the front page? It’s because my content hasn’t been tailored to my audience. (In other words, I think I owe Jake five bucks.)
What Can I Do Better?
Remember those three posts of mine that bombed? One about Pinterest data, one about software patent cohorts, and one about VC collusion? Here are some facts:
- HN doesn’t care about Pinterest. The average submission with “Pinterest” in the title has received just 6 points, one of the lowest average scores of any company name I investigated.
- As a buzzword, cohort analysis peaked last year. It had an average score of 13 in 2011 but averages only 5 points in 2012.
- While things like acquisitions rack up the points, “venture” in general just doesn’t grab attention like it used to, earning an average score of only 7 points.
As it turns out, following my formula for writing posts only gets me so far. I’ve been picking the wrong subject matter to hit it big on Hacker News. I need a little sizzle with my steak, but the keys to success aren’t any different than they ever were.
Want to perform this kind of analysis (and much, much more) on your company’s data? Click Here to try RJMetrics free for 30 days.