
A Statistical Portrait of a Y Combinator Batch

118 points| glaugh | 13 years ago |blog.statwing.com | reply

38 comments

[+] polyfractal|13 years ago|reply
This is a really excellent example of using your own product to generate interesting content as a way to drive traffic back to your product.

Great work Statwing. Can't wait until I have some data that needs analyzing so I can use your service.

[+] martythemaniak|13 years ago|reply
"Higher values for Number of Employees/Contractors (FTEs) are weakly associated with higher values for Average Age of Company's Founders (Rounded)"

I had a suspicion that that might be true, but I wonder why that is? Perhaps older founders tackle problems that need more domain expertise and more people? Or perhaps they can rely on savings and have been able to bootstrap a little better than high-school/college grads?

Anyway, good job on StatWing, I love playing around with numbers and graphs. Perhaps some public datasets will help people get more familiar with the app and serve as demo.

[+] achompas|13 years ago|reply
I wonder why that is?

Let's not rush to find an explanation, now!

Looking at the data, we see that two outliers (age 30 with 10 employees, age 33 with 12 employees) are driving what is admittedly a "small" correlation. These two companies are about 4 standard deviations away from the mean of 1.4 employees/company.

They're arguably outliers, and I suspect they're skewing any effect we might be seeing.

Of course, a small, skewed sample (tech companies in YC) obviously means we can't infer a damn thing beyond YC members to begin with, but it's worth pointing out those outliers.
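To illustrate the outlier point, here is a minimal sketch using made-up numbers, not the Statwing dataset. The nine "base" companies are invented so that age and employee count are uncorrelated; only the two outlier points (age 30 with 10 employees, age 33 with 12 employees) come from the comment above. The helper `pearson` is written out by hand for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical base companies (invented for this sketch): founder age and
# employee count are constructed to have zero correlation.
base_ages = [22, 24, 26, 28, 30, 32, 34, 36, 38]
base_employees = [1, 2, 0, 1, 2, 1, 0, 2, 1]

# The two outliers described in the comment above.
outlier_ages = [30, 33]
outlier_employees = [10, 12]

r_trimmed = pearson(base_ages, base_employees)
r_all = pearson(base_ages + outlier_ages, base_employees + outlier_employees)

print(f"without outliers: r = {r_trimmed:.3f}")
print(f"with outliers:    r = {r_all:.3f}")
```

In this toy data the weak positive correlation comes entirely from the two added points, which is the pattern the comment describes. It says nothing about what the real data would show once the outliers are removed.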

[+] jeremyjh|13 years ago|reply
It would be interesting to know if the average hours worked per week would also be correlated with # of employees and age. My hypothesis is that older founders will not be as likely to get involved in a startup that would have them doing everything themselves and working the kind of hours that entails. They will tend to go for those startups where for whatever reason (early revenue streams) they can staff up sooner.
[+] simonw|13 years ago|reply
Maybe older founders are more likely to have previous experience with hiring full time or freelance employees, and are hence less likely to do everything themselves.
[+] austinlyons|13 years ago|reply
Thanks for posting. It would be fun to see the ages of accepted YC applicants compared with the rejected applicants. I'm not sure how easy it would be to get the data of the rejected applicants though. Maybe they would self-report their information if you posted something here on HN.
[+] glaugh|13 years ago|reply
Really good idea. We should definitely do that.
[+] incision|13 years ago|reply
I wonder, is that spike at 39 thinking "Shit, I'm about to turn 40. It's now or never!"
[+] mindcrime|13 years ago|reply
As a 39 year old who is definitely feeling the "Shit, I'm about to turn 40. It's now or never!" thing, I'd say that sounds pretty likely to me. (note: I'm not in the YC batch, but am a startup founder, so I'm just referring to the general issue here, of feeling the need to go the entrepreneurial route now and not later).

Granted, it's just one anecdote, but that definitely rings true here.

[+] revorad|13 years ago|reply
I wondered the same about the peak at 27 and the spike at 29. But this is a pretty small dataset, so we shouldn't read too much into it.
[+] rhplus|13 years ago|reply
The 'spike' is the difference between 1 datapoint and 2 datapoints.
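A rough back-of-the-envelope sketch of why a count of 2 is unremarkable: assume (purely for illustration) that the number of founders at any given age is Poisson-distributed with an expected count of 1. Both the Poisson model and the value 1.0 are assumptions, not figures from the post.

```python
from math import exp, factorial


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


# Hypothetical expected number of founders per age bin (an assumption).
expected_per_age = 1.0

p_two_or_more = poisson_tail(2, expected_per_age)
print(f"P(count >= 2) = {p_two_or_more:.3f}")
```

Under those assumptions, roughly a quarter of age bins would show two or more founders by chance alone, so a jump from 1 to 2 carries little signal.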
[+] wtvanhest|13 years ago|reply
It would be interesting to compare this against the Gen Y age distribution. I believe this grouping would actually look relatively old compared to the peak in population if we assumed only Gen Y would apply.

I am basing this on my memory that 1990 was the peak year for those born in Gen Y. (I cannot find the data set to back it up, but I bet someone else knows where to get it).

+ a few outside of Gen Y.

[+] breckenedge|13 years ago|reply
Good stuff. I imported and played with a regulatory dataset.

The results mostly confirm industry suspicions that enforcement differs the most based on what region an operator is in (poor regulatory performance operators are mostly located in the same regulatory region).

What was neat was how little individual manufacturers' designs mattered. But over time, it was either hugely advantageous or hugely disadvantageous to simultaneously operate multiple types of designs. Example: in 2007 it was about 5% better to simultaneously operate multiple designs, but in 2010, it was about 17% worse.

Also confirmed that it was much, much better (from a penalty standpoint) to find and self-disclose regulatory non-compliance rather than to let the regulator find it.

Awesome work guys! Will there be an ability to play with the time dimension soon?

[+] glaugh|13 years ago|reply
Awesome! That's really cool.

Time is tricky. That's among our most requested features, though. So we won't get to it in the very very near future, but it's definitely on the roadmap.

Thanks for the comments, really appreciate it!

[+] alexshye|13 years ago|reply
I'm interested in the 15% of YC startups with a single founder. I wonder if that number is higher than average compared to other classes? And if so, how much higher? I'm also curious if there is a correlation with the average age.

Anyone privy to this information and willing to share?

[+] zackzackzack|13 years ago|reply
So I rifled through the source of the webpage and found there is no way to download the actual data from that page. All the calculations are being done on the server side and the summary results are getting sent over via HTTP.

An interesting model for sure, and one that will ultimately make technical sense but cause enterprise woes in the future. I'm not sure if businesses will want to upload the data that would most benefit from the StatWing treatment. It looks like they have realized that though. Maybe they're aiming to cut their teeth on people who generate a lot of data and then take a stab at going enterprise via partnerships with other companies that already have a strong presence in big companies but are lacking in analytics.

[+] jimmytucson|13 years ago|reply
Just out of curiosity, what made you think you would find the raw data on the client?
[+] pdog|13 years ago|reply
Any thoughts on why the ages of 26 and 27 appear to be the mode (especially for "social" startups)?
[+] Cyranix|13 years ago|reply
Speaking as a 27-year-old, it seems fairly reasonable to me. This is the age where you've had roughly 5 years of career experience in software or web development (if you went to college), and in the adolescence of your career you may have encountered a problem or a market that seems interesting and may also have a desire to be your own boss and avoid the tedium/politics/etc. of the places you've been so far (because, naturally, you won't make the same poor decisions!).

Not sure that I can speak to the prevalence of social startups in this age range, apart from the obvious "kids these days" take on it. Bear in mind, though, that it's not purely a representation of what 26- and 27-year-old founders are doing -- it's also reflective of YC's position.

[+] noahth|13 years ago|reply
As a member of that age cohort (but not a founder, much less one funded by YC), I feel qualified to speculate wildly about this!

We were exactly the cohort who got Facebook off the ground - I joined in March of '04, spring of my freshman year - so while we may not be "social web natives" depending on your interpretation of such a noxious term, we're definitely well-acquainted with it. Probably well enough acquainted to feel that we know what features are missing, what niches are underserved, or something along those lines, with existing social services.

[+] ekianjo|13 years ago|reply
Statistical data? I mean, the few graphs are interesting, but that's VERY little data being displayed at all. I was expecting much more before clicking this link. Tufte would be mad at the abuse of space for the ridiculously small amount of data actually displayed.
[+] redcircle|13 years ago|reply
Statwing is doing it right: it is super easy to navigate to their home page from the blog, by clicking on their prominent logo, which brings you directly to their main page.
[+] qq66|13 years ago|reply
What exactly is a "social" company vs. a non-social one? Some companies are clearly in one bucket or another but I'm wondering what kinds of companies are near the boundary.
[+] lejohnq|13 years ago|reply
Sorry that's a little confusing, especially since most companies nowadays have social components. For this dataset we categorized social based on whether or not the social component is critical to their business.
[+] jedberg|13 years ago|reply
I love this product. It makes a hard concept easy, it looks pretty, and it's fast.

I can't wait to see what this team does next!

[+] achompas|13 years ago|reply
It makes a hard concept easy

I love what Statwing is doing here, but they could be providing people with enough information to be "dangerous."

The employee count vs. founder age analysis in another thread is a perfect example. Posters are trying to explain why employee counts rise with founder ages, when a glance at the plot suggests the effect results from two companies with abnormally high (~4 standard deviations from the mean) employee counts.

Statwing is definitely pretty and fast! I'm curious, however, to see how they'll work to help people with diverse backgrounds interpret results.

[+] chimi|13 years ago|reply
Can you let us download the data and run our own analyses?
[+] glaugh|13 years ago|reply
Unfortunately not. While there's nothing particularly identifying about the dataset, we collected this data with the understanding that we wouldn't do that. Sorry!