I just wanted to chime in that we're a YC company as well (S16), and I'm thankful to the HN community for having been supportive through our whole journey.
It's a great idea, but I can't believe that the market is that large for this kind of data for 2 reasons: 1 - there's certainly a point of diminishing returns; and 2 - having good, clean data that's proprietary is a _huge_ differentiator. If I am the leader in autonomous driving, I doubt I'd want to pay someone else to help them train models that will help my competitors.
The problem I see with wading into other subfields (like my own) that need high quality training datasets, is that the datasets may be proprietary, and may not really overlap that much between companies in the same industry. For example, assembly line datasets for companies making almost the same product may be vastly different. I'm really struggling to see how you can possibly achieve the same scale in other industries.
Is it weird sharing the same name as a fashion icon ;)
And I'm curious about your ML "stack", particularly the chicken-and-egg problem. Are you using something like TensorFlow with pre-trained binaries, perhaps from a vendor? Or is it 100% proprietary? Thanks!
Congratulations on your fast growth! It is always great to see examples of companies like yours actually solving real-world problems in AI with original ideas and obtaining large clients that rely on your work.
I'm really looking forward to more of what Scale will do in the future!
I saw someone on Twitter post that this was a real-life “Not Hotdog” from HBO’s Silicon Valley, though ironically this company doesn’t actually use AI or ML at all; it’s just scaled human contract workers.
There’s some social commentary in there somewhere.
We do use AI and ML to help make the labeling process more efficient, but you are correct that we do have scaled human insight that ensures very high quality.
One difference from "Not Hotdog" is that our data is used to power the algorithms of other AI/ML companies like OpenAI, Waymo, Lyft, etc., so it's imperative that we have impeccable quality. That necessitates humans to ensure accuracy, particularly in safety-critical applications like self-driving cars.
I don't understand startup valuations well, so I would appreciate someone more knowledgeable shedding some light on how these valuations are made. Would I be in the ballpark in assuming that they have a sales ARR of $125M? A sales multiple of 8x (for SaaS cos) then makes them worth $1B.
The $125M is around 12 large customers with contracts of $10M each, which buys them services of 2500 labeling contractors for 2000 hrs/year at $2/hr ($4K/yr).
At some point, will they stop being a services company, which carries a low multiple, and switch to automated labeling without contractors (à la self-driving cars), or develop some unique IP that they sell as a service?
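The back-of-envelope numbers above can be checked in a few lines. This is only a sketch of the commenter's guesses: the ARR, the 8x multiple, the customer count, the contract size, the headcount, and the $2/hr wage are all assumptions taken from the comment, not reported figures, and the variable names are made up for illustration.

```python
# Back-of-envelope check of the numbers guessed in the comment above.
# Every input is the commenter's assumption, not a reported figure.

ASSUMED_ARR_USD = 125_000_000       # guessed annual recurring revenue
ASSUMED_SALES_MULTIPLE = 8          # guessed SaaS revenue multiple
ASSUMED_CUSTOMERS = 12              # guessed number of large customers
ASSUMED_CONTRACT_USD = 10_000_000   # guessed value of each contract
ASSUMED_CONTRACTORS = 2_500         # guessed labelers per contract
ASSUMED_HOURS_PER_YEAR = 2_000      # guessed hours per labeler per year
ASSUMED_WAGE_USD_PER_HOUR = 2       # guessed hourly wage

# Valuation implied by revenue times multiple.
implied_valuation = ASSUMED_ARR_USD * ASSUMED_SALES_MULTIPLE

# Revenue implied by the guessed customer count and contract size.
contract_revenue = ASSUMED_CUSTOMERS * ASSUMED_CONTRACT_USD

# Yearly pay per labeler, and total labor cost behind one contract.
pay_per_contractor = ASSUMED_HOURS_PER_YEAR * ASSUMED_WAGE_USD_PER_HOUR
labor_per_contract = ASSUMED_CONTRACTORS * pay_per_contractor

print(f"implied valuation:  ${implied_valuation:,}")
print(f"contract revenue:   ${contract_revenue:,}")
print(f"pay per contractor: ${pay_per_contractor:,}")
print(f"labor per contract: ${labor_per_contract:,}")
```

Under these guesses, 12 contracts come to $120M rather than $125M, and the labor cost behind one contract equals the contract's price, so at least one of the inputs would have to differ for the business to have any margin.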
One major risk is that their customers simply build teams in-house. Microsoft has had a very large team for years (and MSR / Ofer Dekel has actually published a lot of useful research on how to handle “crowdsourced” labels). Companies have been building productive off-shore labeling / moderation teams since the early days of Crowdflower. At some point, it’s not just the cost that makes sense, but rather the Product team wants a reliable workforce that they can control.
Another risk is that the well-funded self-driving customers go belly-up. However, one important facet is that dead players don’t release much data. MobilEye has a vast dataset (including images from not just Tesla but other automakers) but that data isn’t going anywhere. Neither is Nvidia’s 180PB of HD recordings. (Release or transfer in part requires dealing with PII of the people in the recordings. Now if only the offshore labelers weren’t handed PII for free...).
The valuation is likely a forward-looking bet on AI as a whole versus the current suite of contracts. Anybody using an off-the-shelf model will want some labels after their first proof of concept. I wouldn’t argue that the math makes sense but rather that demand does look underserved.
> It’s built a set of software tools that take a first pass at marking up pictures before handing them off to a network of some 30,000 contract workers, who then perform the finishing touches.
Bit of an AI novice here, I did Norvig's course a few years ago and never worked in the field, but how can a machine take a "first pass" at labelling without being trained? What information is it using to apply labels to the first set of data? How does this approach differ from a conventional classifier? Would the initial guesses essentially be random?
The business model is, initially, selling human labeling services to owners of data (like Uber or Google), using third world cheap labor to keep costs low. This is very much like a call center service.
Once a sufficiently large corpus of human labeled data is available (across clients and datasets probably), that labeled data is used to train a 'first pass' labeling system.
It then becomes a virtuous cycle. Now the labeling is done in two phases. The first pass system makes its best guess, which is then reviewed by the existing human work force. Over time the first pass labeler gets better and better, till only very tricky/borderline cases need human intervention.
The end game is anyone's guess. Pretty clever biz model hack.
So they have a set of pretrained models that they run as a first pass, and they either verify the results with humans, or filter out the low-confidence results and give those to humans.
Note that this creates a positive feedback loop. I.e. as they get more results they can improve the initial stage.
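A minimal sketch of the second option described above (keep the confident predictions, send the rest to people) might look like the following. It is not Scale's actual pipeline: the `Prediction` type, the `route_predictions` helper, the sample items, and the 0.9 threshold are all hypothetical, chosen only to illustrate confidence-based routing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    item_id: str       # hypothetical identifier for an image or frame
    label: str         # label proposed by the first-pass model
    confidence: float  # model score in [0, 1]


def route_predictions(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (auto_accepted, needs_human_review)."""
    auto_accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto_accepted.append(pred)
        else:
            needs_review.append(pred)
    return auto_accepted, needs_review


# Made-up first-pass output for three items.
first_pass = [
    Prediction("img-1", "car", 0.97),
    Prediction("img-2", "pedestrian", 0.55),
    Prediction("img-3", "cyclist", 0.90),
]

auto, review = route_predictions(first_pass, threshold=0.9)
print("auto-accepted:", [p.item_id for p in auto])
print("human review: ", [p.item_id for p in review])
```

The human-corrected labels from the review queue would then be added to the training set for the next version of the first-pass model, which is the feedback loop mentioned above.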
I'm a little confused about what the business model here is. It sounds like they are selling labeled data to companies, and doing this by "label[ing] most of the objects automatically" and then having humans review these labels. So does this mean they are using some unsupervised method to label data, and then selling that to people who want to train supervised models? Why aren't they instead just beating out the people they sell to by solving the same problems without labeled data?
Presumably they have a supervised model that they've trained on all their labeled data so far (possibly pooled across clients). They'd use this to estimate labels for their data, and then have humans correct it. They're basically doing the standard supervised data training loop.
If I had to guess, the long term plan probably is to move up the stack and sell the models to their clients.
I used to train AI to help researchers find more relevant papers at http://iris.ai. This was nothing more than classification. Would this kind of opportunity be available for data remotaskers at Scale? Best regards for the groundbreaking work.
We have many clients who have switched from Hive. There’s usually a step change improvement in quality and scalability—up to 10x improvement in error rates.
I really dislike this sort of journalism. Theranos was founded by a 19 year old too. That one didn't work out so well. Was it because the founder was so young? The board so oblivious? (a bit of both if you read the book)
What does it really matter how "old" the founder is, does the business have a workable business plan? Can it be profitable? Do people pay enough money for its goods and services to return a net income? Those are interesting questions. That it was started by a teenager is not, to my way of thinking, particularly relevant.
I'd much prefer that the article focus on these things which helps us understand the value that they bring to the market and what makes them unique.
Agreed. I don't object to company or founder exposés such as these, but I do find the article's focus on the founder's age to be misplaced and distasteful. I also think that these kinds of breathless adorations of the latest "wunderkind" contribute to the widespread perception/reality of ageism within the industry.
I understand it makes for a more click-attracting title, but imagine if the title was "Silicon Valley's Latest Unicorn Is Run By an Asian" - it sounds offensive, at least to me. Preferably the articles would focus on something other than a protected class.
Note that age is not technically a protected class, only "advanced age", defined as 40+, but I personally think that making unbounded age a protected class would be beneficial for everyone. Refusing to hire someone because they're "too young" is equally as offensive as refusing because they're "too old".
Startup business is show business, hence obsession with youthfulness and cheap drama.
Customers flock to fame, press and investors do as well - they “know” where the customers will be so they go there, increasing the chances of success in a positive feedback loop. That’s how show business works.
I think it depends. If the age is being used to promote ageism, then this is not good. If however, the age is used to encourage young people to start businesses, that's positive. So... context and tone?
I think we have to just come to terms with the reality that ad-dependent journalism is inherently click-baity. Anyone who says otherwise is delusional or lying.
In essence, the company pays third worlders a pittance to transfer humanity's skills to the machine. The skill transfer is limited to what can be done with a mouse and screen, but since that's where most human ability is currently manifested, it's hardly a limitation. What happens to the serfs once the transfer is complete? Do they realize they are exchanging temporary wages for eternal futility?
I like how the investors rationalized this devil's deal and the usurpation of the poor: "If you could be pulling a rickshaw or labeling data in an air-conditioned internet café, the latter is a better job."
I'm an American who began doing online gig work (for a different company, not Scale) while homeless. It allowed me to pay down debt and get back into housing under circumstances where a normal job was out of the question.
This worked in part because my income was portable, so I was able to take a train to a more affordable area to get a place within my limited budget. This ability to move at will and take my income with me was historically largely limited to the Jet Set and comfortably well-off retirees.
For many people, doing gig work is a tremendous opportunity with a very big upside. It can be a huge improvement in both their standard of living and quality of life.
Most people decrying such labor arrangements aren't doing anything whatsoever to offer a better alternative. Color me unimpressed.
I work at Scale - I've met a bunch of people that work on our platform, and seeing the impact that it's had on their lives is actually a huge source of inspiration to me. There's a writeup highlighting some of their stories at https://scale.com/blog/positive-externalities - based on my personal experience, I can say that it's not bullshit.
[+] [-] ayw|6 years ago|reply
[+] [-] woeirua|6 years ago|reply
[+] [-] ArtWomb|6 years ago|reply
[+] [-] ignoramous|6 years ago|reply
If I may, can you please tell us:
As your business has grown, what has changed the most in terms of how you run it?
What were some of the biggest challenges you've overcome and any major obstacles you see in the near future for the business?
Who are your mentors?
Thanks.
[+] [-] barakados|6 years ago|reply
What principles/rules did you stick to when growing your company that you thought helped improve the culture/profits?
Thanks again for acknowledging the Hacker News community!
[+] [-] rvz|6 years ago|reply
[+] [-] opportune|6 years ago|reply
[+] [-] atemerev|6 years ago|reply
[+] [-] tempsy|6 years ago|reply
[+] [-] ayw|6 years ago|reply
[+] [-] choppaface|6 years ago|reply
[+] [-] vadym909|6 years ago|reply
[+] [-] choppaface|6 years ago|reply
[+] [-] pkaye|6 years ago|reply
Machine learning indeed.
[+] [-] ayw|6 years ago|reply
You can see some videos of what this looks like in this Twitter thread: https://twitter.com/BW/status/1158407524216909826
[+] [-] Areading314|6 years ago|reply
[+] [-] scalrname|6 years ago|reply
[+] [-] kareemsabri|6 years ago|reply
[+] [-] plinkplonk|6 years ago|reply
[+] [-] streetcat1|6 years ago|reply
[+] [-] khannate|6 years ago|reply
[+] [-] fnbr|6 years ago|reply
[+] [-] jonas_kgomo|6 years ago|reply
[+] [-] swampthinker|6 years ago|reply
[+] [-] ayw|6 years ago|reply
[+] [-] sindergirl|6 years ago|reply
[+] [-] streetcat1|6 years ago|reply
From a technical perspective, can someone just post a labelling task to Mechanical Turk? What is the difference here?
[+] [-] troquerre|6 years ago|reply
[+] [-] MetalGuru|6 years ago|reply
[+] [-] yters|6 years ago|reply
[+] [-] scottrogers86|6 years ago|reply
[+] [-] ChuckMcM|6 years ago|reply
[+] [-] minimaxir|6 years ago|reply
[3 points] Scale AI (YC S16) raises $100M at $1B+ valuation to go beyond AI data labeling: https://news.ycombinator.com/item?id=20615657
[11 points] Scale (YC S16) Raises $100M from Accel and Founders Fund at $1B Valuation: https://news.ycombinator.com/item?id=20614672
[+] [-] gervase|6 years ago|reply
[+] [-] DenisM|6 years ago|reply
San Francisco is the new Hollywood, basically.
[+] [-] jvalencia|6 years ago|reply
[+] [-] tempsy|6 years ago|reply
[+] [-] notadoc|6 years ago|reply
Me too. The era of lightweight clickbait headlines can't end soon enough.
[+] [-] invalidOrTaken|6 years ago|reply
Harshness is in order---regarding the business plan. But the age is a total red herring.
[+] [-] codesushi42|6 years ago|reply
Don't fret, it is not likely to last much longer.
[+] [-] sindergirl|6 years ago|reply
[+] [-] DoreenMichele|6 years ago|reply
[+] [-] samclearman|6 years ago|reply
[+] [-] Jagat|6 years ago|reply
And this isn't wrong either:
"If you could be pulling a rickshaw or labeling data in an air-conditioned internet café, the latter is a better job."