
Xkcd 1425 (Tasks) turns ten years old today

1003 points| ulrischa | 1 year ago |simonwillison.net | reply

467 comments

[+] ml_basics|1 year ago|reply
It's quite remarkable how much the goal posts have shifted when it comes to what is impressive with AI/ML. Things like this are a good reminder.

10 years ago the GAN paper came out and everyone was excited how amazing the generated image quality was (https://arxiv.org/abs/1406.2661)

The amount of progress we've made is mind boggling.

[+] ethbr1|1 year ago|reply
One quip I heard that stuck with me is:

'Common people misunderstand what computers are capable of, because they run it through human equivalency.

E.g. a child can do basic arithmetic, and a computer can do basic arithmetic. A child can also speak, so surely a computer can speak.'

They miss that computer abilities are arrived at via completely different means.

Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.

[+] madaxe_again|1 year ago|reply
Man, I can’t tell you how much labour modern LLMs would have saved me at my business, 10-15 years ago.

An awful lot of what we ended up dealing with was awful data - the worst example I can think of was a big old heap of textual recipes that the client wanted normalised, so they could be scaled up/down, have nutritional information, etc. - about 180,000 of them, all UGC.

This required mountains of regexes for pre-processing, and then toolchains for a small army of interns to work through every. single. one. and normalise it - we did what we could, trying to pull out quantities and measures and ingredients and steps, but it was all such slop it took thousands of man-hours, and then many more to fix the messes the interns made.

With an LLM, it could have been done… more or less instantly.

And this is just one example of so, so many times that we found ourselves having to turn a heap of utter garbage into usable data, where an LLM would have been able to just do it.
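The regex pre-processing described above might look something like this (a minimal sketch with hypothetical patterns; real UGC recipe lines are far messier and need mountains of special cases):

```python
import re

# Hypothetical pattern: pull a quantity, unit, and ingredient out of a
# free-text recipe line. Only handles the happy path.
LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[./]\d+)?)\s*"
    r"(?P<unit>cups?|tbsp|tsp|g|kg|oz|ml|l)?\s+"
    r"(?P<ingredient>.+?)\s*$",
    re.IGNORECASE,
)

def parse_line(line):
    """Return (quantity, unit, ingredient) or None if the line doesn't parse."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return m.group("qty"), m.group("unit"), m.group("ingredient")
```

Every line that comes back as None goes to the intern pile, which is where the thousands of man-hours went.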

Anyway. I at least managed to assuage my past torment by seeing the writing on the wall and stocking up on NVDA at about the time I was wrestling with this stuff.

[+] seydor|1 year ago|reply
Feels like the amount of progress decreased abruptly after OpenAI released ChatGPT and everyone closed off their research in hopes of $$$$.
[+] parpfish|1 year ago|reply
i think the shift in expectations has a lot to do with a change in audience.

it used to be that fancy new ML models would be discussed among ML practitioners that had enough background/context to understand why seemingly little improvements were a big deal and what reasonable expectations would be for a model.

but now a new ML (sorry "AI") model is evaluated by the general public that doesn't know the technical background but DOES know the marketing hype. you can give them an amazing language model that blows away every language-related benchmark but they'll have ridiculous expectations so it's always a disappointment.

i'm still amazed when language models do relatively 'simple' things with grammar and syntax (like being able to understand which objects different pronouns are referencing), but most people have never thought about language or computers in a way that lets them see how hard and impressive that is. they just ask it a question like 'what should i eat for dinner' and then get mad when it recommends food they don't like.

[+] bumby|1 year ago|reply
"People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years"

I've heard this applied to all kinds of human goals, but it seems apt for AI expectations as well.

[+] fouronnes3|1 year ago|reply
Arguably the goal post for AGI has moved about as much, if not more. One wonders if Turing reading a 2024 LLM chat transcript would say "but it's not really thinking!".
[+] Workaccount2|1 year ago|reply
It's clear people feel threatened.

Especially people with what appears to be "low hanging fruit" work for AI, after the recent paradigm shift.

[+] jessekv|1 year ago|reply
This one always felt off to me. Humans spent millennia working out the navigation problem.

The comic exists in this brief window of time where one task was finally "solved" and the other one was just getting started.

I'll add that if you think training models takes a lot of energy, try launching fleets of rockets to maintain an artificial satellite constellation.

[+] moritonal|1 year ago|reply
I always saw it commenting on the difference between what non-techies perceive as hard. Multiple times in my career a single off the cuff requirement in a meeting changed the estimate of a project by several months.
[+] tzs|1 year ago|reply
> This one always felt off to me. Humans spent millennia working out the navigation problem.

Even the navigation problem still offers some challenges that most apps fail to address. Consider the store locator function common on retail business websites and apps. They usually just compute the straight line distance from you to the stores and show the stores within some particular range, sorted by distance.

That's probably fine most of the time, but consider a place like Seattle and its surrounding areas. Suppose you are in Kingston, which is on the west side of Puget Sound about 5 miles away from the east side, which is the side Seattle is on.

The Walgreens store locator shows 10 stores when searching for stores near Kingston, and 9 of them are on the Seattle side of Puget Sound. Crossing the Sound there is a 30 minute ferry ride that costs around $20 each way if you are bringing your car.

The one it shows on the west side of the Sound is on Bainbridge Island, and that is probably not the one someone in Kingston would go to. They would go to the one in Silverdale. It's actually closer to Kingston than the one on Bainbridge by road distance, but slightly farther away straight line.

The one in Silverdale is on their list, as are three in Bremerton and one in Port Orchard, all of which are closer to Kingston in time and travel expense than any of the ones on the Seattle side, but you only see those on the map if you hit the "load more" button: once brings in Silverdale and a couple in Bremerton, twice brings in the rest.

Similar for businesses whose site has an option to find items in stock locally. They often report an item is locally available, but it turns out to only be in stores across the Sound.
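The naive ranking described above boils down to great-circle ("as the crow flies") distance, typically computed with the haversine formula (coordinates below are approximate, for illustration only):

```python
from math import radians, sin, cos, asin, sqrt

def straight_line_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km. This is what a naive store
    locator ranks by; it ignores ferries, bridges, and roads entirely."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Rough coordinates: Kingston WA vs. a point across the Sound vs. Silverdale.
# The cross-Sound point "wins" on straight-line distance despite the ferry.
kingston = (47.80, -122.50)
across_sound = (47.81, -122.38)
silverdale = (47.66, -122.69)
```

By this metric the store across the water ranks first, even though reaching it costs a 30 minute ferry ride each way.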

[+] mewpmewp2|1 year ago|reply
That is kind of the point. It seems like the navigational thing would be much more complicated to the layman, yet anyone can do it in a few hours now, while the seemingly simpler thing would take years because it wasn't easily solved and served as an API. Although it is now.
[+] rtpg|1 year ago|reply
I don’t think the point is about GPS, it’s about GIS. So it’s not the navigation problem, it’s a “is this point in this polygon” problem. Which is… a bit easier
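The point-in-polygon test mentioned here is classically done with ray casting (a minimal sketch, ignoring edge cases like points exactly on a boundary):

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from
    (x, y) crosses; an odd count means the point is inside.
    polygon is a list of (x, y) vertex tuples."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the ray's height?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

A real national-park lookup runs this against boundary polygons from GIS data rather than a hand-written list of vertices.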
[+] cortesoft|1 year ago|reply
Your observation doesn't contradict the point of the comic. It isn't about which tasks are difficult in totality, it is talking about which tasks are difficult with our current technology.

The idea is that non-software developers don't know which tasks current technology can solve trivially and which it can't. Yes, the distribution of tasks between those two buckets changes over time, but it is still not easily knowable by lay-people.

Everything we do today would be extremely difficult to re-create from scratch, but that doesn't mean it is hard to do - because we DON'T have to re-create it from scratch.

[+] hmottestad|1 year ago|reply
When I was a student we got a task where we had to spell check some text. This was super easy because we could fit the entire dictionary in memory.

It hadn't always been that easy. Once upon a time someone was paging their dictionary in and out from a floppy disk, not to mention the compression they had to implement from scratch.
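The whole-dictionary-in-memory approach really is that simple (a toy sketch with a hypothetical word list; a real checker would load a full dictionary file into the set):

```python
# Load the word list into a set: membership checks are O(1) on average,
# so spell checking is just a lookup per word.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def misspelled(text):
    """Return the words in `text` that aren't in the dictionary."""
    return [w for w in text.lower().split() if w.strip(".,!?") not in DICTIONARY]
```

The floppy-disk-era version had to do the same lookups against a compressed dictionary it couldn't fit in RAM, which is where all the cleverness went.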

[+] josefx|1 year ago|reply
> Humans spent millennia working out the navigation problem.

And you think we spend any less time trying to identify food animals that lay tasty eggs?

[+] roomey|1 year ago|reply
That's a really good point.

It only makes sense if we ignore the "standing on the shoulders of giants bit".

[+] zulban|1 year ago|reply
I'd love to see a source, but I'm pretty sure the energy, time, and money that have gone into compute infrastructure far exceed what has gone into our space programs.
[+] cubefox|1 year ago|reply
> I'll add that if you think training models takes a lot of energy, try launching fleets of rockets to maintain an artificial satellite constellation.

It's not the training that makes it difficult! It's the research needed to invent machine learning algorithms that can be used to train a model to recognize birds. For multiple decades, this was way harder than maintaining a satellite constellation.

[+] m463|1 year ago|reply
Also another window...

The National Park Service was started August 25, 1916 (only 108 years ago)

:)

[+] weinzierl|1 year ago|reply
One could say, "This didn't age well," but I think the real point - "it can be hard to explain the difference between the easy and the virtually impossible" - is only reinforced by the almost ironic twist that switched the hard and easy around. Who would have thought ten years ago?
[+] dools|1 year ago|reply
> Understanding what kind of tasks LLMs can and cannot reliably solve remains incredibly difficult and unintuitive.

Case in point: the other day my daughter was doing a presentation and she said "Dad can you help me find a picture of the word HELLO spelled out in vegetables?"

I was like "CAN I!!?!?! This sounds like a job for ChatGPT".

I'll tell you what: ChatGPT can give you a picture of a cat wearing a space suit drinking a martini but it definitely cannot give you the word HELLO spelled out in vegetables.

I ended up getting it to give me each individual letter of the alphabet constructed with vegetables and she pasted them together to make the words she wanted for her presentation.

[+] dvh|1 year ago|reply
Just detect if photo contains common bird color and ship it. We'll fix it later when we decimate the competitors.
[+] itslennysfault|1 year ago|reply
I did this tutorial series to try to get some context/foundation in deep learning, and the first lesson was building the bird thing from this comic. It was really easy and fun. The whole course is great. Highly recommend for anyone who has a programming background and wants to get a solid intro to deep learning.

https://course.fast.ai/

[+] Dave_Rosenthal|1 year ago|reply
What no-one is pointing out is that LLMs have made almost as much progress on the first part of the request as the second! ChatGPT writes me an is_point_in_national_park function and points me to the relevant shapefile in ~30 seconds. That's a few-hundred-fold speedup over the "few hours" referenced in the comic.
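A toy stand-in for that kind of generated function might look like this (a real version would test against actual park boundary polygons from a shapefile; here a rough, approximate bounding box for Yellowstone serves as the illustration):

```python
# Approximate Yellowstone bounds, for illustration only: a real
# implementation would load park boundary polygons from GIS data.
YELLOWSTONE_BBOX = (44.13, 45.11, -111.15, -109.83)  # lat_min, lat_max, lon_min, lon_max

def is_point_in_national_park(lat, lon):
    """Crude check: is (lat, lon) inside the Yellowstone bounding box?"""
    lat_min, lat_max, lon_min, lon_max = YELLOWSTONE_BBOX
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

A bounding box over-counts (parks aren't rectangles), but it captures the shape of the solution an LLM sketches in seconds.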
[+] spaceman_2020|1 year ago|reply
Vision LLMs are really remarkable.

Had a project that involved describing and cataloging over 20,000 images.

The traditional method, using real people, would have taken months and a crapload of money (the descriptions have to be customer-readable).

OpenAI’s vision API does it for cents per image. Must have spent under $200 for the whole thing

[+] BWStearns|1 year ago|reply
I wonder if we'll ever hit a critical mass of technical literacy where this kind of misunderstanding largely disappears. Ten years ago I would have said yes. Now I think the advances in UX/UIs and the appification of everything have insulated the median person from the details. That's good as far as individual products go, but in aggregate might lead to unrealistic expectations. I've heard younger folks ask questions about "why doesn't x just do y" that I previously could only have imagined my very non-technical parents' cohort asking.

At least in the 80s, when computers roughly equalled magic for much of the population (looking at you, Wargames!), most people didn't really have to interact with them. Their expectations about computers were roughly as important as my expectations about alien life. But I'm afraid that magical thinking about tech will be of greater consequence both individually and societally.

[+] stefanos82|1 year ago|reply
@simonw, in case you read this, can I kindly ask you a tiny favor please?

Would it be too much to ask you to start livestreaming any coding of yours that can be shared publicly?

I would love to learn so many things from you, especially around your current ecosystem, that is Python, SQLite (data), and JavaScript.

[+] appendix-rock|1 year ago|reply
I'd hate to hear what your definition of a non-tiny favour is!
[+] Almondsetat|1 year ago|reply
GPS is a ready-made infrastructure that took decades of hard work to build and maintain. When the comic was made, image recognition didn't have the same done for it, but now with pre-trained models everyone can do it in 5 minutes too.
[+] cung|1 year ago|reply
It took a bit over five years, but now checking if it’s a photo of a bird is the easier task.
[+] qwertox|1 year ago|reply
Is it? I assume that you are thinking of using a 3rd-party API endpoint to which you upload the image so that the service decides for you if it is a bird and which kind of bird it is. Or you use something like Firebase.

Because if that is the way you'd solve this problem, then just sending lat/lon to a service to determine if it is in a national park is even easier, as it's just a GET request.

I'm still unsure about what would be harder to set up locally.

[+] unsigner|1 year ago|reply
"easier tasks" is arguable and arguably wrong

"task about which you will find more easy-looking tutorials hiding the complexity under a blanket of 3rd party code and services" is better

[+] marricks|1 year ago|reply
For a person to set up, but definitely not in how many CPU cycles are burned.
[+] thaumasiotes|1 year ago|reply
> but now checking if it’s a photo of a bird is the easier task.

That depends on whether you care about getting the answer right. If you don't, it was always the easier task.

If you do, Seek by iNaturalist still can't do this job, and that's the only thing Seek is supposed to be able to do.

[+] consp|1 year ago|reply
At what confidence level are we talking about? With these over simplified questions (as in the xkcd) my guess would be the asker assumes 100%.
[+] munchler|1 year ago|reply
Your phone already does both automatically, so I’d call it a draw.
[+] riiii|1 year ago|reply
Back in the day I had a manager that didn't understand programming.

To him, it was just one button that would open this small info window. Just one button. Just one window.

It took him weeks to understand that we didn't have the data ready he wanted to show. We could do it, but it would take weeks of research and development.

[+] thih9|1 year ago|reply
Note that we are still within Randall’s expectations - the initial estimate for the project at the time was five years and ten years later there is a publicly available solution.

It would have been interesting to see the reverse - the problem becoming trivial in less time than the project's estimate.

[+] ryzvonusef|1 year ago|reply
Iirc, Flickr had implemented bird detection within a few months of this xkcd coming out?

EDIT: A month, https://code.flickr.net/2014/10/20/introducing-flickr-park-o...

It's so weird seeing them explain 'Deep Networks'. Language around AI has definitely changed in the past ten years.

Also, hilariously, the page they created to demonstrate this (http://parkorbird.flickr.com/) no longer works. Oh, how time flies.

explain XKCD page for good measure: https://www.explainxkcd.com/wiki/index.php/1425:_Tasks

[+] amp108|1 year ago|reply
To be fair, 10 years ago the programmer said "I'll need a research team and five years".
[+] righthand|1 year ago|reply
> Understanding what kind of tasks LLMs can and cannot reliably solve remains incredibly difficult and unintuitive.

That’s because the idea of it being a superhuman intelligence (an undefined metric) is being sold. So you have to lie and say “it’s amazing, it’s going to change everything”. If I tell you “it’s okay and is often wrong”, you wouldn’t buy my product, would you? This is just to say I can’t blame that on easy/hard task agency, specifically.

=== addendumb ===

“It’s okay and is often wrong” sounds like working with my junior coworker who I don’t enjoy pairing with. If I said “it’s impressive how the results are at the level of a junior engineer”, you’d sell me on your product.

[+] 1f60c|1 year ago|reply
One thing I’ve always felt is that the relative difficulty of each task seems to have flipped? I could write a bird classifier in my sleep using fastai, but I have no idea how to do a GIS lookup.
[+] brazzy|1 year ago|reply
That has nothing to do with the difficulty of the task and everything with what APIs you are personally familiar with.
[+] moffkalast|1 year ago|reply
An LLM will tell you how to do the GIS lookup, but ironically as privacy laws become better it will genuinely become a harder and harder task unless the user explicitly wants you to do it.
[+] solardev|1 year ago|reply
It's not too bad.

Just pop up a dialog for the user. "Are you in a national park?"