There is huge potential for language models to tackle messy text problems (many of which live in Excel and Sheets). I am the founder of promptloop.com - the author of this tweet has been an early user.
The challenge in making something like this, or Co-pilot / Ghostwrite, work well is meeting users where they are. Spreadsheet users don't want to deal with API keys or know what temperature is - but anyone (like the author of this tweet) can set up direct API use with generic models in 10 minutes. This document has all the code to do so ;). [1]
For non-engineers - or folks who need a reliable and familiar syntax to use at scale and across their org - promptloop [2] is the best way to do that. All comments in here are great though. We have been live with users since the summer - no waitlist. And as a note: despite the name, "prompt engineering" has almost nothing to do with making this work at scale.
The most sensible use for AI that I can see at this time is supporting humans in their work, but only where the system is set up so that the human does the work first, with the AI looking for possible errors. For example, the human drives the car and the AI brakes when it senses dangerous conditions ahead, or the human screens test results for evidence of cancer and the AI flags where it disagrees so the human might take another look. The opposite scenario, with AI doing the work and humans checking for errors (as is the case here), will lead to humans becoming over-reliant on less-than-perfect systems and producing outcomes with high error rates. As AI improves and gains trust in a field, it can then replace the human. But this trust has to come from evidence of AI superiority over the long term, not from companies over-selling the reliability of their AI.
Humans are also less-than-perfect systems, especially when they have to deal with monotonous tasks. A human might perform better on 100 entries than an AI, but on 10 thousand? Of course you can distribute the workload, but you will balloon the costs (I'm talking about a future where GPT-3 costs come down).
There must be a set of projects which are cost-prohibitive now because of having to pay humans, but which will become feasible exactly because of this tech. For a good portion of these, a higher-than-human error rate will also be tolerable, or at least correctable with a small degree of human intervention.
This won’t work because humans are lazy and fundamentally wired to expend the least effort possible. Just the belief that you have an AI that will correct your mistakes will make people expend less effort (even subconsciously), until it completely cancels out any additional error correction from the AI. Plus, workers will hate the fact that an AI could automatically do exactly what they are doing, but they are doing it manually for “error correction”.
It only works the opposite way, where machines and AI handle the trivial cases and humans handle the non-trivial ones. Many people actually genuinely like to solve hard problems which require thinking and skill, most people strongly dislike mundane repetitive tasks.
Often the other way around is better. Example: let AI give text classification a go, then let a human check and perfect the result. Let AI end your email, but check that you are happy with how it is actually phrased. Etc.
Human-first scenarios will be rarer, and probably limited to cases where the human has to do it by law. Made-up example: border control checking that passport photos match faces. The human checks, and if they click OK, then the AI double-checks.
I mostly agree. Happily, that's also what people will want as long as human participation is necessary. We'd generally prefer to write rather than correct an AI's writing, and prefer to drive rather than carefully watch an AI drive.
But when the AI is capable of something the person can't do (like Stable Diffusion creating images compared to me) the AI should take first chair.
The ability of language models to do zero-shot tasks like this is cool and all, but there is no way you should actually be doing something like this on data you care about. Like think about how much compute is going into trying to autofill a handful of zip codes, and you're still getting a bunch of them wrong.
I've integrated the USPS API into a system, and it took practically no time at all. I'm guessing that is significantly less time than building an AI tool. What's more, the thing that takes the least time to implement actually provides good data.
Microsoft and Google both have excellent formulas for dates - and are getting there for addresses. Right now - the most useful things you can accomplish in sheets center around what the underlying models are good at - general inference and generation based on text. Anything needing exact outputs should be a numeric formula or programmatic.
Non-exact outputs are actually a feature and not a bug for other use cases - but this takes a bit of use to really see.
Now wait for =deep_dream() or maybe =stable_diffusion() as a graph-generating function! (Graphs plotted with this function will of course zoom in infinitely but the further you go the more eyes and shiba dogs you'll notice in the corners ...)
Sadly current models are bad at plotting graphs with any kind of accuracy. We're still quite far off from getting them to do a properly labeled, colored pie chart with a legend.
Do I understand that correctly? When I have to create a spreadsheet like this, there are two options. Option 1: I write a zip-code-to-state table and use it to generate my column. If I carefully check my table, my spreadsheet will be okay. Option 2: I ask GPT-3 to do my work, but then I have to check the whole spreadsheet for errors.
I dealt with something similar. I was creating a large directory of childcare centres in Canada. I had thousands of listings with a url but no email address. I created a Mechanical Turk job to ask turkers to go to website and find an email address. Many came back with email addresses like admin@<<actualURL>>.com. After checking a few, I realized that the turkers were just guessing that admin@ would work and I'd approve their work. I ended up having to double check all the work.
This seems to be doing much worse than existing solutions: Google Maps probably wouldn't have gotten quite as many wrong if you just pasted those addresses into the search bar. However it could be interesting as a last shot if parsing the input failed using any other way.
"I tried parsing your messy input. Here's what I came up with. Please make sure it's correct then proceed with the checkout."
Of all the places, a spreadsheet is probably the one place you don’t want AI-generated content. Half the time it’s financial info, so "sorta correct" simply isn’t good enough.
Spreadsheets are used for _waaaay_ more than just finances. I don't think it's anywhere near 50% finances. I can't recall where, but I saw a study from, I think, the 90s saying most of the spreadsheets they found were being used as to-do lists.
Maybe 1 of my many, many spreadsheets over the past 2 years has been finance-related. I think you might be overgeneralizing to an ungeneralizable group -- the set of all human spreadsheets.
I said it before: we need Copilot flash fill. Infer what the user wants the output to be from patterns and labels, so they can enter a few examples and then “extend” and automatically do the equivalent of a complex formula. e.g.
Formal | Informal
Lane, Thomas | Tommy Lane
Brooks, Sarah | Sarah Brooks
Yun, Christopher |
Doe, Kaitlyn |
Styles, Chris |
…
Automating something like this is extremely hard with an algorithm and extremely easy with ML. Even better, many people who use spreadsheets aren’t very familiar with coding and software, so they do things manually even in cases where the formula is simple.
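As a minimal sketch of how the few-shot version of this could work, assuming a generic text-completion API: the `build_flashfill_prompt` helper and the `=>` prompt format below are illustrative assumptions, not any vendor's actual interface.

```python
def build_flashfill_prompt(examples, target):
    """Turn (formal, informal) example pairs plus one unfilled input
    into a few-shot completion prompt the model can extend."""
    lines = [f"{formal} => {informal}" for formal, informal in examples]
    lines.append(f"{target} =>")  # the model completes this last line
    return "\n".join(lines)

# The filled-in rows become the few-shot examples...
examples = [
    ("Lane, Thomas", "Tommy Lane"),
    ("Brooks, Sarah", "Sarah Brooks"),
]
# ...and each empty row becomes one completion request.
prompt = build_flashfill_prompt(examples, "Yun, Christopher")
print(prompt)
```

A completion model given this prompt would be expected to continue the pattern, though as the replies below show, nothing guarantees it stays on task.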
I posed this exact question to character.ai's "Ask Me Anything" bot. It decided to redo the examples, too. The results:
> Lane, Thomas => Thomas Layne
> Brooks, Sarah => Sarah Brooksy
> Yun, Christopher => Chris Yun
> Doe, Kaitlyn => KD
> Styles, Chris => Chris Spice, Chris Chasm
I'm sure the bot overcomplicated an otherwise simple task, but I think there's always gonna be some creative error if we rely on things like that. It's funny though because these results are plausible for what a real person might come up with as informal nicknames for their friends.
GPT-3 charges for every token read/written. What may be more useful is using GPT-3 not to run itself on every row, but to take the task description and generate a function that fulfills the task.
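A sketch of that idea: `ask_model_for_function` is a hypothetical stand-in for a single completion-API call, and its hard-coded return value is only a plausible example of what a model might generate for a zip-code task, so the whole thing runs offline.

```python
import re

def ask_model_for_function(task_description):
    """Stand-in for one completion-API call that asks the model to write
    a reusable Python function for the task. The returned source is a
    plausible, hard-coded example of model output, not a real response."""
    return (
        "def extract_zip(address):\n"
        "    m = re.search(r'\\b\\d{5}(?:-\\d{4})?\\b', address)\n"
        "    return m.group(0) if m else ''\n"
    )

source = ask_model_for_function("Extract the US zip code from an address")
namespace = {"re": re}
exec(source, namespace)  # pay the model once; every row after this is free
extract_zip = namespace["extract_zip"]

rows = ["1600 Pennsylvania Ave NW, Washington, DC 20500", "no zip here"]
print([extract_zip(r) for r in rows])  # -> ['20500', '']
```

The trade-off: you pay for tokens once and can audit the generated function, instead of trusting (and paying for) a fresh zero-shot answer on every row.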
The tasks on the first sheet are easily accomplished by Flash Fill in MS Excel, and I suspect it's less prone to error. Not sure why Flash Fill isn't more popular.
This is compared to the inadequate application of humans. It is not competing with people who know how to do regex and string parsing. It is for the people who would put an office assistant on the task. It is better to inadequately apply AI here than to inadequately apply a human, who probably has more fun things to do.
What about subtle formatting differences (Country, Territory, Postal code is the norm. Doesn't have to be.). What if we applied this to hand written addresses? (Adding an OCR component).
I'm sure the USPS is already doing this and more, and if not, there's probably some AI jobs lined up for it :)
This is terrific stuff, honestly. I could see an Airtable integration being really quite useful. There were lots of times when I would run some quick scraping, do some cleaning up via an Upworker, and then join against something else.
Here volume matters, and all misses are just lost data which I'm fine with. The general purpose nature of the tool makes it tremendous. There was a time when I would have easily paid $0.05 / query for this. The only problem with the spreadsheet setting is that I don't want it to repeatedly execute and charge me so I'll be forced to use `=GPT3()` and then copy-paste "as values" back into the same place which is annoying.
I would love to see a tool which uses GPT-3 to generate SQL from English.
Like: give me a list of all customers from London who purchased a laptop with more than 16GB of RAM in January and used a coupon between 10% and 25%. Sort it by price paid.
Everything required to make the tool already exists.
Just ran your exact query through OpenAI's Codex (model: code-davinci-002), and this was the result:
SELECT * FROM customers WHERE city = 'London' AND purchase_date BETWEEN '2019-01-01' AND '2019-01-31' AND product_name = 'laptop' AND product_ram > 16 AND coupon_percentage BETWEEN 10 AND 25 ORDER BY price_paid DESC;
I'd say it's pretty damn accurate.
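A minimal sketch of the prompt-assembly half of such a tool: the `###` schema-plus-question format is a commonly used Codex-style recipe, and the table schema below is an assumption for illustration, not something from the thread.

```python
def build_sql_prompt(schema, question):
    """Assemble a code-completion prompt: the tables, the question, then
    a leading 'SELECT' for the model to continue."""
    return (
        f"### SQL tables:\n{schema}\n"
        f"### A query answering: {question}\n"
        "SELECT"
    )

schema = ("customers(name, city, purchase_date, product_name, "
          "product_ram, coupon_percentage, price_paid)")
question = ("All customers from London who bought a >16GB laptop in "
            "January with a 10-25% coupon, sorted by price paid")
prompt = build_sql_prompt(schema, question)
# The prompt goes to a code model; 'SELECT' plus its completion is the
# query, which should be shown to the user before it touches the database.
```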
The amount of 90% sensible, 10% ridiculously wrong computer generated crap we’re about to send into real humans’ brains makes my head spin. There’s truly an awful AI Winter ahead and it consists of spending a substantial amount of your best brain cycles on figuring out whether a real person wrote that thing to you (and it’s worth figuring out what they meant in case of some weird wording) or it was a computer generated fucking thank you note.
> The amount of 90% sensible, 10% ridiculously wrong computer generated crap we’re about to send
Agreed. Sooner or later a company is going to do this with its customers, in ways that are fine 95% of the time but cause outrage or even harm on outliers.
And if that company is anyone like Google, it'll be almost impossible for the customers to speak to a human to rectify things.
It depends on how people use the tools. For example the thank you note one -- if someone just prints off the output of this and sends it, yeah, that's bad.
But if someone uses this to do 90% of the work and then just edits it to make it personal and sound like themselves, then it's just a great time saving tool.
I mean, in this exact example, 70 years ago you'd have had to address each thank-you card by hand from scratch. 10 years ago you could use a spreadsheet just like this to automatically print off mailing labels from your address list. It didn't make things worse, just different. This is just the next step in automation.
You got it! After seeing a few tweet storms and articles that turn out to be GPT-3 gibberish, I end up coming to HN more for my news, because usually someone flags the waste of time in the comments.
The software would save people 80% of the work, and most are lazy enough to release it as is instead of fixing the remaining 20%. That laziness will end up forcing legislation to flag, and eventually ban or deprioritize, all GPT content, which will result in a war of adversarial behaviors trying to hide generated stuff among the real. Can’t have nice things!
In the sci fi movie "Her", the main character has a job with the "Beautiful Handwritten Letters Company", a service for the outsourcing of letter writing. It seemed bizarre to me, but now I can envision a future where people are so tired of not knowing if their letter is a fake generated by some descendant of GPT-3, and feel great relief knowing their note was instead written by a human third party.
Maybe? Is it really going to be all that different from the past thousand years where we've had 90% sensible, 10% ridiculously wrong[0] human-generated crap?
You need an AI that can understand when not to answer as opposed to some best effort guessing. Some of that input didn't have numbers in the right format so no zip code.
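One cheap way to get a "don't answer" behavior, as a sketch: validate the model's best-effort output against the format the column demands, and emit a blank rather than a guess when it doesn't hold. The zip-code pattern here is illustrative.

```python
import re

# A US zip code: five digits, optionally followed by a ZIP+4 suffix.
ZIP_RE = re.compile(r"\d{5}(?:-\d{4})?")

def accept_zip(model_answer):
    """Keep the model's answer only if the whole string is a well-formed
    US zip code; otherwise refuse with an empty cell."""
    answer = model_answer.strip()
    return answer if ZIP_RE.fullmatch(answer) else ""

print(accept_zip("90210"))           # -> 90210
print(accept_zip("probably 90210"))  # -> '' (refuse rather than guess)
```

This doesn't catch a confidently wrong-but-well-formed zip like the 90210 substitution below, but it does stop the model from answering when the input had nothing zip-shaped to begin with.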
The hilarious one is changing the zip code to 90210. The AI basically accusing you of a typo because you obviously meant that more famous zip code.
General purpose AIs in situations where more targeted, simpler solutions are needed are going to be incredibly dangerous. Sure this AI can fly a plane 99.999% of the time, but every once in a while it does a nose dive because of reasons we cannot possibly understand or debug.
The author posted a follow up using a more advanced (and expensive) gpt3 model (davinci) which does a better job of parsing out the zip codes. It generally does a better job at everything, but if you can get away with one of the less expensive models then all the better.
If people want to put this sort of language in a thank-you note, I guess... I dunno, it always comes off as inauthentic to me, so I don't really care if I got mass produced or artisanal hand-crafted lies.
I remember in like 2007 or something, in the early days of Facebook, someone made a CLI interface to the FB API. And I wrote a random-timed daily cron job that ran a Bash script that checked "which of my FB friends have their birthday today", went through that list, selected a random greeting from like 15 different ones I'd put into an array, and posted this to the wall of person $i. Complete with a "blacklist" with names of close friends and family, where the script instead sent me an email reminder to write a manual, genuine post.
I used to have a golfed version of that script as my Slashdot signature.
[1] https://docs.google.com/spreadsheets/d/1lSpiz2dIswCXGIQfE69d... [2] https://www.promptloop.com/
tomcam|3 years ago
Maybe not good to reveal customer names this way, unless they already disclosed it publicly
andreilys|3 years ago
Makes total sense to me.
mritchie712|3 years ago
A better one would be "based on these three columns, generate a cold outbound email for the person..."
It would suck to be on the receiving end of those, but the use case makes much more sense.
wesleyyue|3 years ago
Note: I'm the founder :) Happy to answer any questions.
Reply below with some sample data/problem and I'll reply with a demo to see if we can solve it out of the box!
unnah|3 years ago
Is there any reason to think the situation has substantially improved since then?
camtarn|3 years ago
And for the second Kindle review, it summarized one point from the actual review, then completely made up two additional points!
Really impressive Sheets extension, but you'd have to be so careful what you applied this to.
visarga|3 years ago
generates python, then executes
https://code-as-policies.github.io/
chime|3 years ago
https://en.wikipedia.org/wiki/DWIM
[0] https://ncse.ngo/americans-scientific-knowledge-and-beliefs-...
ninefathom|3 years ago
Cue Fry's "I'm scare-roused" meme...
dylan604|3 years ago
So of course a human developer made an AI that makes bad data.
moralestapia|3 years ago
Only they aren't. Check the video again, they come out fine.
Edit: Oh dang, you're all right, several of them have wrong digits. :l