top | item 34521149

Replacing a SQL analyst with 26 recursive GPT prompts

772 points | kvh | 3 years ago | patterns.app

301 comments

[+] mtc010170|3 years ago|reply
Hmmm... I'm surprised I'm not seeing anyone else question the validity of this taking "2 hours". Given that it's written on the blog for the product it's using, this reads to me a lot like a pure sales pitch. They want us to believe that if you use Patterns (which is neat), your company will be much more cost-effective.

I'm not saying that's bad - that's probably the right thing to do with their company blog, and it's cool nonetheless. But I do get a little tired of people putting stuff out there like this that warps (some people's) perception of how long things actually take. We wonder why, as an industry, we misjudge timelines on things left and right.

Even if we take it at face value, this is a person who's intimately familiar with this product. So sure, it's easy to set things up when we've done it a bunch of times. If you were doing this, solving the novel problem that you're faced with, is that how long it would take? Plus that's not really what most of us get paid to do. We have to learn on the fly and figure stuff out as it comes.

So rather than have the provocative headline and conclusion, like a lot of other people have commented... this is really something that could amplify that $50/hour employee, not take their job away. And maybe we shouldn't read into the alleged speed so much. YMMV.

[+] kvh|3 years ago|reply
Author here, I’ve updated the post. The first draft of this app and blog post took me two hours, but I kept coming back with new ideas and tweaks throughout the week. By the end, I’d certainly spent more than two hours (more like 8?), so you’re right, I just failed to update the post. The main point stands — it’s surprisingly good for the amount of effort put in (although unclear how much more juice you could get out of gpt with more effort. Clear diminishing returns)
[+] sosuke|3 years ago|reply
I love the time estimates. 2 hours after spending 3 weeks figuring out how to get everything playing nicely together.
[+] Julesman|3 years ago|reply
I haven't found GPT this reliable for coding. I've been maxing my hourly usage of ChatGPT since it launched and then switching to CoPilot and I have lots of good things to say about it. But reliability is not one of them.

It has a tendency to ignore instructions, as mentioned, but also to get hung up on certain approaches, or to use a different approach each time it's asked. I'd guess it's very reliable for text generation. But for code, I'm pretty sure the quality of the result would vary quite a bit from instance to instance.

This could very well cut the work needed greatly. But it doesn't come close to replacing anyone. ... Yet. Give it two years.

[+] onos|3 years ago|reply
>> They want us to believe if you use Patterns (which is neat)…

What do they do? I can’t tell.

[+] MuffinFlavored|3 years ago|reply
> if you use Patterns (which is neat)

Wasn't sure of their proposition/hadn't heard of them.

> Run and deploy web apps, task queues, massively parallel compute jobs, machine learning models, GPUs, and much more with a single unified framework for code and infrastructure.

[+] tragomaskhalos|3 years ago|reply
Anyone who's been asked more than a couple of times for data that requires a non-trivial bit of ad-hoc SQL will know the sinking "oh shit" feeling that comes when you subsequently realise you borked the query logic in some subtle way and have accordingly emailed out a completely bogus answer/report.

From the article it doesn't seem that GPT is significantly better or worse than a human in this regard, although an experienced analyst would over time decrease their number of such errors.

The best fix imo is to slather a battery of views over your data to minimise the risk of getting the joins wrong, and it'd be interesting to see how that approach could improve the bot's quality.
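A minimal sketch of that "battery of views" idea, using sqlite3 and a made-up investors schema (not the article's actual tables):

```python
import sqlite3

# Hypothetical schema for illustration: a view pre-bakes the
# investor/funding-round join so ad-hoc queries can't bork the join logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE funding_rounds (
    id INTEGER PRIMARY KEY,
    investor_id INTEGER REFERENCES investors(id),
    amount_usd INTEGER
);
CREATE VIEW investor_rounds AS
SELECT i.name, r.amount_usd
FROM investors i
JOIN funding_rounds r ON r.investor_id = i.id;
""")
conn.execute("INSERT INTO investors VALUES (1, 'Acme Capital')")
conn.execute("INSERT INTO funding_rounds VALUES (1, 1, 5000000)")

# Ad-hoc questions now run against the view, not the raw tables.
rows = conn.execute(
    "SELECT name, SUM(amount_usd) FROM investor_rounds GROUP BY name"
).fetchall()
print(rows)  # [('Acme Capital', 5000000)]
```

Ad-hoc queries (human- or GPT-written) then target the view, so the join logic gets decided once, by someone who knows the data model.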

[+] TeMPOraL|3 years ago|reply
I've done too little SQL to be traumatized by it, but I experienced it plenty of times doing more complex data processing on the backend.

I would e.g. spend a week on first discussing a solution, then researching both theoretical background and relevant libraries, and finally writing code to do the math and render the answers (probability distributions). I eyeball them, they seem to all match expectations, so I declare victory, post some nice screenshots to team Slack, and close the ticket.

Feeling high on success, I expect the weekend to start nicely - except, winding down on that Friday evening, I follow that little thought at the back of my head, which I realize was there since the start, quietly poking for my attention. I run one more sanity check, different from the ones I did before. The results come out completely wrong. Only then does it dawn on me that I made a terrible mistake early on - I misunderstood how to implement a certain step and forgot about a theoretical assumption that must hold. And so, instead of a relaxed weekend, I now dread having to explain to everyone on Monday that a) I actually need another week for this, and b) the nice-looking graphs they saw are all pure bullshit; the things that looked correct only seemed so, and only by accident.

After a few such cases - including one where the higher-ups were nonplussed, because after I corrected the mathematical errors someone made a long time ago, the pretty graphs that looked nice on the demo stopped looking so nice - I learned to sanity-check my understanding of the problem as early as possible, and then also sanity-check the code I'm writing, and then keep testing as I go. It's all too easy to write "trivial" code that "obviously" does what you think it does, only to realize some time later that it's throwing out an entire dimension of the input data, and nobody noticed because test examples are too trivial, results look roughly correct, and no one is paying attention.

[+] jsdwarf|3 years ago|reply
To me this boils down to Churchill's adage, "Don't trust a statistical report unless you fabricated it yourself." There are so many subtleties when translating business questions into SQL, e.g. was the average deal size in the post's "trend" example computed using the average, the median, or the geometric mean? Should the trend contain only historic values, or also projected ones? No serious report comes without data definitions; then it's an exercise for the reader to spot flaws there.
[+] fuy|3 years ago|reply
> slather a battery of views over your data

One needs to be careful with this approach in terms of query performance, though. Using simple views with a couple of joins and some filtering is fine, but be very wary of stacking more than 1-2 layers of views calling each other, and especially of using things like aggregates/window functions in views, if these views are then used as building blocks for more complex queries.

That's a recipe for breaking query optimizers and ending up with very bad query plans.

[+] qwytw|3 years ago|reply
> doesn't seem that GPT is significantly better or worse than a human in this regard

Probably, but I think a human is much more likely to realize that they made a mistake, and good luck convincing GPT that it's wrong about something non-trivial.

[+] sublinear|3 years ago|reply
> The best fix imo is to slather a battery of views over your data to minimise the risk of getting the joins wrong, and it'd be interesting to see how that approach could improve the bot's quality.

You're correct, but organizing queries into views and CTEs is an implementation detail that would be done by the dev in practice; it would not show up in a Stack Overflow answer because it's probably not relevant to just answering the question.

I love how language equivalences such as informal vs formal code can throw off the quality of what the chat bots generate!

[+] arrosenberg|3 years ago|reply
You didn't replace a SQL Analyst, you just gave them a query generator. End data consumers don't understand the data model, assumptions, quirks, etc. If they fire the analyst, they are going to wind up drawing a lot of bad conclusions on anything more complicated than simple aggregations.
[+] ryanjshaw|3 years ago|reply
Maybe 5% of the business/operations people I've worked with would even want to do this. The rest prefer a smart human to walk them through building a report - most of the time this is because they don't actually know what they need, and they actually need an expert to sit down and figure it out with them.
[+] berkle4455|3 years ago|reply
You just described most SQL analysts I've worked with
[+] mrbungie|3 years ago|reply
You can easily add more components to start reasoning about models/assumptions/etc., like adding interfaces to data catalogs (e.g. Glue Data Catalog).

As part of a POC I made, I built a similar bot, though without the recursion for debugging and iterative query building. It does the following:

- It predicts the most probable entities from the question.

- It searches the AWS Glue Data Catalog for the most probable/useful tables.

- It builds an Athena SQL query from the N most useful tables.

It obviously gets things catastrophically wrong sometimes, but hell, it was a 3-hour POC. If you can make better indices that map entity->table relationships, it should get better at searching tables. Add this kind of recursive/iterative debugging of queries, and you get at least something near a junior-level SQL analyst.

These kinds of bots are analogous to Stable Diffusion: they DO need a good prompter/puppeteer/solution-verifier. Most non-senior data analysts also need one anyway.
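A toy version of the table-search step, with an invented in-memory catalog standing in for Glue (real entity extraction would come from the model):

```python
# Score catalog tables by how many entities predicted from the question
# appear in their column names, then keep the top N as query context.
# Table names, columns, and entities here are all made up for illustration.
catalog = {
    "investments": ["investor_name", "company_name", "amount_usd", "round_date"],
    "companies": ["company_name", "sector", "founded_year"],
    "employees": ["employee_name", "company_name", "title"],
}

def rank_tables(entities, catalog, top_n=2):
    # A table scores one point per entity that matches any of its columns.
    def score(columns):
        return sum(any(e in c for c in columns) for e in entities)
    ranked = sorted(catalog, key=lambda t: score(catalog[t]), reverse=True)
    return ranked[:top_n]

# Entities a model might extract from "Who were the largest biotech investors?"
print(rank_tables(["investor", "amount", "sector"], catalog))
# ['investments', 'companies']
```

A real index would use embeddings rather than substring matches, but the shape of the step is the same: question -> entities -> candidate tables -> prompt context.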

[+] seandoh|3 years ago|reply
We're building https://www.olli.ai/ to help with this. Users can ask questions in natural language, but more importantly, Olli can suggest what questions to ask based on the data.
[+] 6510|3 years ago|reply
But it does lower the bar for asking questions a lot. Before you ask a human, you have to do a sanity check. The most obvious case is asking the same thing as last week, but it could also be something readily available elsewhere.
[+] jawns|3 years ago|reply
The problem is that you never really know whether the chat bot gets it right or terrifically wrong unless you already know how to do the task yourself.

And in some cases, paying an analyst $50/hr. for a higher degree of confidence than you can get from a $1 chat bot is still very much worth it.

The stakes are higher, too. If the chat bot gets it wrong, what are you going to do, fire it? There goes a small trickle of revenue to OpenAI. Whereas if the analyst gets it wrong, there goes their livelihood.

That said ... this will help the $50/hr. analyst improve their productivity!

[+] mjburgess|3 years ago|reply
In my experience, in many cases, the so-called "first draft" (of code, content, etc.) is really just helpful as a mental warm-up. It isn't actually usable as-is.

The time it takes to write/code/etc. isn't in the typing; that's trivial. It's in modelling the problem (and the audience) and solving it.

ChatGPT isn't in the game of modelling problems and solving them; it captures only superficial semantic characteristics of the questions posed.

For its answers to satisfy the requirements, ChatGPT would need to ask you questions (and much else besides).

[+] gremlinsinc|3 years ago|reply
Imagine you keep someone on staff at $50/hr, or you just consult based on need, i.e. 15-minute consults at 1/4 the cost, and you simply ping them to double-check queries you've run that look right but could maybe be optimized. You've just cut back from maybe 20 hours to 8 per week needed for that freelancer.
[+] spaceman_2020|3 years ago|reply
That's exactly what it is - a productivity booster. The point remains the same: it's going to theoretically put some SQL analysts out of work. If the $50/hr analyst is 10x faster at his job, you can replace 10 analysts with just 1.
[+] themodelplumber|3 years ago|reply
> $50/hr

Is this somewhere's going rate for a good data analyst?

[+] g051051|3 years ago|reply
> this is shockingly close to replacing an entire role at companies with only a couple hours of effort.

and

> It seems like there’s almost no limit to how good GPT could get at this.

I don't see how that's a valid conclusion given the results. 2 simple things right, moderate to difficult things wrong? Hardly a ringing endorsement.

[+] Eji1700|3 years ago|reply
Glad I'm not the only one who saw this.

Feels extremely overblown. Very impressive, don't get me wrong, and it's yet another step closer to letting the business side do plain-text queries.

But that said, stuff like this has existed for a while, and it's certainly not replacing analysts. For a start, this assumes a VERY clean model, which is of course what you strive for, but the moment you step out of FAANG you'll find that many places have a decent, but not ideal, model/warehouse/db, which only makes things like this worse.

Getting the simple examples right is interesting, but hardly replacing an analyst. A basic background around tech and some skill navigating w3schools would replace an analyst if this was the bar.

The mildly complex queries being "almost" right is frankly horrifying. Others have mentioned how much of an "oh fuck" moment it is to send out bad data, and having some AI generate your answers and be SLIGHTLY off is basically worse than it being obviously off. You now have to check every query it generates for errors, because you won't be able to glance-check whether it's sane or not.

Finally, at least where I work, these probably wouldn't even qualify as medium-complexity queries? I don't really know how the industry at large would classify these, but "using 2 inner joins, a where, a group by, and an order by" all falls under beginner skills in my eyes?

It's still fascinating, and I'm glad they showed HOW they did it, but this isn't replacing anyone, at least if you're paying taxes off the data.

[+] urbandw311er|3 years ago|reply
I came here to say the same! I was “shockingly close” to winning the lottery last week… …the guy who lives next door won.
[+] rowls66|3 years ago|reply
Based on the natural language query provided,

"Who were the largest biotech investors in 2022?"

I can think of at least six possible answers, based on these questions:

1. Does "largest" mean dollar amount, or number of investments?

2. Would number of investments count companies invested in, or funding rounds invested in?

3. Does "largest" mean the largest total dollar amount invested in 2022, or the largest dollar amount of new investment in 2022?

It looks like ChatGPT chose the query to mean the investors with the largest dollar amount of new investment in 2022.

When you expand your natural language query to clarify all of these ambiguities, how far away are you from a SQL query? I am not sure, but I think that you are getting pretty close.

[+] TOMDM|3 years ago|reply
Fully agree, the next iteration of this should get the model to prompt you when it discovers ambiguity in your request.

User: "Who were the largest biotech investors in 2022?"

Result: For clarification, do you mean:

1. Largest number of investments?

2. Largest total dollar amount invested in 2022

3. Largest dollar amount of new investment in 2022?

4. Something else

User: option 3

Result: Okay, here are the results and query I used.
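The detection side of that flow could start out as something as dumb as a lookup of known-ambiguous terms, with the model only consulted once the user picks an option. The terms and options below are illustrative only:

```python
# Map of terms known to be ambiguous in this (hypothetical) schema, each
# with the interpretations the user should choose between.
AMBIGUOUS = {
    "largest": [
        "largest number of investments",
        "largest total dollar amount invested",
        "largest dollar amount of new investment",
    ],
}

def clarifications_needed(question):
    # Return every ambiguous term found in the question with its options.
    found = {}
    for term, options in AMBIGUOUS.items():
        if term in question.lower():
            found[term] = options
    return found

qs = clarifications_needed("Who were the largest biotech investors in 2022?")
for term, options in qs.items():
    print(f"For clarification, '{term}' could mean:")
    for i, opt in enumerate(options, 1):
        print(f"  {i}. {opt}")
```

A fancier version would ask the model itself to surface ambiguities, but even a static list like this would catch the example in this thread.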

[+] LeanderK|3 years ago|reply
I love those ChatGPT projects! Of course it's silly, and nobody is really replacing somebody with a program that confidently gets half its answers wrong. But it's fun to just naively let ChatGPT solve the problem.

But I wonder what it's going to look like in a few years. Currently, it's really just a demo that got surprisingly huge traction. I think the most pressing problem is not to get ChatGPT smarter but to get it more reliable. I think more realistic use-cases would emerge if we could build systems that have a better understanding of when they're out of their depth. I don't think this needs a revolutionary breakthrough, just more science.

[+] knodi123|3 years ago|reply
> nobody is really replacing somebody with a program that confidently gets half its answers wrong

I gotta tell you, I use copilot to help with my coding, and it still sends a shiver down my spine when it writes the entire database migration based off of the filename, or writes 4 pages of automated tests that work right on the first try.

[+] nivenkos|3 years ago|reply
All the NoCode and LLM stuff feels like this though - it works well for simple demos, but is useless for the complexity of the real world especially if errors are costly.
[+] bob1029|3 years ago|reply
We looked at using all sorts of "AI" to write SQL based upon natural language prompts. As far as I am aware, the state of the art is still nowhere close enough in accuracy for us to lean into as a business.

This is the leaderboard I keep an eye on: https://yale-lily.github.io/spider

Ultimately, I don't think we will get there with semantic analysis or GPT-style techniques. There is always some human factor involved with whatever schema is developed, so you would probably need an AGI trained in the same business as whoever is being replaced by this thing.

[+] bluecoconut|3 years ago|reply
This is great~ There's been some really rapid progress on Text2SQL in the last 6 months, and I really think this will have a real impact on the modern data stack ecosystem!

I had similar success with lambdaprompt for solving Text2SQL (https://github.com/approximatelabs/lambdaprompt/), where one of the first projects we built and tested was a Text-to-SQL tool very similar to this.

Similar learnings as well:

- Data content matters and helps these models do Text2SQL a lot

- Asking for multiple queries, and selecting from the best is really important

- Asking for re-writes of failed queries (happens occasionally) also helps

The main challenge, I think, with a lot of these "look, it works" tools for data applications is how you get an interface that will actually be easy to adopt. The chat-bot style shown here (Discord and Slack integration) I can see being really valuable, as I believe there has been some traction with this style of integration in data catalog systems recently. People like to ask data questions of other people in Slack; adding a bot that tries to answer might short-circuit a lot of this!

We built a prototype where we applied similar techniques to the pandas-code-writing part of the stack, trying to help keep data scientists / data analysts "in flow", integrating the code answers in notebooks (similar to how co-pilot puts suggestions in-line) -- and released https://github.com/approximatelabs/sketch a little while ago.

[+] jweir|3 years ago|reply
"Hi Dave, the query was taking too long so I optimized SQL query by adding the line `DROP invoices;` It has improved performance significantly. So far there are no orders to examine."
[+] gregw2|3 years ago|reply
I almost missed the cleverness of this joke about making a dumb SQL error of dropping a table to solve a problem, until I had nearly finished writing my correction post for the two other SQL errors buried in the joke (DROP vs truncate, orders vs invoices)... when the word "Dave" kept nagging at me and I realized the other two errors were the point of your joke.

Nice one. Thanks. :)

   — someone who corrects others’ SQL regularly
P.S. Based on my response to this post, ChatGPT3 spam posts on forums containing subtle errors may ironically actually increase online engagement as people write correction posts to hidden ChatGPT3 ones to help others / make themselves feel smart when "someone on the internet is wrong!"
[+] satisfice|3 years ago|reply
This is yet another formula for a buggy app, courtesy of a man who doesn’t think critically.

Somehow the image of 50,000 e-bikes in a landfill comes to mind, with a bankrupt founder pleading “but it was a cool idea!”

This is a cool idea, but nothing in this article explains how it is a responsible idea.

[+] rezonant|3 years ago|reply
While this is very cool, SQL was designed to be used by business people. We need to go back to that model, where we train the business people who need these analytics to use SQL to uncover the result. That, along with a rigorous policy of including the queries that produced the result so the query logic can be checked, would go a long way toward actually taking advantage of the data we're collecting as businesses.
[+] clusterhacks|3 years ago|reply
I have been spectacularly unable to get any business people to learn SQL.

I even find that very few will bother with reporting tools (eg Tableau) for simple self-service.

Instead, the expectation is that there will be a programmer/DBA/report writer position dedicated to producing analytics "on-demand."

[+] jeremyjh|3 years ago|reply
Sorry friend, but this is a bit out of touch. Maybe that was the original design intent of SQL, but understanding the application's data model is beyond a lot of sql analysts, much less their business partners.
[+] pphysch|3 years ago|reply
IME, the average SQL/RDBMS technical user doesn't even understand relational data modeling.

They treat it as a bad spreadsheet. JOINs are seen as an annoyance that should be avoided rather than something extremely powerful.

We are far away from the average salesperson grasping it.

[+] Johnny555|3 years ago|reply
If you're willing to accept unverified results from an AI chat bot, you may as well just let the end user make their best guess with a query builder themselves. My company requires that any queries used for official reporting or provided to the exec team get blessed by the data sciences team to avoid errant data from bad queries; I'm not sure an AI chat bot would remove this need.
[+] ellisv|3 years ago|reply
davinci-003, ChatGPT, and others can be great tools. But they often give you exactly what you ask for (or at least try to) and a large part of writing SQL queries for analytics is figuring out what wasn't asked for but should have been. Good analysts will find outliers, data-smells, and ask questions rather than rush to returning an answer.
[+] eega|3 years ago|reply
> Playing around with GPT at this level you get the feeling that “recursive GPT” is very close to AGI. You could even ask GPT to reinforcement learn itself, adding new prompts based on fixes to previous questions. Of course, who knows what will happen to all this when GPT-4 drops.

Leaning out of the window way too much here. This has nothing to do with AGI, which would require an intrinsic understanding not only of a well-defined and easily checkable field like SQL, but of, well, everything.

Regarding GPT-4 - OpenAI's CEO Sam Altman stated that the expectations regarding GPT-4 are way over-hyped. People on the Internet talk as if AGI is coming in the guise of GPT-4, but it's "just" going to be an incrementally better evolution of GPT-3.5.

Mind, I'm in no way saying that LLMs aren't exciting - they are to me - or that they will not change the world, but leave your horses in the stable.

[+] rexreed|3 years ago|reply
Is this a self-hosted GPT model? One of the smaller models? Fine tuned on Crunchbase data? Any insights into how this was put together?
[+] simonw|3 years ago|reply
It's using the GPT-3 API from OpenAI.

The article describes how the prompt itself is constructed - to include details of the database schema plus some example data rows, followed by the user's question.

The prompt is then sent to the GPT-3 API. The results are then sent through the API a few more times with some extra prompt engineering before deriving the final result by running the SQL against the original database.
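A rough sketch of that first prompt-assembly step, with illustrative schema and wording (not the article's actual prompt):

```python
# Build the first-pass prompt described above: schema DDL, a few sample
# rows, then the user's question. All names here are hypothetical.
def build_prompt(schema_ddl, sample_rows, question):
    rows = "\n".join(str(r) for r in sample_rows)
    return (
        "Given the following SQL schema:\n"
        f"{schema_ddl}\n"
        "And some example rows:\n"
        f"{rows}\n"
        f"Write a SQL query to answer: {question}\n"
        "SQL:"
    )

prompt = build_prompt(
    "CREATE TABLE funding_rounds (investor TEXT, amount_usd INTEGER);",
    [("Acme Capital", 5000000)],
    "Who were the largest investors?",
)
print(prompt)
```

The follow-up rounds the article describes would feed the model's SQL (plus any database error) back through a similar template until the query runs.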

[+] typpo|3 years ago|reply
I've been building something similar that handles the dirty business of formatting a large database into a prompt. Additional work that I've found helpful includes:

1. Using embeddings to filter context into the prompt

2. Identifying common syntax errors or hallucinations of non-existent columns

3. Flagging queries that write instead of read

Plus lots of prompt finessing to get it to avoid mistakes.
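Point 3 can start out as a simple keyword guard applied before any generated SQL touches the database; this sketch is illustrative, not the tool's actual implementation:

```python
import re

# Reject generated SQL that writes instead of reads. Checking each
# semicolon-separated statement catches a write hidden in a batch.
WRITE_KEYWORDS = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE)\b",
    re.IGNORECASE,
)

def is_read_only(sql):
    return not any(
        WRITE_KEYWORDS.match(stmt)
        for stmt in sql.split(";")
        if stmt.strip()
    )

print(is_read_only("SELECT * FROM users"))         # True
print(is_read_only("SELECT 1; DROP TABLE users"))  # False
```

A regex guard is a backstop, not a sandbox; running the model against a read-only database role is the sturdier version of the same idea.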

It doesn't execute the queries, yet. For an arbitrary db, it's still helpful to have a human in the loop to sanity check the SQL (for now at least).

Demo at https://www.querymuse.com/query if anyone's interested

[+] renewiltord|3 years ago|reply
This is great, of course. And I think the people who will get the most out of the new AI tools are those who can treat them as iterative assistants. The fact that not everyone can use tools this way has become apparent to me recently. e.g. people who use car driving assistants as if they're fully autonomous; or people who use Copilot and are upset the code is incorrect.

The point isn't for it to be correct, but for it to be so fast that it can be mostly correct and you can fix the last bit.

I use Copilot extensively for my Python glue code and it is positively fantastic. I also use it at my shell with copilot.vim with a quick C-x C-e and write a comment and let it write the code.

The iterative improvement nature of the tool means that I make faster progress. It doesn't have to get things right. It only has to make progress and be obvious how to make improvements.

For instance, I just bought some Reserved Instances (c6i) on AWS and I want to make sure that I don't have any c5 instances in there that I won't be covering. I hit C-x C-e and type in `# list all aws instances in tokyo that are c5` and then hit Enter and type `aws` and it completes the rest for me.

I can then run the query and edit it, or I can validate that it looks okay, etc. The point is that I'm a human capable of understanding what this machine is making. That makes me way faster. I don't need to check Stack Overflow, and the machine teaches me syntax etc. and puts it in my history.

It's the closest thing to the Primer from Neal Stephenson's Diamond Age and I love it.

[+] piyh|3 years ago|reply
I wonder if you could take down the analytics db with enough bad cross joins.
[+] hot_gril|3 years ago|reply
In some cases, it only takes one.
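One is enough because the result set is the product of the two tables' row counts, which a quick sqlite3 toy (made-up tables) shows:

```python
import sqlite3

# Two modest 1000-row tables cross-joined produce a million-row result;
# scale the inputs to real table sizes and one bad join is a db-killer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (y INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1000)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(1000)])

# The comma join with no ON clause is an implicit cross join.
(count,) = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()
print(count)  # 1000000
```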