Show HN: Natural-SQL-7B, a strong text-to-SQL model
362 points | thecalebf | 2 years ago | github.com | reply
Here is the HF page: https://huggingface.co/chatdb/natural-sql-7b
[+] [-] rgbrgb|2 years ago|reply
What kind of applications would this be useful for? What can you build with an AI data science intern that's right 75% of the time?
As a programmer who always has to look stuff up when I SQL, I could definitely see asking something like this for a first draft of a query but it seems like I'm slightly better off asking the bigger models in these one-off cases (and I can run a 15b easily on my 64GB m1). If I'm in a corporate setting I'm not going to leak my schema into OpenAI's training data and there are definitely times when I'd want to run queries offline. Small/local models are great when you want to do a ton of queries (save $$).
A mini data scientist that could be queried by non-technical folks would be awesome, but I wonder if there's a way to determine whether a query falls in the 25% "incorrect" case... maybe there's a RAID-like consensus algorithm where you have multiple models interrogate each other's answers to get a higher overall success rate.
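A toy version of that consensus check might look like this (my own sketch in Python; `generate_sql` and `run_query` are placeholder hooks, not any real API): sample several candidate queries, execute each, and keep the answer the majority agrees on.

```python
# Majority-vote consensus over generated SQL: queries that produce the
# same result set vote together; queries that fail to run get no vote.
from collections import Counter

def consensus_answer(question, run_query, generate_sql, n=5):
    results = []
    for _ in range(n):
        sql = generate_sql(question)
        try:
            results.append(tuple(run_query(sql)))  # hashable result rows
        except Exception:
            continue  # a query that errors out doesn't get a vote
    if not results:
        return None, 0.0
    best, votes = Counter(results).most_common(1)[0]
    return best, votes / n  # answer plus a crude agreement score
```

The agreement score doubles as the confidence signal: below some threshold, route the question to a human instead of answering.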
Mostly thinking out loud :) but maybe y'all have more ideas. Congrats on the release, OP!
[0]: https://github.com/defog-ai/sql-eval
[+] [-] fifilura|2 years ago|reply
2. Combining and slicing data is a craft, and doing it subtly wrong in one step can lead to fatal errors in the outcome.
And most importantly, it can be very difficult to notice. Numbers don't smell.
That is why I would be very hesitant to give a slightly more than trivial task to an engine that fails 25% of the time.
But I guess that is the same as any other programming task. The difference is that other programming tasks involve a lot of boilerplate where an AI can help, whereas SQL gets straight to the point.
Maybe it could be useful to ask questions that are similar to writing testcases "how can I verify that my query is doing the right thing"?
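That "test case" framing can be made mechanical: rather than eyeballing the output, check invariants the correct query must satisfy. A minimal sketch with sqlite3 (the table and query are made up for illustration):

```python
# Verify a generated aggregation query via invariants instead of by eye.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders (region, amount) VALUES
  ('EU', 10.0), ('EU', 20.0), ('US', 5.0);
""")

generated = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
rows = con.execute(generated).fetchall()

# Invariant 1: the grouped totals must add back up to the raw total.
assert sum(t for _, t in rows) == con.execute(
    "SELECT SUM(amount) FROM orders").fetchone()[0]
# Invariant 2: exactly one row per distinct region.
assert len(rows) == con.execute(
    "SELECT COUNT(DISTINCT region) FROM orders").fetchone()[0]
```

Invariants like these won't catch every subtly-wrong join, but they do catch the common fan-out and double-counting mistakes cheaply.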
[+] [-] beefield|2 years ago|reply
I have written a bunch of more or less complicated SQL during my career. And I am pretty sure that if I need to write a SQL statement that's anything but select * from table, my first attempt won't work 75% of the time.
I may be a special case, but typically if I work on a hard problem, it is not a single hard problem but a sh*tload of connected simple problems. If I can get someone to solve the simple problems correctly 75% of the time so that I can spend my time figuring out how those simple problems are to be connected, I'm more than happy. And that's exactly how I use ChatGPT. I have learned not to ask it overly complex questions. But the simple ones it mostly aces, and when it does not, the mistakes are easy to spot, since it is not that I could not have solved them myself, I just did not want to spend time on that. Now, if only ChatGPT were not almost as lazy as me about producing long simple stuff, that would be awesome.
[+] [-] internet101010|2 years ago|reply
Yeah this is the issue I have with all of the SQL generation stuff. Not only should the SQL be valid, a prompt like "generate a query that pulls sales for the last quarter" should generate the same output for everyone without fail. Vanna's business logic embedding is a good first step but even then it is only correct like 90% of the time with GPT-4.
Even then, it will only work if there are strong standards and data governance structures in place that everyone within an organization is aligned on. For example, "sales" can mean different things to different people and all of that needs to be buttoned up as well.
[+] [-] wg0|2 years ago|reply
That's the story of the LLMs in general.
The hype is free. The startup ecosystem is a bonus.
[+] [-] Closi|2 years ago|reply
The kind of stuff that it is very easy to validate if it works or not :)
I am building a warehouse management system at the moment, and it's great to quickly churn out lots of SQL views (particularly as the schema is changing/evolving slightly as I am writing it, so being able to go back to GPT4 to churn through the changes to some of the 'views' of my pages helps, even if it requires a little testing/validation).
[+] [-] joshhart|2 years ago|reply
https://www.databricks.com/blog/announcing-public-preview-ai...
Many customers like it a lot. Although in your case, if there are many pricing details, it may not be entirely accurate.
[+] [-] l5870uoo9y|2 years ago|reply
[1]: https://www.sqlai.ai/posts/enhancing-ai-accuracy-for-sql-gen...
[+] [-] mritchie712|2 years ago|reply
We (https://www.definite.app/) ended up abandoning text-to-sql in favor of answering questions with a semantic layer (which LLM's are far more effective against).
https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40
[+] [-] CharlesW|2 years ago|reply
Also, it's only weights AFAICT — no source training data/code is available.
[+] [-] itsoktocry|2 years ago|reply
This is cool, and up my alley. But that's not a complex question, it's a basic analytics question. Most analysts will be able to write something like that in their sleep.
I've been using ChatGPT for writing SQL, and it's mediocre. But it'll get better, I'm sure.
[+] [-] thecalebf|2 years ago|reply
I will update that to be a more truly difficult question. Appreciate the feedback!
[+] [-] int_19h|2 years ago|reply
On the other hand, telling GPT to generate SQL to query a data store as part of solving some task that requires inference from facts captured in that data store works surprisingly well - better than "function calls" with JSON, in my opinion. While such generated queries are also suboptimal, they still capture the intent correctly, and GPT is surprisingly adept at using nested subqueries to get the answer it needs in a single query. And when such generated SQL is wrong, it usually fails to parse (e.g. due to typos in field names), at which point you can just feed the error message back to the model and have it correct that.
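That error-feedback loop is easy to sketch (a minimal toy, with `ask_model` standing in for whatever LLM call you use, and sqlite3 as the data store):

```python
# Run generated SQL; on a database error, feed the message back to the
# model so it can correct typos in field names and the like.
import sqlite3

def query_with_repair(ask_model, con, question, max_tries=3):
    prompt = question
    for _ in range(max_tries):
        sql = ask_model(prompt)
        try:
            return con.execute(sql).fetchall()
        except sqlite3.Error as e:
            # Append the error and the failing SQL for the retry prompt.
            prompt = f"{question}\nYour last query failed: {e}\nSQL was: {sql}"
    raise RuntimeError("model never produced runnable SQL")
```

The nice property int_19h points out is that this failure mode is loud: a typo'd column name throws, rather than silently returning the wrong numbers.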
[+] [-] CastFX|2 years ago|reply
https://bird-bench.github.io/
[+] [-] jimmytucson|2 years ago|reply
Like all things LLM, I don't know if this is about to make those responsibilities a lot easier, or just eliminate them altogether.
[+] [-] zurfer|2 years ago|reply
This seems like a great base model, although I wonder if text-to-SQL is a good use case for small models. We are also building a tool in this space, and I regularly wish GPT-4 were even more knowledgeable when answering. Even GPT-3.5 is not good enough for production.
[+] [-] thecalebf|2 years ago|reply
Would love to hear about what you are building!
[+] [-] zeroq|2 years ago|reply
(1) this is the first installment, and it's already close to being a thousand times more useful for product owners and analysts than any Airtable you can imagine.
(2) as much as I love being on point on every challenge, we've been living in "good enough" economics for quite some time, and if this is close enough, that will be good enough for business.
[+] [-] moltar|2 years ago|reply
> This model was evaluated on SQL-Eval, a PostgreSQL-based evaluation framework developed by Defog for testing and alignment of model capabilities.
But this explains the testing part.
However, does it mean that only the PostgreSQL-flavour of SQL is supported?
Would it work for Trino flavour?
[+] [-] xfalcox|2 years ago|reply
A model like this with a 32k long seq_len, like Mixtral, would be a killer for me.
[+] [-] cuuupid|2 years ago|reply
e.g. passing in some of my more complex table schemas related to flight data and asking about overflights, the model struggles to resolve information related to aviation. However, GitHub Copilot writes me a perfect call to Prisma from the same single-line instruction plus information spanning the rest of my codebase.
[+] [-] tillvz|2 years ago|reply
When you're on a higher abstraction level, it also allows you to make clear definitions (e.g. for certain KPIs) and define business logic that always needs to be applied to get the correct results.
There you don't want to leave it up to chance that a filter gets hallucinated in or out when you ask e.g. about your company's revenue.
At Veezoo (https://www.veezoo.com) we have taken the approach of not going directly to SQL. When a user asks a question, Veezoo first translates it into a query against the Knowledge Graph (which represents the business objects, their relationships, etc.). From there we compile it into a SQL query for the target database (they all have slight differences) without any AI involvement. In this compilation step we also make sure that the business logic is properly applied.
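A toy illustration of that kind of compile step (my own sketch, not Veezoo's actual representation): an intermediate query object rendered into dialect-specific SQL with no model in the loop, so business filters can never be hallucinated in or out.

```python
# Deterministic compilation from a tiny "knowledge graph query" to SQL.
from dataclasses import dataclass, field

@dataclass
class KGQuery:
    object: str                              # business object name
    filters: dict = field(default_factory=dict)  # business logic to apply

# Hypothetical mapping from business objects to (table, measure column).
TABLES = {"revenue": ("invoices", "amount")}

def compile_sql(q: KGQuery, dialect: str = "postgres") -> str:
    # Dialect differences (here just identifier quoting) handled in code.
    quote = "`" if dialect == "mysql" else '"'
    table, column = TABLES[q.object]
    sql = f"SELECT SUM({quote}{column}{quote}) FROM {quote}{table}{quote}"
    if q.filters:
        clauses = " AND ".join(
            f"{quote}{k}{quote} = '{v}'" for k, v in q.filters.items())
        sql += f" WHERE {clauses}"
    return sql
```

The point of the intermediate layer is that "revenue" always expands to the same table, column, and mandatory filters, regardless of how the question was phrased.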
[+] [-] K0IN|2 years ago|reply
Example: give me the revenue for all logistics firms.
But in the database these might not be labeled "logistics"; they may be called "transport" (or anything else).
Maybe there are some counters to this, like finding the unique values per column, or, even better, using a grammar-based approach which will only select valid entries.
But simple text-to-SQL is at this point not the "hard thing to solve".
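The "unique values per column" counter can be sketched like this (toy sqlite3 schema; difflib fuzzy matching stands in for whatever matching you'd really use): snap the user's word onto a label that actually exists in the column before it ever reaches the SQL generator.

```python
# Ground a user's term against the distinct values stored in a column.
import sqlite3
from difflib import get_close_matches

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE firms (name TEXT, sector TEXT, revenue REAL);
INSERT INTO firms VALUES ('Acme', 'transport', 10.0), ('Bolt', 'retail', 5.0);
""")

def ground_term(con, table, column, user_term):
    # Pull the distinct labels actually present in the column...
    values = [r[0] for r in con.execute(
        f"SELECT DISTINCT {column} FROM {table}")]
    # ...and snap the user's word to the closest one, if any is close enough.
    match = get_close_matches(user_term.lower(), values, n=1, cutoff=0.6)
    return match[0] if match else None
```

A miss (no value close enough) is itself useful signal: it's exactly the "logistics" vs. "transport" gap where the system should ask the user rather than guess.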
[+] [-] owlstuffing|2 years ago|reply
1. https://github.com/manifold-systems/manifold/blob/master/man...
[+] [-] moltar|2 years ago|reply
I could generate DDL statements, of course. But I'm wondering whether that is the best way to hint the model about the database structure.
Also, how would you go about supplying the very verbose descriptions of all of the data types? Would SQL comments be best? Postgres-style column comments?
Thanks!
[+] [-] delichon|2 years ago|reply
This would save gobs of compute.
[+] [-] bottlepalm|2 years ago|reply
https://github.com/defog-ai/sql-eval/blob/main/data/question...
[+] [-] thecalebf|2 years ago|reply
I mention a little more about it here https://x.com/calebfahlgren/status/1754247740291207198?s=20