(no title)
swimwiththebeat | 2 years ago
A lot of our databases at work have columns with custom types and enums, and getting the LLM (Llama2) to write SQL queries to robustly answer natural language questions about the data is tough. It requires a lot of instruction prompting, context, and question-SQL examples (few-shot learning), and it still fails in unexpected ways. It's a tough ask for people to use a tool like this if they can't trust the results all the time. It's also a bit infeasible to scale this to tens or hundreds of tables across our data warehouse.
It's great that a lot of people are trying to crack this problem, I'm curious to try this model out. I'd also love to see if other people have tried solving this problem and made any headway.
tdoehmen|2 years ago
In that sense I emphasized in our Blogpost that users should think of it as a documentation oracle that always gives you the exact DuckDB SQL query snippet you are looking for, which is a tremendoues time-saver if you have an abstrat idea of the query you want to write, but you're just not sure about the syntax, expecially with DuckDB having so many functions and SQL extensions.
Here are a few exammples:
- create tmp table from test.csv
- load aws credentials from 'test' profile
- get max of all columns in rideshare table
- show query plan with runtimes for 'SELECT * FROM rideshare'
- cast hvfhs_license_num column to int
- get all columns ending with _amount from taxi table
- show summary statistics of rideshare table
- get a 10% reservoir sample of rideshare table
- get length of drivers array in taxi table
- get violation_type field from other_violations json column in taxi table
- get passenger count, trip distance and fare amount from taxi table and oder by all of them
- list all tables in current database
- get all databases starting with test_
[edit: fixed list formatting]
swimwiththebeat|2 years ago
vgt|2 years ago
We at MotherDuck took an incremental approach. We launched something more akin to lane-assist. We're calling it FixIt - an in-flow visual aid to help you identify and fix errors in your SQL [0].
I think there's gobs of opportunities to improve the analytics experiences without resorting to "L5 self-driving" (e.g. full text-to-results)
[0] https://motherduck.com/blog/introducing-fixit-ai-sql-error-f...
qsort|2 years ago
But as you say custom types and encoded domain knowledge is extremely tough and as a result it's very tough to "transfer" the system to different databases.
cognaitiv|2 years ago
1. Business users aren’t prepared to talk to their data in meaningful ways and this is an opportunity for LLMs to enhance the way users ask questions.
2. SQL modeling languages exist (although I’m not sure there are well maintained open source good ones and this is the biggest obstacle to what I’m working on now) and LLMs can extend these effectively by adding components such as dimensions, metrics, relationships, filters, etc. with less chance of hallucination
3. Deterministic SQL generation from a modeling repository is easier to troubleshoot and provide guarantees than end-to-end generation.
4. Existing SQL can be parsed to generate modeling components that can be committed to the model repository with LLM assistance
5. Much of the richness of going to data to answer questions is context, e.g., how does this dept compare to others, this month to same month last year, etc. Modeling languages are good at expressing these transformations, but business users and often analysts aren’t good at capturing all the permutations of these types of comparisons. Another area where LLMs can help apply tooling.
IMO, LLMs are more effective at using tools than generating complex outputs.
PheonixPharts|2 years ago
One of the biggest challenges I've personally seen in this space is business "leaders" pushing teams to ship products asap lest they loose face among their fellow CEOs for not pushing out "AI" products before everyone else.
I'm fairly optimistic about LLMs being able to truly be transformative, but it's not going to be through forcing the bread-dead UX of hoisting yet another slightly re-imagined chat interface on users.
The idea of "talking to your data" is a promising one, and anyone who has worked for a large data driven org will quickly agree that organizing and searching in-house data is not a solved problem from the UX end of things. But to truly solving these problems, even/especially with LLMs, is going to require thought and experimentation. Something few "business leaders" have patience for.
pacbard|2 years ago
It can work to support business analysts to crank out more reporters, but I wouldn’t roll it out to all my staff.
swimwiththebeat|2 years ago
ipsum2|2 years ago
DebtDeflation|2 years ago
"What was the average order size per customer for product XYZ in the West region?"
Imagine turning that one loose against the typical legacy system.
I wouldn't trust an LLM to figure out the joins or aggregate calculation, LET ALONE the definition of a customer, a product, or a region.
But it would almost certainly generate AN answer.
zainhoda|2 years ago