top | item 39132424

(no title)

I see so many business leaders touting the promise of LLMs allowing business to "talk" to their data. The promise does sound enticing, but it's actually kind of hard to get working in practice.

A lot of our databases at work have columns with custom types and enums, and getting the LLM (Llama2) to write SQL queries to robustly answer natural language questions about the data is tough. It requires a lot of instruction prompting, context, and question-SQL examples (few-shot learning), and it still fails in unexpected ways. It's a tough ask for people to use a tool like this if they can't trust the results all the time. It's also a bit infeasible to scale this to tens or hundreds of tables across our data warehouse.

It's great that a lot of people are trying to crack this problem, I'm curious to try this model out. I'd also love to see if other people have tried solving this problem and made any headway.

discuss

tdoehmen|2 years ago

I've been working on the DuckDB-NSQL model on MotherDuck side. I fully agree that general text-2-sql, in the sense of "give me a question, and I'll produce you an arbitrary complex query", is a very tough problem and I actually believe that that's not the right problem to solve. Not necessarily because models are not (going to be) cabable enough, but rather because it's way too hard for humans to express in a single prompt what they actually want. Furthermore, it's simply not the right UX for most SQL users. A much better approach IMO is to keep the SQL analysts in the driver seat, and provide nuanced support wherever/whenever they need it most. The FixIt feature we recently launched goes into the same direction: https://motherduck.com/blog/introducing-fixit-ai-sql-error-f...

In that sense I emphasized in our Blogpost that users should think of it as a documentation oracle that always gives you the exact DuckDB SQL query snippet you are looking for, which is a tremendoues time-saver if you have an abstrat idea of the query you want to write, but you're just not sure about the syntax, expecially with DuckDB having so many functions and SQL extensions.

Here are a few exammples:

- create tmp table from test.csv

- load aws credentials from 'test' profile

- get max of all columns in rideshare table

- show query plan with runtimes for 'SELECT * FROM rideshare'

- cast hvfhs_license_num column to int

- get all columns ending with _amount from taxi table

- show summary statistics of rideshare table

- get a 10% reservoir sample of rideshare table

- get length of drivers array in taxi table

- get violation_type field from other_violations json column in taxi table

- get passenger count, trip distance and fare amount from taxi table and oder by all of them

- list all tables in current database

- get all databases starting with test_

[edit: fixed list formatting]

swimwiththebeat|2 years ago

Thanks for replying, that's a perspective I didn't consider. The capability to "talk to your data" just seems so enticing as a solution that I was tunnel-visioned into that UX. If I'm understanding correctly, what you're suggesting is more of a SQL assistant to help people write the correct SQL queries instead of writing the entire SQL query from scratch to answer a generic natural-language question?

vgt|2 years ago

I agree. I think full text-to-results (even bypassing text-to-SQL) is akin to L5 self-driving cars. It's easy to get to some reasonable level, say 90%. But to get to the point, where folks can fully trust the system and don't need a steering wheel (or know SQL) may take decades.

We at MotherDuck took an incremental approach. We launched something more akin to lane-assist. We're calling it FixIt - an in-flow visual aid to help you identify and fix errors in your SQL [0].

I think there's gobs of opportunities to improve the analytics experiences without resorting to "L5 self-driving" (e.g. full text-to-results)

[0] https://motherduck.com/blog/introducing-fixit-ai-sql-error-f...

qsort|2 years ago

I've worked on a similar problem and we have pretty much the same issues as you. An idea that makes things better is having an intermediate representation that exposes the key tables of your dwh and gets compiled to SQL. This allows you to have somewhat better security (even if the model outputs garbage it can't do too much damage because nothing except exactly what you want is even representable), and somewhat better explainability (you can't guarantee the model will get it right, but you can see what SQL gets executed rather than "magic 8-ball says no").

But as you say custom types and encoded domain knowledge is extremely tough and as a result it's very tough to "transfer" the system to different databases.

cognaitiv|2 years ago

Lots of folks taking this approach and feels like the wrong entry point, e.g., similar to asking LLMs to generate bytecode when compilers exist or 3d printing a machine vs. building a machine from 3d printed components.

1. Business users aren’t prepared to talk to their data in meaningful ways and this is an opportunity for LLMs to enhance the way users ask questions.

2. SQL modeling languages exist (although I’m not sure there are well maintained open source good ones and this is the biggest obstacle to what I’m working on now) and LLMs can extend these effectively by adding components such as dimensions, metrics, relationships, filters, etc. with less chance of hallucination

3. Deterministic SQL generation from a modeling repository is easier to troubleshoot and provide guarantees than end-to-end generation.

4. Existing SQL can be parsed to generate modeling components that can be committed to the model repository with LLM assistance

5. Much of the richness of going to data to answer questions is context, e.g., how does this dept compare to others, this month to same month last year, etc. Modeling languages are good at expressing these transformations, but business users and often analysts aren’t good at capturing all the permutations of these types of comparisons. Another area where LLMs can help apply tooling.

IMO, LLMs are more effective at using tools than generating complex outputs.

PheonixPharts|2 years ago

> but it's actually kind of hard to get working in practice

One of the biggest challenges I've personally seen in this space is business "leaders" pushing teams to ship products asap lest they loose face among their fellow CEOs for not pushing out "AI" products before everyone else.

I'm fairly optimistic about LLMs being able to truly be transformative, but it's not going to be through forcing the bread-dead UX of hoisting yet another slightly re-imagined chat interface on users.

The idea of "talking to your data" is a promising one, and anyone who has worked for a large data driven org will quickly agree that organizing and searching in-house data is not a solved problem from the UX end of things. But to truly solving these problems, even/especially with LLMs, is going to require thought and experimentation. Something few "business leaders" have patience for.

pacbard|2 years ago

My biggest concern is that this will allow people to type a question and get a number back from the database, without being able to tell if the query is right or if the LLM just made up something.

It can work to support business analysts to crank out more reporters, but I wouldn’t roll it out to all my staff.

swimwiththebeat|2 years ago

I've actually done exactly what @qsort suggested and outputted the intermediate SQL query and raw data generated by that query when generating the response back to the user. That definitely helps in establishing more trust with the customer since they can verify the response. My approach right now is to just be honest with our customers in the capabilities of the tool, acknowledge its shortcomings, and keep iterating over time to make it better and better. That's what the team in charge of our company-wide custom LLM has done and it's gained a surprising amount of traction and trust over the last few months.

ipsum2|2 years ago

Llama2 wasn't trained on code, so it's no surprise that you're having trouble generating SQL.

DebtDeflation|2 years ago

I assume that "Talk to your data" presupposes the existence of some simple view that has been created and that embeds 99% of the required business logic.

"What was the average order size per customer for product XYZ in the West region?"

Imagine turning that one loose against the typical legacy system.

I wouldn't trust an LLM to figure out the joins or aggregate calculation, LET ALONE the definition of a customer, a product, or a region.

But it would almost certainly generate AN answer.

zainhoda|2 years ago

I’m trying to solve for this with my project using RAG and (at least based on what people say in Discord), it’s working really well for them: https://github.com/vanna-ai/vanna