top | item 39133155

(no title)

tdoehmen | 2 years ago

I've been working on the DuckDB-NSQL model on MotherDuck side. I fully agree that general text-2-sql, in the sense of "give me a question, and I'll produce you an arbitrary complex query", is a very tough problem and I actually believe that that's not the right problem to solve. Not necessarily because models are not (going to be) cabable enough, but rather because it's way too hard for humans to express in a single prompt what they actually want. Furthermore, it's simply not the right UX for most SQL users. A much better approach IMO is to keep the SQL analysts in the driver seat, and provide nuanced support wherever/whenever they need it most. The FixIt feature we recently launched goes into the same direction: https://motherduck.com/blog/introducing-fixit-ai-sql-error-f...

In that sense I emphasized in our Blogpost that users should think of it as a documentation oracle that always gives you the exact DuckDB SQL query snippet you are looking for, which is a tremendoues time-saver if you have an abstrat idea of the query you want to write, but you're just not sure about the syntax, expecially with DuckDB having so many functions and SQL extensions.

Here are a few exammples:

- create tmp table from test.csv

- load aws credentials from 'test' profile

- get max of all columns in rideshare table

- show query plan with runtimes for 'SELECT * FROM rideshare'

- cast hvfhs_license_num column to int

- get all columns ending with _amount from taxi table

- show summary statistics of rideshare table

- get a 10% reservoir sample of rideshare table

- get length of drivers array in taxi table

- get violation_type field from other_violations json column in taxi table

- get passenger count, trip distance and fare amount from taxi table and oder by all of them

- list all tables in current database

- get all databases starting with test_

[edit: fixed list formatting]

discuss

order

swimwiththebeat|2 years ago

Thanks for replying, that's a perspective I didn't consider. The capability to "talk to your data" just seems so enticing as a solution that I was tunnel-visioned into that UX. If I'm understanding correctly, what you're suggesting is more of a SQL assistant to help people write the correct SQL queries instead of writing the entire SQL query from scratch to answer a generic natural-language question?

nerpderp82|2 years ago

I believe that is what they are saying.

I have found LLMs to be extremely helpful in mapping between schemas as well as helping me formulate queries where, because of decay, data in tables and column names, etc don't map to what you think they would.

You need to provide as much context as you can to the LLM. So full schema definitions, and histographic summarization and samples from the tables themselves.