I just watched the author of this feature and blog post give a talk at the DataCouncil conference in Oakland, and it is obvious what a huge amount of craft, ingenuity, and care went into building it. Congratulations to Hamilton and the MotherDuck team for an awesome launch!
Thanks for sharing this update with the world and including it on the local ui too.
Feature request: enable the tuning of when Instant SQL is run and displayed. The erroring out with flashing updates at nearly every keystoke while expanding on a query is distracting for me personally (my brain goes into troubleshooting vs thinking mode). I like the feature (so I will keep it on by default), but I’d like to have a few modes for it depending on my working context (specifically tuning of update frequency at separation characters [space, comma], end of statement [semicolon/newline], and injections [paste/autocomplete]).
This is probably stupid, but at the hope of helping others through exposing my own ignorance -- I'm having trouble actually installing and running the preview... I've downloaded the preview release duckdb binary itself, then when I try to run "duckdb -ui", I'm getting this error:
Extension Autoloading Error:
An error occurred while trying to automatically install the required extension 'ui':
Failed to download extension "ui" at URL "http://extensions.duckdb.org/0069af20ab/osx_arm64/ui.duckdb_..." (HTTP 403)
Extension "ui" is an existing extension.
Is it looking to download the preview version of the extension, but getting blocked/unauthorized (hence the 403 forbidden response)? Or is there something about the auto-loading behavior that I'm supposed to disable maybe?
It would be even better if SQL had pipe syntax. SQL is amazing, but its ordering isn’t intuitive, and only CTEs provide a reliable way to preview intermediate results. With pipes, each step could clearly show intermediate outputs.
Example:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL > customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
The PRQL[1] syntax is built around pipelines and works pretty well.
I added a similar "get results as you type" feature to the SQLite integration in the Logfile Navigator (lnav)[2]. When entering PRQL queries, the preview will show the results for the current and previous stages of the pipeline. When you move the cursor around, the previews update accordingly. I was waiting years for something like PRQL to implement this since doing it with regular SQL requires more knowledge of the syntax and I didn't want to go down that path.
Obviously one advantage of SQL is everyone knows it. But conceptually, I agree. I think [1]Malloy is also doing some really fantastic work in this area.
This is one of the reasons I'm excited about DuckDB's upcoming [2]PEG parser. If they can pull it off, we could have alternative dialects that run on DuckDB.
While I would certainly agree with you that putting the FROM clause first would be a significant improvement to SQL and was a genuine design mistake, this otherwise feels more like just wanting SQL to be less declarative and more imperative. Wanting it to be more like LINQ and less like relational algebra.
That, I think, is most developers' real sticking point with SQL. It's not object-relational impedance mismatch between their application and the data store, it's imperative-declarative impedance mismatch between their preferred or demonstrated talent. They are used to thinking about problems in exactly one way, so when they struggle to adapt to a different way of thinking about the problems they assume their familiarity is what's more correct.
I think this is why the same developers insist that XML/HTML is "just a markup language." Feeding a document into an executable to produce output isn't really significantly different than feeding imperative language into a compiler. The only real difference is that one is Turing complete, but Turning completeness is not a requirement of programming languages.
Nothing comes close to the power of mongodb aggression pipelines.. when used in production apps it reduces the amount of code significantly for us by doing data modeling as close as possible to the source
I really like duckdb's notebooks for exploration and this feature makes them even more awesome, but the fact that I can't share, export or commit them into a git repo feels extremely limiting. It's neat-ish that it dodfoods and store them in a duckdb database. It even seems to stores historical versions, but I can't really do anything with it..
Me too (author of the post here). In fact, I was watching a seasoned data engineer at MotherDuck show me how they would attempt to debug a regex in a CTE. As a longtime SQL user, I felt the pain immediately; haven't we all been there before? Instant SQL followed from that.
If there are any DuckDB engineers here, I just want you to know that your tool has been incredible for my work in bioinformatics/biotech. It has the flexibility/simplicity that biological data (messy, changing constantly) requires.
At times I've done crude implementations of similar functionality, by basically just taking the current string on change and concatenating with " LIMIT 20" before passing it to the database API and then rerendering a table if the result is an associative array rather than an error message.
I think this would be better if it was combined with information about valid words in the cursor position, which would likely be a bit more involved but achievable through querying the schema and settling on a subset of SQL. It would help people that aren't already fluent in SQL to extract the data they want. Perhaps allow them to click the suggestions to add them to the query.
I've done partial implementations of this too, that query the schema for table or column names. It's very cheap even on large, complex schemas, so it's fine to just throw every change at the database and check what drops out. In practice I didn't get much out of either beyond the fun of hacking up an ephemeral tool, or I would probably have built some small product around it.
Amazing work. Motherduck and the duckdb ecosystem have done a great job of gathering talented engineers with great taste. Craftsmanship may be the word I’m looking for - I always look forward to their releases.
I spent the first two quarters of 2024 working on observability for a build-the-plane-as-you-fly-it style project. I can’t express how useful the cte preview would have been for debugging.
it looks cool, but i wish i could just see the entire table that im about to query. i always start my queries with a quick `select * from table limit 10;` then go about adding the columns and joins
Indeed, we are! We worked with DuckDB Labs to add the query_location information, which we're also enriching with the tokenizer to draw a path through the AST to the cursor location. I've been wanting to do this since forever, and now that we have it, there's actually a long tail of inspection / debugging / enrichment features we can add to our SQL editor.
This is a very cool feature. I don't know how useful it is or how I'd use it right now but I think I am going to get into some benchmarking and performance tweaking soon and this could be handy.
Yes it comes from a desire to impose intuition from other contexts onto something instead of building intuition with that thing.
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless though, the clauses don't have any dependency in terms of meaning so it's fine to just allow them to be reordered in terms of the meaning of the query, but that's true of lots and lots of things in programming and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
def
for x in whatever:
print(x)
print_whatever(whatever):
There's nothing ambiguous about it, but why? Like if you are used to seeing it one way it just makes it more confusing to read, and if you aren't used to seeing it the normal way you should at least somewhat master something before you try to improve it through cosmetic tweaks.
I think you see this all the time, people try to impose their own comfort onto things for no actual improvement.
You can use any variation of DuckDB valid syntax that you want! I prefer to put from first just because I think it's better, but Instant SQL works with traditional select __ from __ queries.
Edit: never mind, thanks for the replies! I had missed the part where it showed visualizing subqueries, which is what I wanted but didn't think it did. This looks very helpful indeed!
> Getting the AST is a big step forward, but we still need a way to take your cursor position in the editor and map it to a path through this AST. Otherwise, we can’t know which part of the query you're interested in previewing. So we built some simple tools that pair DuckDB’s parser with its tokenizer to enrich the parse tree, which we then use to pinpoint the start and end of all nodes, clauses, and select statements. This cursor-to-AST mapping enables us to show you a preview of exactly the SELECT statement you're working on, no matter where it appears in a complex query.
> What would be helpful would be to be able to visualize intermediate results -- if my cursor is inside of a subquery, show me the results of that subquery.
But that's exactly what they show in the blog post??
honestly this kind of instant feedback wouldve saved me tons of headaches in the past - you think all these layers of tooling are making sql beginners pick it up faster or just overwhelming them?
Please finally add q language with proper integration to your tables so that our precious q-SQL is available there. Stop reinventing the wheel, let's at least catch up to the previous generation (in terms of convenience). Make the final step.
carlineng|10 months ago
ryguyrg|10 months ago
wodenokoto|10 months ago
ryguyrg|10 months ago
Awesome video of feature: https://youtu.be/aFDUlyeMBc8
Disclaimer: I’m a co-founder at MotherDuck.
rancar2|10 months ago
Feature request: enable the tuning of when Instant SQL is run and displayed. The erroring out with flashing updates at nearly every keystoke while expanding on a query is distracting for me personally (my brain goes into troubleshooting vs thinking mode). I like the feature (so I will keep it on by default), but I’d like to have a few modes for it depending on my working context (specifically tuning of update frequency at separation characters [space, comma], end of statement [semicolon/newline], and injections [paste/autocomplete]).
strgcmc|10 months ago
Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'ui': Failed to download extension "ui" at URL "http://extensions.duckdb.org/0069af20ab/osx_arm64/ui.duckdb_..." (HTTP 403) Extension "ui" is an existing extension.
Is it looking to download the preview version of the extension, but getting blocked/unauthorized (hence the 403 forbidden response)? Or is there something about the auto-loading behavior that I'm supposed to disable maybe?
theLiminator|10 months ago
jakozaur|10 months ago
Example:
FROM orders |> WHERE order_date >= '2024-01-01' |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id |> WHERE total_spent > 1000 |> INNER JOIN customers USING(customer_id) |> CALL ENRICH.APOLLO(EMAIL > customers.email) |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
tstack|10 months ago
I added a similar "get results as you type" feature to the SQLite integration in the Logfile Navigator (lnav)[2]. When entering PRQL queries, the preview will show the results for the current and previous stages of the pipeline. When you move the cursor around, the previews update accordingly. I was waiting years for something like PRQL to implement this since doing it with regular SQL requires more knowledge of the syntax and I didn't want to go down that path.
[1] - https://prql-lang.org [2] - https://lnav.org/2024/03/29/prql-support.html
hamilton|10 months ago
This is one of the reasons I'm excited about DuckDB's upcoming [2]PEG parser. If they can pull it off, we could have alternative dialects that run on DuckDB.
[1] https://www.malloydata.dev/ [2] https://duckdb.org/2024/11/22/runtime-extensible-parsers.htm...
da_chicken|10 months ago
That, I think, is most developers' real sticking point with SQL. It's not object-relational impedance mismatch between their application and the data store, it's imperative-declarative impedance mismatch between their preferred or demonstrated talent. They are used to thinking about problems in exactly one way, so when they struggle to adapt to a different way of thinking about the problems they assume their familiarity is what's more correct.
I think this is why the same developers insist that XML/HTML is "just a markup language." Feeding a document into an executable to produce output isn't really significantly different than feeding imperative language into a compiler. The only real difference is that one is Turing complete, but Turning completeness is not a requirement of programming languages.
metadata|10 months ago
https://cloud.google.com/blog/products/data-analytics/simpli...
It's pretty neat:
Edit: formattingcrooked-v|10 months ago
wodenokoto|10 months ago
NDizzle|10 months ago
cdchhs|10 months ago
motoboi|10 months ago
Why is it a killer feature? First of all, LLMs complete text from left to right. That alone is a killer feature.
But for us meatboxes with less compute power, pipe syntax allow (much better) code completion.
Pipe syntax is delightful to work with and makes going back to SQL a real bummer moment (please insert meme of Kate Perry kissing the earth here).
ergest|10 months ago
hantusk|10 months ago
gervwyk|10 months ago
XCSme|10 months ago
falcor84|10 months ago
matsonj|10 months ago
worldsayshi|10 months ago
crmi|10 months ago
ryguyrg|10 months ago
jpambrun|10 months ago
akshayka|10 months ago
https://github.com/marimo-team/marimo
hamilton|10 months ago
RyanHamilton|10 months ago
ayhanfuat|10 months ago
hamilton|10 months ago
RobinL|10 months ago
input_data = duckdb.sql("SELECT * FROM read_parquet('...')")
step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...")
step_2 = duckdb.sql("SELECT ... FROM step_1")
final = duckdb.sql("SELECT ... FROM step_2;")
ako|10 months ago
biophysboy|10 months ago
cess11|10 months ago
I think this would be better if it was combined with information about valid words in the cursor position, which would likely be a bit more involved but achievable through querying the schema and settling on a subset of SQL. It would help people that aren't already fluent in SQL to extract the data they want. Perhaps allow them to click the suggestions to add them to the query.
I've done partial implementations of this too, that query the schema for table or column names. It's very cheap even on large, complex schemas, so it's fine to just throw every change at the database and check what drops out. In practice I didn't get much out of either beyond the fun of hacking up an ephemeral tool, or I would probably have built some small product around it.
jwilber|10 months ago
I spent the first two quarters of 2024 working on observability for a build-the-plane-as-you-fly-it style project. I can’t express how useful the cte preview would have been for debugging.
arrty88|10 months ago
matsonj|10 months ago
will do the same!
We are working on how to make it easy to switch from instant sql -> run query -> instant sql
Vaslo|10 months ago
arsalanb|10 months ago
mritchie712|10 months ago
hamilton|10 months ago
krferriter|10 months ago
RobinL|10 months ago
I've used sqlglot to do this in the past, but doing it natively would be nice
owlstuffing|10 months ago
1. https://github.com/manifold-systems/manifold/blob/master/man...
Jgrubb|10 months ago
wodenokoto|10 months ago
Is mother duck editor features available on-prem? My understanding is that mother duck is a data warehouse sass.
1egg0myegg0|10 months ago
-Customer software engineer at MotherDuck
unknown|10 months ago
[deleted]
hk1337|10 months ago
I'm assuming it's more of a user preference like commas in front of the field instead of after field?
ltbarcly3|10 months ago
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless though, the clauses don't have any dependency in terms of meaning so it's fine to just allow them to be reordered in terms of the meaning of the query, but that's true of lots and lots of things in programming and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
There's nothing ambiguous about it, but why? Like if you are used to seeing it one way it just makes it more confusing to read, and if you aren't used to seeing it the normal way you should at least somewhat master something before you try to improve it through cosmetic tweaks.I think you see this all the time, people try to impose their own comfort onto things for no actual improvement.
hamilton|10 months ago
crazygringo|10 months ago
Noumenon72|10 months ago
> Getting the AST is a big step forward, but we still need a way to take your cursor position in the editor and map it to a path through this AST. Otherwise, we can’t know which part of the query you're interested in previewing. So we built some simple tools that pair DuckDB’s parser with its tokenizer to enrich the parse tree, which we then use to pinpoint the start and end of all nodes, clauses, and select statements. This cursor-to-AST mapping enables us to show you a preview of exactly the SELECT statement you're working on, no matter where it appears in a complex query.
geysersam|10 months ago
But that's exactly what they show in the blog post??
hamilton|10 months ago
r3tr0|10 months ago
Except for system performance data.
You can checkout our sandbox at
https://yeet.cx/play
potatohead24|10 months ago
hamilton|10 months ago
almosthere|10 months ago
acdanger|10 months ago
1egg0myegg0|10 months ago
xdkyx|10 months ago
porridgeraisin|10 months ago
gitroom|10 months ago
makotech221|10 months ago
(129304 rows affected)
CurtHagenlocher|10 months ago
ltbarcly3|10 months ago
thenaturalist|10 months ago
First, repeat data analyst queries are a usage driver in SQL DBs. Think iterating the code and executing again.
Another huge factor in the same vein is running dev pipelines with limited data to validate a change works when modelling complex data.
This is currently a FE feature, but underneath lies effective caching.
The underlying tech is driving down usage cost which is a big thing for data practitioners.
hamilton|10 months ago
sannysanoff|10 months ago
cess11|10 months ago
datadrivenangel|10 months ago