Show HN: DaLMatian – Text2sql that works

44 points | alandu | 1 year ago | dalmatian.ai

Hey HN, we've built DaLMatian, a text2sql product that meets the needs of data analysts working with enterprise data. We built this app because as a data analyst at an enterprise I could not find a text2sql product that was (1) actually useful for my day-to-day and (2) easy to set up on my computer. Existing products either fall apart when tested on gnarly enterprise data/queries or require going through a sales/integration process that I wasn't in a position to push for - I just wanted something that I could quickly set up to help make my job easier. Our goal is to make this a reality for any data analyst that feels the same.

There are many constraints that make this reality difficult to achieve. The product needs to scale to databases with millions of columns and extract business logic from very complex queries. It also needs to be fast, at least faster than an analyst would take to write the query. On top of all this, an analyst needs to be allowed to use it from a security standpoint. Our app meets all the key requirements of an enterprise data analyst while also being lightweight enough to run locally on a typical laptop.

Here's how it works. To get started, you simply need to open a file of past queries in our IDE (try it here: https://www.dalmatian.ai/download) and add a file with your database schema (instructions here: https://www.dalmatian.ai/docs#configuration). There is also an option to connect a database to auto pull your schema (no actual data is seen by the LLM). We do not see anything you input since the app is local and the only external connection is with OpenAI. It's just like asking ChatGPT for help with queries, but in a streamlined way.
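To make the "schema plus past queries, no row data" idea concrete, here is a minimal Python sketch of how such context could be assembled into a prompt. This is only an illustration under assumptions, not how DaLMatian is actually implemented: the function name `build_prompt`, the prompt layout, and the sample tables are all hypothetical.

```python
def build_prompt(schema: dict[str, list[str]], past_queries: list[str], question: str) -> str:
    """Assemble an LLM prompt from schema metadata and example queries.

    Hypothetical helper: only table/column names and past SQL text are
    included, so no actual row data appears in the prompt.
    """
    # Render each table as "table(col1, col2, ...)".
    schema_lines = [f"{table}({', '.join(columns)})" for table, columns in schema.items()]
    # Past queries act as examples of the business logic.
    examples = "\n\n".join(past_queries)
    return (
        "Database schema:\n" + "\n".join(schema_lines)
        + "\n\nPast queries:\n" + examples
        + "\n\nQuestion: " + question + "\nSQL:"
    )


# Hypothetical sample inputs for illustration only.
schema = {"orders": ["id", "customer_id", "total"], "customers": ["id", "region"]}
past = ["SELECT region, SUM(total) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY region"]
prompt = build_prompt(schema, past, "Total order value per customer?")
```

The resulting string would then be sent to whichever model is in use; that call is left out here because it depends on the provider and credentials.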

If you'd download our free IDE and try to break it, we'd love to hear what you come up with!

28 comments

laidoffamazon|1 year ago

Two notes:

1) I appreciate that it's said to be local first, but the fact that it depends on OpenAI API usage is...kinda a big hole in that? The organization I work in wouldn't really accept this for approval, and from the title I was hoping that this would be a local-first fine-tuned (or fine-tunable) LLM.

2) The about page stating that you met at Princeton is a huge bear signal for me. I don't think tools should be adopted based on how much of an elite (cognitive or financial or social or athletic or whatever) their creators are, and given the use of the OpenAI APIs I question why the "top ML conferences" bit is here at all.

jddj|1 year ago

The trend of these apps (admittedly, there are worse offenders than these guys) which stress how your data is completely safe, encrypted in transit, not stored on our servers, yours forever...by the way, everything is piped straight into OpenAI is a bit tiring.

alandu|1 year ago

1 - yes our current solution does require you to be allowed to use ChatGPT/OpenAI. Unfortunately the accuracy using smaller models (even GPT-3.5) is poor. We don't see a local model (which will be much worse than GPT-3.5) even with fine tuning being anywhere close to good enough (would also require a really large number of queries). So we are relying on GPT-4 for now.

2 - agreed the background isn't why anyone should adopt a tool, just wanted to share our story. I would add that creating a good wrapper can actually be quite challenging: you need to synthesize many pieces under constraints like memory, compute, speed, and accuracy.

activatedgeek|1 year ago

In AI/ML research, text to SQL always sounded to me like something of merely academic interest, in the sense that the outputs are easily verifiable and make for a good proof of concept of a language model's (or a translation model's) capabilities.

But looks like there are plenty of products coming out in this area, and it has me wondering: what is the actual big picture for enterprises here?

I would assume enterprises employ enough people to write yet another query for whatever use case.

- Is the expectation that in the future, we can bring the flexibility of SQL-like languages to people unfamiliar with SQL?

- Perhaps a salesperson unfamiliar with SQL would like to conduct an analysis. Are the volume and variety of such queries so high that it is worthwhile to optimize the turnaround time between a data analyst writing the SQL query and the salesperson consuming the results?

Perhaps I am underestimating the scale of the problem but would love some insider perspective here.

alandu|1 year ago

I used to get slammed with so many requests that my boss had to tell the sales team to reduce the number of questions and only ask highest priority ones. Analytics teams serve a lot of different teams in an org, and the requests can really pile up. I was basically a bottleneck, which was a lose-lose for me since I was slammed with work and for business stakeholders too since they had to either wait a long time for responses or were limited in what they could even ask.

moqca|1 year ago

Can't get this to work. Instructions are very unclear. Was unable to open a Snowflake connection. Uploaded schema in a CSV file. No indication of what needs to be done next. Assume that "manage context queries" is where it pulls info from. Added a query and provided a description. Tried Q&A, nothing happened.

l5870uoo9y|1 year ago

If you are looking for a text to SQL solution, I can in all modesty recommend my own https://www.sqlai.ai. Schemas can be added in any format and are automatically parsed/optimized by AI for optimal performance.

alandu|1 year ago

If you open a .sql file in the workspace, the queries in that file will be auto parsed and used as context for Q&A. If you're willing, we would love to help debug - could you email support@dalmatian.ai?
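For readers wondering what "auto parsed" might involve, here is a rough Python sketch of splitting the text of a .sql file into individual statements. It is a simplified assumption for illustration, not the app's actual parser: the function name `split_sql_statements` is hypothetical, and the splitter only understands semicolons and single-quoted strings (SQL comments are not handled).

```python
def split_sql_statements(text: str) -> list[str]:
    """Split SQL text on semicolons that are outside single-quoted strings.

    Hypothetical, simplified splitter: it does not handle comments,
    dollar-quoting, or other dialect-specific syntax.
    """
    statements: list[str] = []
    current: list[str] = []
    in_string = False
    for ch in text:
        if ch == "'":
            # Toggle string state; a doubled quote ('') toggles twice,
            # so the state stays correct for escaped quotes.
            in_string = not in_string
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    # Keep a trailing statement that has no terminating semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


sample = "SELECT 1;\nSELECT name FROM users WHERE note = 'a;b';\n"
queries = split_sql_statements(sample)
```

Each resulting statement could then be stored as one context query for Q&A.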

HanClinto|1 year ago

Have you run this against UNITE? I'm curious to see how it benchmarks against other text2sql tools:

https://github.com/awslabs/unified-text2sql-benchmark

alandu|1 year ago

We have not come across any benchmark dataset that's actually worth evaluating on because the questions are not representative of real world enterprise problems. They don't reflect the degree of context needed to answer domain/business-specific questions accurately.

pelagicAustral|1 year ago

Why make it a full VSCode download instead of a plugin?

alandu|1 year ago

There are other product additions in the works, like hooking it up to your locally opened Slack. A plugin would be limiting.

bdcravens|1 year ago

Recommendation: your HN post shouldn't tell us more about the company and product than your website does.

alandu|1 year ago

Thanks, we will rethink how the product is presented on our website!