top | item 39785867

(no title)

moqca | 1 year ago

How does it handle large schemas? A quick schema dump shows that we have around 100K columns between all of our tables. Say we use 10% of that.

How easy is it to select which tables should be taken into account?

Is there an intermediate context layer step to work with this?

Do you need to provide working examples for the tool to know how joins and relations are usually handled?

l5870uoo9y | 1 year ago

> How does it handle large schemas?

For now we don't have special handling for huge schemas (over 10-20K tokens). I would suggest adding only the tables you need. But hopefully this week I will test a solution that automatically pre-selects which tables to include for huge schemas before the actual SQL query generation, thereby using relatively few tokens.
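
The pre-selection step described above could look something like this minimal sketch. The table names, columns, and keyword-overlap scoring are illustrative assumptions, not the tool's actual implementation (which would more likely use a model call or embeddings):

```python
# Hedged sketch: pre-select relevant tables before SQL generation.
# The schema contents and the scoring heuristic are assumptions.

def select_tables(schema, question, max_tables=3):
    """Rank tables by keyword overlap with the question; keep the top few."""
    words = set(question.lower().replace("?", "").split())
    scored = []
    for table, columns in schema.items():
        vocab = set(table.lower().split("_"))
        for col in columns:
            vocab.update(col.lower().split("_"))
        score = len(words & vocab)
        if score:
            scored.append((score, table))
    scored.sort(reverse=True)
    return [t for _, t in scored[:max_tables]]

schema = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "email"],
    "audit_log": ["id", "event", "ts"],
}
print(select_tables(schema, "which customers placed orders and total spent"))
# → ['orders', 'customers']
```

Only the surviving tables' DDL would then be sent with the SQL-generation prompt, keeping token usage low.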

> How easy is it to select which tables should be taken into account?

You can edit the schema in a code editor (it is stored as CSV), which is relatively easy. I previously had a solution where you could manually tick off tables to include, but found it a bit cumbersome. I might add that back, though. Ideally I want everything to run without the user having to include/exclude tables.
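
Trimming such a CSV schema dump programmatically is also straightforward. This sketch assumes a `table,column,type` layout, which may not match the tool's exact export format:

```python
import csv
import io

# Hedged sketch: keep only whitelisted tables in a CSV schema dump.
# The "table,column,type" layout is an assumed format.

schema_csv = """table,column,type
orders,id,integer
orders,total,numeric
audit_log,id,integer
audit_log,event,text
"""

keep = {"orders"}
rows = list(csv.DictReader(io.StringIO(schema_csv)))
trimmed = [r for r in rows if r["table"] in keep]
print([(r["table"], r["column"]) for r in trimmed])
# → [('orders', 'id'), ('orders', 'total')]
```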

> Is there an intermediate context layer step to work with this?

I will test adding this. It must also be performant in terms of both speed and reliability.

> Do you need to provide working examples for the tool to know how joins and relations are usually handled?

Normally the AI can infer that from the schema. If not, you can "teach" it using RAG: https://www.sqlai.ai/posts/enhancing-ai-accuracy-for-sql-gen...
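
The "teaching" idea can be sketched as retrieving a stored example query and prepending it to the prompt. Here simple keyword overlap stands in for a real RAG/embedding lookup, and the example queries are made up:

```python
# Hedged sketch: retrieve a stored example query to show the model
# how joins are usually written, then include it in the prompt.
# Keyword overlap stands in for a real embedding-based retriever.

examples = [
    ("orders per customer",
     "SELECT c.name, COUNT(*) FROM customers c "
     "JOIN orders o ON o.customer_id = c.id GROUP BY c.name"),
    ("events by day",
     "SELECT date(ts), COUNT(*) FROM audit_log GROUP BY date(ts)"),
]

def retrieve(question):
    words = set(question.lower().split())
    return max(examples, key=lambda e: len(words & set(e[0].split())))[1]

question = "how many orders per customer last month"
prompt = f"Example query:\n{retrieve(question)}\n\nQuestion: {question}"
print("JOIN orders" in prompt)
# → True
```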

lmeyerov | 1 year ago

We are working with enterprise-scale schemas, and beyond SQL stores like Databricks we also handle Splunk, Elastic, graph DBs, etc., where many tables/columns are common. Part of the trick includes ideas like using embedding models and continuous learning for schema subsetting & tuning. We are deploying in governments, banks, insurance companies, etc. -- happy to chat if that sounds relevant: louie.ai. Notebooks, dashboarding, headless APIs, pen testing, audit logs, RBAC modes -- a lot is needed for enterprise settings!
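
The embedding-based schema subsetting mentioned above can be sketched in miniature. Bag-of-words vectors and cosine similarity stand in for a real embedding model here; this is an illustrative toy, not louie.ai's actual implementation:

```python
import math
from collections import Counter

# Hedged sketch of embedding-based schema subsetting. Bag-of-words
# counts stand in for real embeddings; the tables are made up.

def vec(text):
    return Counter(text.lower().replace("_", " ").split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

tables = {
    "orders": "orders id customer_id total created_at",
    "firewall_events": "firewall events src_ip dst_ip action",
}
q = vec("total spent by each customer")
best = max(tables, key=lambda t: cosine(q, vec(tables[t])))
print(best)
# → orders
```

At scale, the vectors would be precomputed per table and the top-k matches retrieved per query, so only a small slice of a 100K-column schema ever reaches the model.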