top | item 38998232

(no title)

sagaro | 2 years ago

All these products that pitch using AI to find insights in your data always end up looking pretty in demos and falling short in reality. This is not because the product is bad, but because there is an enormous amount of nuance in DBs/tables that becomes difficult to manage. Most startups evolve too quickly, and product teams generally try to deliver by hacking some existing feature. Columns are added, some columns get new meanings, some feature is identified by looking at a combination of 2 columns, etc. All this needs to be documented properly and fed to the AI, and there is no incentive for anyone to do it. If the AI gives the right answer, everyone is like "wow, AI is so good, we don't need the BAs". If the AI gives terrible answers, they are like "this is useless". No one goes "wow, the data engineering team did a great job keeping the AI relevant".


joshstrange|2 years ago

I couldn’t agree more. I’ve hooked up things to my DB with AI in an attempt to “talk” to it but the results have been lackluster. Sure it’s impressive when it does get things right but I found myself spending a bunch of time adding to the prompt to explain how the data is organized.

I’m not expecting any LLM to just understand it; heck, another human would need the same rundown from me. Maybe it’s worth keeping this “documentation” up to date, but my takeaway was that I couldn’t release access to the AI because it got things wrong too often and I couldn’t anticipate every question a user might ask. I didn’t want it to give out wrong answers (this DB is used for sales), since spitting out wrong numbers would be just as bad as my dashboards “lying”.

Demo DBs aren’t representative of shipping applications and so the demos using AI are able to have an extremely high success rate. My DB, with deprecated columns, possibly confusing (to other people) naming, etc had a much higher error rate.

eurekin|2 years ago

Speculating

How about a chat interface, where you correct the result and provide more contextual information about those columns?

Those chats could later be fed back to the model, with a DPO optimisation run on top.
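
A rough sketch of how such chat corrections might be turned into preference pairs for DPO-style training. Everything here is hypothetical: the `Correction` record, the `to_preference_pairs` helper, and the `prompt`/`chosen`/`rejected` field names (a common convention for preference datasets, assumed rather than taken from any tool mentioned in this thread).

```python
from dataclasses import dataclass


@dataclass
class Correction:
    """One chat turn where a user reviewed the model's SQL (hypothetical record)."""
    question: str
    model_sql: str
    corrected_sql: str
    context_note: str = ""  # extra explanation about columns, if the user gave any


def to_preference_pairs(corrections):
    """Turn user corrections into prompt/chosen/rejected records."""
    pairs = []
    for c in corrections:
        # If the user accepted the model's query unchanged, there is no
        # preference signal to learn from, so skip it.
        if c.model_sql.strip() == c.corrected_sql.strip():
            continue
        prompt = c.question
        if c.context_note:
            prompt = f"{c.question}\n\nContext: {c.context_note}"
        pairs.append({
            "prompt": prompt,
            "chosen": c.corrected_sql,   # what the user said was right
            "rejected": c.model_sql,     # what the model originally produced
        })
    return pairs
```

Only corrected answers produce a pair, so the training signal comes from exactly the places where the model misread the schema.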

eek2121|2 years ago

Welcome to AI in general.

Billions wasted on a pointless endeavor.

10 years from now folks are going to be laughing at how billions of dollars and productivity were flushed down the drain to support Microsoft Word 2.0.

AI is a bubble. Do yourself a favor and short (or buy put options) the companies that only have "AI" for a business model.

Also short Intel, because Intel.

lmeyerov|2 years ago

Our theory is that we are simultaneously having a bit of a Google moment and a Tableau moment. There is a lot more discovery & work to pull it off, but the dam has been broken. It's been an exciting time to work through with our customers:

* Google moment: AI can now watch and learn how you and your team do data. Around the time Google pagerank came around, the Yahoo-style search engines were highly curated, and the semantic web people were writing xml/rdf schema and manually mapping all data to it. Google replaced slow and expensive work with something easier, higher quality, and more scalable + robust. We are making Louie.ai learn both ahead of time and as the system gets used, so data people can also get their Google moment. Having a tool that works with you & your team here is amazing.

* Tableau moment: A project or data owner can now guide a lot more without much work. Dashboarding used to require a lot of low-level custom web dev etc., while Tableau streamlined it so that a BI lead who was good at SQL and understood the data & design could go much further without a big team and in way less time. Understanding the user personas, and adding abstractions for facilitating them, was a big deal for delivery speed, cost, and achieved quality. Arguably the same happened when Looker introduced LookML and foreshadowed the whole semantic layer movement happening today. To help owners ensure quality and security, we have been investing a lot in the equivalent abstractions in Louie.ai for making data more conversational. Luckily, while the AI part is new, there is a lot more precedent on the data ops side. Getting this right is a big deal in team settings and basically any time the stakes are high.

scoot|2 years ago

> Around the time Google pagerank came around, the Yahoo-style search engines were highly curated

Hmmm, no. Altavista was the go-to search engine at the time (launched 1995), and was a crawler (i.e. not a curated catalog/directory) based search. Lycos predates that but had keyword rather than natural language search.

Google didn't launch until 1998.

tucnak|2 years ago

Is that right? You do all that at Louie.ai?

j-a-a-p|2 years ago

Mostly agree. I suggest continuing to use ETL and creating a data warehouse that irons out most of the nuances that a production database needs. On a data warehouse with good metadata, I can imagine this will work great.

sagaro|2 years ago

I think getting clean tables/ETLs is a big blocker for "move fast and break things". I would actually be more interested in a GitHub Copilot-style SQL IDE (like DataGrip etc.) that has access to all the queries written by everyone within a company, and that runs on a local server or something for security reasons and to get the nod from the IT/Sec department.

Basically, when you next write queries, it just autocompletes for you. This would improve the productivity of the analysts a lot, while leaving them the flexibility to tweak the query. Here, if something is not right, the analyst updates it. The Copilot AI keeps learning, weighting recent queries more heavily than older ones.

This is unlike the previous solution, where if something breaks you can do nothing until you clean up the ETL and redeploy it.
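
A minimal sketch of the "weight recent queries more than older queries" idea, assuming a simple exponential decay. The function name `rank_completions`, the 30-day half-life, and plain prefix matching are illustrative assumptions, not how any real IDE or Copilot product works.

```python
from collections import defaultdict


def rank_completions(history, prefix, now, half_life_days=30.0):
    """Rank past queries that start with `prefix`, favouring recent ones.

    history: iterable of (query_text, timestamp_in_days) pairs.
    now:     current time, in the same day units as the timestamps.
    """
    scores = defaultdict(float)
    wanted = prefix.lower()
    for text, written_at in history:
        if not text.lower().startswith(wanted):
            continue
        age_days = max(0.0, now - written_at)
        # Each use of a query contributes a weight that halves every
        # `half_life_days`, so old queries fade without being deleted.
        scores[text] += 0.5 ** (age_days / half_life_days)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda q: (-scores[q], q))
```

With this weighting, a query written last week can outrank one that was written twice several months ago, which matches the case where an analyst has just fixed a stale definition.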

zurfer|2 years ago

That is correct. GPT-4 is good on well-modelled data out of the box, but struggles with a messy and incomplete data model.

Documenting data definitely helps to close that gap.

However, the last part you describe is nothing new (BI teams taking credit and pushing problems on to data engineers). In fact, there is a chance that tools like vanna.ai or getdot.ai bring engineers closer to business folks. So more honest conversations, more impact, more budget.

Disclaimer: I am a co-founder at getdot.ai :)

lmeyerov|2 years ago

Agreed, maybe I wasn't clear enough. I don't view it as BI team vs platform team vs whoever. Maybe a decrease in the need for PhD AI consultants for small projects, or to wait for some privileged IT team for basic tasks, so they can focus on bigger things.

Instead of Herculean data infra projects, this is a good time for figuring out new policy abstractions and finding more productive divisions of labor between different data stakeholders and systems. Machine-friendly abstractions and structure are tools for predictable collaboration and automation. More doing, less waiting.

More practically, an increasing part of the Louie.ai stack is helping get the time-consuming quality, guardrails, security, etc parts under easier control of small teams building things. As-is, it takes a lot to give a great experience.

sagaro|2 years ago

There used to be a company/product called Business Objects aka BO (SAP bought them), which had folks meticulously map every relationship. When done correctly, it was pretty good. You could just drag drop and get answers immediately.

So yes, I can understand if there is incentive for the startups to invest in Data Engineers to make well maintained data models.

But I do think the most important value here is not the ChatGPT interface; it is getting DEs to maintain the data model in a company where product/biz is moving fast and breaking things. If that is done, then existing tools (Power BI, for instance, has an "ask in natural language" feature) will be able to get the job done.

The Google moment that the other person talks about in another comment is one where the Google of 1998 didn't require a webpage owner to do anything. They didn't need him/her to make something in a different format, use specific tags, or put tags around keywords. It was just "you do what you do, and magically we will crawl and make sense of it".

Here, unfortunately, that is not the case. Say in an ecom business which always delivers in 2 days for free, a new product is launched (same-day delivery for $5). The sales table is going to get two extra columns, "is_same_day_delivery_flag" and "same_day_delivery_fee". The revenue definition will change to include this shipping charge. A new filter will be needed if someone wants to see the opt-in rate for same-day delivery or how fast it is growing. The current table probably has revenue. But now revenue = revenue + same_day_delivery_fee, and someone needs to make the BO connection to this. And after launch, you notice you don't have enough capacity to do same-day shipping, so sometimes you just have to return the fee and send it as normal delivery. Here the is_same_day_delivery_flag is true, but the same_day_delivery_fee is 0. And so on and on...

Getting DEs to keep everything up to date in a wiki is tough, let alone in a BO-type solution. But I do hope getdot.ai etc., or someone, incentivizes them to change this way of doing things.
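
To make the same-day delivery example above concrete, here is a small sketch of the rules someone would have to encode. The column names come from the comment; the `Sale` class, the helper names, and the status labels are hypothetical, and a real sales table would carry many more cases.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    revenue: float                   # the pre-launch revenue column
    is_same_day_delivery_flag: bool  # customer opted in to same-day delivery
    same_day_delivery_fee: float     # 0 when the fee was returned


def total_revenue(sale: Sale) -> float:
    """New definition: revenue = revenue + same_day_delivery_fee."""
    return sale.revenue + sale.same_day_delivery_fee


def delivery_status(sale: Sale) -> str:
    """Separate the cases that the flag alone cannot tell apart."""
    if not sale.is_same_day_delivery_flag:
        return "standard"
    if sale.same_day_delivery_fee == 0:
        # Opted in, but the fee was returned and the order shipped normally.
        return "same_day_refunded"
    return "same_day_delivered"


def opt_in_rate(sales) -> float:
    """Share of orders where the customer chose same-day delivery."""
    if not sales:
        return 0.0
    return sum(s.is_same_day_delivery_flag for s in sales) / len(sales)
```

Note that the opt-in rate counts refunded orders while a "delivered same day" metric would not, which is exactly the kind of distinction an AI cannot infer from the column names alone.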

anon291|2 years ago

The AI needs to truly be 'listening' in a passive way to all Slack messages, virtual meetings, code commits, etc., and really be present whenever the 'team' is, in order to get anything done.

XCSme|2 years ago

Or maybe the database documentation has to be very comprehensive and the AI should have access to it.
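
As a rough illustration of giving the AI access to database documentation, the sketch below renders per-column notes into a text prompt and flags columns that nobody has documented. The `COLUMN_DOCS` contents, function names, and prompt wording are all assumptions made up for this example; no particular LLM API is implied.

```python
# Hypothetical hand-maintained documentation: table -> column -> description.
COLUMN_DOCS = {
    "sales": {
        "revenue": "Order value, excluding any same-day delivery fee.",
        "is_same_day_delivery_flag": "True if the customer opted in to same-day delivery.",
        "same_day_delivery_fee": "Fee charged for same-day delivery; 0 if it was returned.",
    }
}


def undocumented_columns(schema, docs):
    """List 'table.column' names present in the schema but missing from docs."""
    missing = []
    for table, columns in schema.items():
        documented = docs.get(table, {})
        missing.extend(f"{table}.{col}" for col in columns if col not in documented)
    return missing


def build_prompt(question, docs):
    """Prepend the column documentation to the user's question."""
    lines = []
    for table in sorted(docs):
        lines.append(f"Table {table}:")
        for column, description in docs[table].items():
            lines.append(f"  - {column}: {description}")
    schema_text = "\n".join(lines)
    return f"{schema_text}\n\nQuestion: {question}\nAnswer with a single SQL query."
```

Running `undocumented_columns` against the live schema in CI is one possible way to notice when a new column ships without an explanation.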
