(no title)
sagaro
|
2 years ago
All these products that pitch about using AI to find insights from your data always end up looking pretty in demos and fall short in reality. This is not because the product is bad, but because there is enormous amount of nuance in DB/Tables that becomes difficult to manage. Most startups evolve too quickly and product teams generally tries to deliver by hacking some existing feature. Columns are added, some columns get new meaning, some feature is identified by looking at a combination of 2 columns etc. All this needs to be documented properly and fed to the AI and there is no incentive for anyone to do it. If the AI gives the right answer, everyone is like wow AI is so good, we don't need the BAs. If the AI gives terrible answers they are like "this is useless". No one goes "wow, the data engineering team did a great job keeping the AI relevant".
joshstrange|2 years ago
I’m not expecting any LLM to just understand it, heck another human would need the same rundown from me. Maybe it’s worth keeping this “documentation” up to date but my take away was that I couldn’t release access to the AI because it got things wrong too often and I could anticipate every question a user might ask. I didn’t want it to give out wrong answers (this DB is used for sales) since spitting out wrong numbers would be just as bad as my dashboards “lying”.
Demo DBs aren’t representative of shipping applications and so the demos using AI are able to have an extremely high success rate. My DB, with deprecated columns, possibly confusing (to other people) naming, etc had a much higher error rate.
eurekin|2 years ago
How about a chat interface, where you correct the result and provide more contextual information about those columns?
Those chats could be later fed back to the model and ran a DPO optimisation on top
eek2121|2 years ago
Billions wasted on a pointless endeavor.
10 years from now folks are going to be laughing at how billions of dollars and productivity was flushed down the drain to support Microsoft Word 2.0.
AI is a bubble. Do yourself a favor and short (or buy put options) the companies that only have "AI" for a business model.
Also short Intel, because Intel.
lmeyerov|2 years ago
* Google moment: AI can now watch and learn how you and your team do data. Around the time Google pagerank came around, the Yahoo-style search engines were highly curated, and the semantic web people were writing xml/rdf schema and manually mapping all data to it. Google replaced slow and expensive work with something easier, higher quality, and more scalable + robust. We are making Louie.ai learn both ahead of time and as the system gets used, so data people can also get their Google moment. Having a tool that works with you & your team here is amazing.
* Tableau moment: A project or data owner can now guide a lot more without much work. Dashboarding used to require a lot of low-level custom web dev etc, while Tableau streamlined it so that a BI lead good at SQL and who understood the data & design can go much further without a big team and in way less time. Understanding the user personas, and adding abstractions for facilitating them, were a big deal for delivery speed, cost, and achieved quality. Arguably the same happened as Looker in introduced LookML and foreshadowed the whole semantic layer movement happening today. To help owners ensure quality and security, we have been investing a lot in the equivalent abstractions in Louie.ai for making data and more conversational. Luckily, while the AI part is new, there is a lot more precedent on the data ops side. Getting this right is a big deal in team settings and basically any time the stakes are high.
scoot|2 years ago
Hmmm, no. Altavista was the go-to search engine at the time (launched 1995), and was a crawler (i.e. not a curated catalog/directory) based search. Lycos predates that but had keyword rather than natural language search.
Google didn't launch until 1998.
tucnak|2 years ago
j-a-a-p|2 years ago
sagaro|2 years ago
And basically when you next write queries, it just auto completes for you. This would improve the productivity of the analysts a lot. With the flexibility of them being able to tweak the query. Here if something is not right, the analyst updates. The Copilot AI keeps learning and giving weights to recent queries more than older queries.
Unlike the previous solution where if something breaks, you can do nothing till you clean up the ETL and redeploy it.
zurfer|2 years ago
Documenting data definitely helps to close that gap.
However the last part you describe is nothing new (BI teams taking credit, and pushing on problems to data engineers). In fact there is a chance that tools like vanna.ai or getdot.ai bring engineers closer to business folks. So more honest conversations, more impact, more budget.
Disclaimer: I am a co-founder at getdot.ai :)
lmeyerov|2 years ago
Instead of Herculean data infra projects, this is a good time for figuring out new policy abstractions, and finding more productive divisions of labor between different days stakeholders and systems. Machine-friendly abstractions and structure are tools for predictable collaboration and automation. More doing, less waiting.
More practically, an increasing part of the Louie.ai stack is helping get the time-consuming quality, guardrails, security, etc parts under easier control of small teams building things. As-is, it takes a lot to give a great experience.
sagaro|2 years ago
So yes, I can understand if there is incentive for the startups to invest in Data Engineers to make well maintained data models.
But I do think, the most important value here is not the chatgpt interface, it is getting DEs to maintain the data model in a company where product/biz is moving fast and breaking things. If that is done, then existing tools (Power BI for instance has "ask in natural language" feature) will be able to get the job done.
The google moment, the other person talks about in another comment, is where google or 1998 didn't require a webpage owner to do anything. They didn't need him/her to make something in a different format. Use specific tags. Use some tags around key words etc. It was just "you do what you do, and magically we will crawl and make sense of it".
Here unfortunately that is not the case. Say in a ecom business which always delivers in 2 days for free, a new product is launched (same day delivery for $5 dollars), the sales table is going to get two extra columns "is_same_day_delivery_flag" and "same_day_delivery_fee". The revenue definition will change to include this shipping charges. A new filter will be there, if someone wants to see the opt in rate for how many are going for same day delivery or how fast it is growing. Current table probably has revenue. But now revenue = revenue + same_day_delivery_fee and someone needs to make the BO connection to this. And after launch, you notice you don't have enough capacity to do same day shipping, so sometimes you just have to return the fee and send it as normal delivery. Here the is_same_day_delivery_flag is true, but the same_day_delivery_fee is 0. And so on and on...
Getting DE to keep everything up to date in a wiki is tough, let alone a BO type solution. But I do hope getdot.ai etc. someone incentivizes them to change this way of doing things.
anon291|2 years ago
XCSme|2 years ago
unknown|2 years ago
[deleted]