item 46815576

maxchehab | 1 month ago

Trust is the hardest part to scale here.

We're building something similar and found that no matter how good the agent loop is, you still need "canonical metrics" that are human-curated. Otherwise non-technical users (marketing, product managers) are playing a guessing game with high-stakes decisions, and they can't verify the SQL themselves.

Our approach:

1. We control the data pipeline and work with a discrete set of data sources where schemas are consistent across customers.
2. We benchmark extensively so the agent uses a verified metric when one exists, falls back to raw SQL when it doesn't, and captures those gaps as "opportunities" for human review.

Over time, most queries hit canonical metrics. The agent becomes less of a SQL generator and more of a smart router from user intent -> verified metric.
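The routing described above can be sketched roughly like this. All names here (`CANONICAL_METRICS`, `generate_sql`, `gaps`) are hypothetical stand-ins, not their actual implementation:

```python
# Sketch of an "intent -> verified metric" router with a raw-SQL fallback.
# Names and the metric table are illustrative, not a real product API.

CANONICAL_METRICS = {
    # human-curated, benchmarked SQL keyed by normalized intent
    "email open rate": "SELECT opens * 1.0 / delivered FROM email_stats",
}

gaps: list[str] = []  # intents with no canonical metric, queued for human review


def generate_sql(intent: str) -> str:
    # Stand-in for the LLM SQL generator.
    return f"-- LLM-generated SQL for: {intent}"


def route(intent: str) -> tuple[str, bool]:
    """Return (sql, verified) for a user intent."""
    key = intent.strip().lower()
    if key in CANONICAL_METRICS:
        return CANONICAL_METRICS[key], True   # hit a verified metric
    gaps.append(key)                          # capture the gap as an "opportunity"
    return generate_sql(intent), False        # fall back to raw SQL
```

So `route("Email Open Rate")` returns the curated SQL marked verified, while an unrecognized intent falls through to generated SQL and lands in the review queue.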

The "Moving fast without breaking trust" section resonates; their eval system with golden SQL is essentially the same insight: you need ground truth to catch drift.
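A golden-SQL eval reduces to: run the agent's SQL and the human-verified query against the same data and compare result sets. A minimal sketch using stdlib sqlite3, with an illustrative schema (not theirs):

```python
# Compare an agent-generated query against a human-verified "golden" query.
# Schema and queries are toy examples to show the mechanism.
import sqlite3


def results_match(db: sqlite3.Connection, agent_sql: str, golden_sql: str) -> bool:
    agent = db.execute(agent_sql).fetchall()
    golden = db.execute(golden_sql).fetchall()
    return sorted(agent) == sorted(golden)  # order-insensitive comparison


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE email_stats (delivered INTEGER, opens INTEGER)")
db.execute("INSERT INTO email_stats VALUES (100, 40)")

golden = "SELECT opens * 1.0 / delivered FROM email_stats"
drifted = "SELECT opens / delivered FROM email_stats"  # integer-division bug
```

Here `results_match(db, golden, golden)` passes, while the drifted query silently returns 0 instead of 0.4 and fails the check, which is exactly the kind of regression ground truth catches.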

Wrote about the tradeoffs here: https://www.graphed.com/blog/update-2

data-ottawa | 1 month ago

Yes, I’ve been working on this and you need a clear semantic layer.

If there are multiple paths, or perceived paths, to an answer, you'll get two answers. Plus, LLMs like to create pointless "xyz_index" metrics that are not standard, clear, or useful. Yet I see users just go "that sounds right" and run with it.

maxchehab | 1 month ago

Absolutely. We make it obvious to the user when a query/chart uses a non-standard metric, and we have a fast SLA on finding/building the right metric.

It only works because all of the data looks the same between customers (we manage ad platform, email, and funnel data).

So if we build an "email open rate" metric once, that work amortizes across other customers.