top | item 45371244

Ask HN: Non-career ending way to let agents touch internal structured data?

2 points| Hoshang07 | 5 months ago

right now i wouldn’t trust an ai agent anywhere near my warehouse with raw sql. too risky. what i really want is simple: let the agent only “search” a safe view that i define.. say a join of customers, tickets, and product events.. instead of having free rein over every table.

the approach i’m experimenting with is: i build a malloy model of that view (so it’s more graph-like and semantic), then i generate custom mcp tools on top of it. those tools basically act as guardrails - the agent can only call “get_customer_tickets” or “fetch_usage_events” instead of writing arbitrary queries. These tools are embedded into my malloy model so it's not really a sql query its firing on that view, instead it's searching by filtering from my view. finally, i give access to those tools into my agent builder.

this feels cleaner and safer, but i'm new to this and not sure if this is the right pattern. is anyone here tackling this differently? are there any efficient ways to give agents scoped, governed access to internal data without shooting yourself in the foot? what are some more things i should consider here?

4 comments

order

mindcrime|5 months ago

> right now i wouldn’t trust an ai agent anywhere near my warehouse with raw sql.

Same. And definitely not with read/write access. But even RO is risky due to the possibility of an unintentional DDOS via a bad query, etc.

> custom mcp tools on top of it. those tools basically act as guardrails - the agent can only call “get_customer_tickets” or “fetch_usage_events” instead of writing arbitrary queries.

This is more or less what I would do, although I would say it's neither here nor there to me whether one uses a view or not. But I'd definitely advocate building dedicated, purpose specific tools, with known queries, and let the agent use those. At least in the near-term.

If it proves too unwieldy to implement all those tools, I might could be convinced to let the agent generate its own queries to use, but I'd insist on a HITL mechanism to have a given query reviwed before it is ever allowed to be executed the first time.

Hoshang07|5 months ago

yes.. that makes a lot of sense to me. I'm almost imagining building an evals layer to show which views were useful (or not) for the agent to do it's thing.. so i'd know whether to increase/decrease the scope of the views + further tuning the custom mcp layers. thoughts on how you'd imagine doing this?

raxxorraxor|5 months ago

I currently don't allow agents to write arbitrary queries. But if I wanted to, I would probably just use the security model of the DB and give the AI a user that is restricted to data it can access.

AI is fairly apt at writing efficient queries for that matter and I use it a lot for this purpose.

Hoshang07|5 months ago

that's a good baseline..db rbac locks scope, but i guess not behavior. inside that schema the agent can still fire off wide scans, weird joins, or grab way more rows than needed.. Ai can write apt queries for sure but there's always a dependancy on the kinda prompt it gets. you may also lose semantic clarity and telemetry .. no way to know if it actually pulled the right data. how would you enforce “only these lookups” or “only these joins” with just db perms? is there a way?