top | item 46303841

alberth | 2 months ago

So who's the arbiter to determine if the outcome was achieved?

And how do you programmatically measure it?

nerdjon|2 months ago

The obvious solution is just to throw more LLMs at it, to verify the output of the other LLM and check that it is doing its job...

/s (mostly because you know this will be the "Solution" that many will just run with despite the very real issue of how "persuadable" these systems are)...

The real answer is that even that will fail, and there will have to be a feedback loop with a human. In many cases that will likely lead to more churn fixing the work the AI did than if the human had just done it in the first place.

All this instead of focusing on the places where an AI tool can truly cut down on time spent, like searching for something (which can still fail, but at least the risk from a failure is far lower than when producing output).

rajvarkala|2 months ago

Hi alberth,

I'd assume an outcome is a negotiated agreement between buyer and Agent provider.

Think of all the n8n workflows. If we take a simple example like an expense receipt processing workflow or a lead sourcing workflow, I'd think the outcomes can be counted pretty well. In these cases, the outcome is the number of receipts successfully entered into the ERP or the number of entries captured in Salesforce.

I am sure there are cases where outcomes are fuzzy, for instance an employer-employee agreement.

But in some cases, for instance, my accounting agent would only get paid if he successfully uploads my tax returns.

Surely this is not applicable in all cases. But in cases where a human is measured on outcomes, the same should apply to agents too, I guess.

htrp|2 months ago

> But in some cases, for instance, my accounting agent would only get paid if he successfully uploads my tax returns.

I think you'd want it to correctly compute your taxes, especially if you get a letter a year or two after the fact saying you owe the government money.

malux85|2 months ago

This is the problem with this: in simple cases like “you add N employees” you can vaguely approximate it, like they do in the article.

But for anything beyond this trivial example, the person who knows the value most accurately is … the customer! They are also the person paying the bill, so there’s a strong financial incentive for them not to reveal this info to you.

I don’t think this will work …

rajvarkala|2 months ago

I often go back to the customer support voice AI agent example. Let's say the bot can resolve tickets successfully at a certain rate. This is easily capturable. Why is this difficult? What cases am I missing?
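
As a rough sketch of what "capturable" could mean here: assume a hypothetical ticket record with `handled_by_bot`, `resolved`, and `reopened` flags (none of these names come from a real system), and count a resolution as billable only if the customer did not come back with the same issue. Where to draw that line is exactly the part buyer and provider would have to negotiate.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical fields, for illustration only.
    ticket_id: str
    handled_by_bot: bool
    resolved: bool
    reopened: bool = False  # customer came back with the same issue


def billable_resolutions(tickets):
    """Tickets the bot handled, resolved, and that were not reopened."""
    return [
        t for t in tickets
        if t.handled_by_bot and t.resolved and not t.reopened
    ]


def resolution_rate(tickets):
    """Share of bot-handled tickets that count as billable resolutions."""
    bot_tickets = [t for t in tickets if t.handled_by_bot]
    if not bot_tickets:
        return 0.0
    return len(billable_resolutions(tickets)) / len(bot_tickets)


tickets = [
    Ticket("T1", handled_by_bot=True, resolved=True),
    Ticket("T2", handled_by_bot=True, resolved=True, reopened=True),
    Ticket("T3", handled_by_bot=True, resolved=False),
    Ticket("T4", handled_by_bot=False, resolved=True),  # human-handled, ignored
]

print(len(billable_resolutions(tickets)), resolution_rate(tickets))
```

Counting is the easy part. The `reopened` flag stands in for the harder question raised upthread: whether a ticket marked "resolved" was actually a good outcome for the customer.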

higginsniggins|2 months ago

That's literally the job of a founder. You talk to customers and learn from them.