item 37265100


regiswilson|2 years ago

Release engineer here. That's an excellent question, and we worry about it all the time. The AI "seems" authoritative, but it can't even add 1+1 sometimes :crying-emoji:. We've tried to engineer the prompts and tooling so that it will say "I don't know" if it doesn't know. But we've still seen it say some crazy things, like "Your cluster is fine" when it clearly wasn't. :tongue-sticking-out-emoji: I guess the only real answer is you have to trust but verify.


JimDabell|2 years ago

> But we've still seen it say some crazy things, like "Your cluster is fine" when it clearly wasn't. :tongue-sticking-out-emoji:

It’s difficult to take you seriously when you write like this about show-stopping bugs.

regiswilson|2 years ago

I was referring to problems we found during initial development, but I appreciate that I didn't clarify that well.

frankohn|2 years ago

You need to engineer the system so that when the AI states something, it has to give a command that supports what it says and explain how the command's output shows the statement is true. At that point the command should actually be executed and its output or error fed back to the AI so that it can confirm its statement or correct it.

It's crazy to me that anyone thinks a system with no feedback loop can always be accurate. Only pure mathematics can work like that; any system like this needs a feedback loop.
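The loop described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual implementation: `ask_model` is a hypothetical stand-in for whatever LLM call the system uses, and `command` is the verification command the model itself proposed.

```python
import subprocess

def verify_claim(claim: str, command: list[str], ask_model) -> str:
    """Execute the model-suggested verification command and feed the
    real output (or error) back so the model can confirm or correct itself."""
    result = subprocess.run(command, capture_output=True, text=True)
    evidence = result.stdout if result.returncode == 0 else result.stderr
    # The model sees what actually happened, not just its own assertion.
    return ask_model(
        f"You claimed: {claim}\n"
        f"Verification command: {' '.join(command)}\n"
        f"Actual output (exit code {result.returncode}):\n{evidence}\n"
        "Does this output support your claim? If not, correct it."
    )
```

The key point is that the statement and the evidence travel together in the follow-up prompt, so the model is grading its claim against ground truth rather than against its own prior output.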

regiswilson|2 years ago

Excellent idea. We do internally feed the answers back to the system to improve its own inputs and outputs. The funniest part of this experience has been finding cases where even the humans were hallucinating: "Hey, I thought this was shut down?!" or "I can't find the bucket!" Even on a bad day, the humans are still ahead, though.

Michelangelo11|2 years ago

Thanks for the answer. Yeah, that's pretty much what I expected would be the case. Speaking as another dev in the AI space, it seems like reliability and consistency are the hardest issues when it comes to making AI genuinely useful in production vs. just a neat toy, and there's no stock solution.

tommy_mcclung|2 years ago

Tommy, CEO here. We also have some ideas on reporting hallucinations and feeding the wrong answers back into the prompts automatically to reduce them. We have a few other ideas too, and would welcome any suggestions folks have to help with this problem.
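The report-and-feed-back idea might look something like the sketch below. Everything here is hypothetical (the function names, the correction-log shape); it only illustrates injecting previously reported wrong answers into future prompts.

```python
def report_hallucination(log: list, wrong: str, right: str) -> None:
    """Record a reported hallucination so future prompts carry the correction."""
    log.append((wrong, right))

def build_prompt(question: str, corrections: list) -> str:
    """Prepend known past mistakes (and their fixes) to the user's question."""
    if not corrections:
        return "Question: " + question
    notes = "\n".join(
        f"- You previously answered '{wrong}'; the correct answer was '{right}'."
        for wrong, right in corrections
    )
    return "Known past mistakes to avoid:\n" + notes + "\n\nQuestion: " + question
```

One design note: keeping corrections in a log outside the model means they survive across sessions and can be reviewed by humans before being trusted.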

Michelangelo11|2 years ago

After thinking about it for a bit, I have an idea that might help. The writeup is probably too long for an HN comment, though. Could I email you?

say_it_as_it_is|2 years ago

How about applying good old fashioned bean counting?