Ask HN: How are you checking if your LLM is giving customers the right answer?
2 points | navaed01 | 9 months ago
There seem to be multiple failure points: hallucinations, partial responses (missing facts), claiming that information does not exist, and response accuracy that depends on how the question is asked and what is being asked.
How are you measuring this in production today?
- Thumbs up/down seems like a weak signal.
- Running a sample of ‘known queries’ assumes you know what is being asked.
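To make the ‘known queries’ idea concrete, here is a rough sketch of the kind of check I mean. Everything in it is hypothetical: `KnownQuery`, `missing_facts`, `run_known_queries`, and the `fake_llm` stand-in are names made up for illustration, and the substring match is a deliberately crude way to detect missing facts, not a real accuracy metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnownQuery:
    """A question paired with facts a complete answer should mention (hypothetical)."""
    question: str
    required_facts: list[str]


def missing_facts(answer: str, required_facts: list[str]) -> list[str]:
    """Return the required facts that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [fact for fact in required_facts if fact.lower() not in lowered]


def run_known_queries(ask: Callable[[str], str], cases: list[KnownQuery]) -> float:
    """Return the fraction of cases whose answer contains every required fact."""
    if not cases:
        return 0.0
    passed = sum(
        1 for case in cases
        if not missing_facts(ask(case.question), case.required_facts)
    )
    return passed / len(cases)


def fake_llm(question: str) -> str:
    """Stand-in for the real model call; assumed to take a question and return text."""
    return "Refunds are accepted within 30 days."


cases = [
    KnownQuery("What is the refund window?", ["30 days"]),
    KnownQuery("What do I need for a refund?", ["30 days", "receipt"]),
]
pass_rate = run_known_queries(fake_llm, cases)
```

This flags partial responses on questions I already know about, but it does nothing for queries I did not anticipate, which is the gap I am asking about.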
What have you tried that works for you?
No comments yet.