LLM Evals Are Just Tests. Why Are We Making This So Complicated? (cameronwestland.com)
3 points | camwest | 6 months ago | 2 comments

8organicbits | 6 months ago
So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.

camwest | 6 months ago
Never? No. Way less likely? Yes! In dev we do 100 consistency checks and get green. In CI we do 10.
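For readers wondering what "100 consistency checks" could look like in practice, here is a minimal sketch, not the author's actual harness. It assumes a hypothetical `classify` callable that stands in for the LLM call, and uses a seeded fake classifier so the example runs without a model. The names `consistency_check`, `chance_all_pass`, and `make_flaky_classifier` are all invented for illustration.

```python
import random
from typing import Callable


def consistency_check(
    classify: Callable[[str], str], prompt: str, expected: str, runs: int
) -> bool:
    """Return True only if all `runs` calls to `classify` yield `expected`."""
    return all(classify(prompt) == expected for _ in range(runs))


def chance_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent calls all pass,
    assuming each call fails independently with `failure_rate`."""
    return (1.0 - failure_rate) ** runs


def make_flaky_classifier(failure_rate: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: returns the wrong label
    with probability `failure_rate`. Seeded so results are repeatable."""
    rng = random.Random(seed)

    def classify(prompt: str) -> str:
        return "new_feature" if rng.random() < failure_rate else "existing_feature"

    return classify


if __name__ == "__main__":
    # 100 runs in dev, 10 in CI, mirroring the numbers in the comment above.
    for label, runs in (("dev", 100), ("ci", 10)):
        classifier = make_flaky_classifier(failure_rate=0.05, seed=0)
        green = consistency_check(
            classifier, "Is dark mode new?", "existing_feature", runs
        )
        print(label, runs, "green" if green else "red",
              f"P(all pass at 5% failure) = {chance_all_pass(0.05, runs):.4f}")
```

Under the independence assumption, a system that is wrong 5% of the time passes 10 runs about 60% of the time but passes 100 runs well under 1% of the time. That is why a green 100-run check says much more than a green 10-run check, though neither amounts to "never".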