1 points| galsapir | 1 month ago

galsapir|1 month ago

we spent a few months building evals for a health agent (and the agent itself!). we tried to apply Anthropic's framework to a real system that looks at CGM (continuous glucose monitor) data + diet. some of it worked: we got decent at checking form (citations exist, tools were called, numbers trace back). the harder part was essence: is this clinically appropriate? actually helpful? we didn't really solve that. curious if others building health/bio agents have found ways around this, or if everyone's just accepting fuzzy metrics for the stuff that matters.
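for anyone wondering what "checking form" could look like in practice, here is a rough sketch of the three checks mentioned above (citations exist, tools were called, numbers trace back). it is not the actual harness from the post. the `AgentRun` record, the `[id]` citation syntax, the tool name `get_cgm_readings`, and the sample values are all made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record of one agent run; the field names are assumptions,
# not taken from any real framework.
@dataclass
class AgentRun:
    answer: str
    tool_calls: list = field(default_factory=list)    # names of tools invoked
    tool_outputs: list = field(default_factory=list)  # raw text returned by tools
    sources: dict = field(default_factory=dict)       # citation id -> source text

CITATION = re.compile(r"\[(\w+)\]")   # assumes citations look like [s1]
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def check_form(run, required_tools):
    """Return pass/fail flags for three form-level checks."""
    cited = set(CITATION.findall(run.answer))
    # Drop citation markers first so ids like [1] are not counted as numbers.
    body = CITATION.sub("", run.answer)
    answer_numbers = set(NUMBER.findall(body))
    evidence_numbers = set(NUMBER.findall(" ".join(run.tool_outputs)))
    return {
        # at least one citation, and every cited id resolves to a known source
        "citations_exist": bool(cited) and cited <= set(run.sources),
        # every required tool was actually invoked
        "tools_called": set(required_tools) <= set(run.tool_calls),
        # every number in the answer appears verbatim in some tool output
        "numbers_trace_back": answer_numbers <= evidence_numbers,
    }

# Made-up example run, for illustration only.
good_run = AgentRun(
    answer="Your glucose peaked at 182 mg/dL after lunch [s1].",
    tool_calls=["get_cgm_readings"],
    tool_outputs=["peak=182 at 13:05"],
    sources={"s1": "CGM export"},
)
print(check_form(good_run, ["get_cgm_readings"]))
```

the number check compares strings exactly, so "182.0" would not match "182", and rounded or unit-converted values would need a tolerance. none of this touches the essence side (clinically appropriate? actually helpful?), which is the part the post says is still open.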