(no title)
edunteman | 4 months ago
Right now, we assume first call is correct, and will eagerly take the first match we find while traversing the tree.
One of the worst things that could currently happen is we cache a bad run, and now instead of occasional failures you’re given 100% failures.
A few approaches we’ve considered - maintain a staging tree, and only promote to live if multiple sibling nodes (messages) look similar enough. Decision to promote could be via tempting, regex, fuzzy, semantic, or LLM-judged - add some feedback APIs for a client to score end-to-end runs so that path could develop some reputation
toobulkeh|4 months ago