adinagoerres | 7 months ago

Hey HN, I'm Adina, Stefan's co-founder at superglue. When we started working on LLM-powered integrations about a year ago, the models were barely good enough to handle simple mappings. We started benchmarking our performance as an internal evals project and thought it would be fun to open source it, to create more transparency around LLM performance. Our goal here is to understand how we can make agents production-ready and improve reliability across the board.

hoerzu|7 months ago

Love the benchmarks. Is it better to use a single LLM for performance, or would you always advise adding a self-reflection step?

adinagoerres|7 months ago

self-reflection is very important for both humans and LLMs, indeed