top | item 44672020

Hey HN, I'm Adina, Stefan's co-founder at superglue. When we started working on LLM-powered integrations about a year ago, the models were barely good enough to handle simple mappings. We started benchmarking our performance as an internal evals project and thought it would be fun to open source it, to create more transparency around LLM performance. Our goal here is to understand how we can make agents production-ready and improve reliability across the board.

hoerzu|7 months ago

Love the benchmarks. Is it better to use a single LLM for performance, or would you always advise adding a self-reflection step?

adinagoerres|7 months ago

self-reflection is very important for both humans and LLMs, indeed