richardblythman | 6 months ago
I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.
Existing code generation benchmarks focus mainly on self-contained code snippets, and they compare models, not agents. Almost none focus on library-specific code generation.
So we built a simple app to test how well coding agents interact with libraries:

• Takes your library’s docs
• Automatically extracts usage examples
• Tasks AI agents (like Claude Code) with generating those examples from scratch
• Logs mistakes and analyzes performance

A rough sketch of this loop is below.
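A minimal sketch of what such a loop could look like, assuming the library's docs are markdown files with fenced code examples. The file layout, prompt wording, and line-overlap score here are placeholder choices for illustration, not the app's actual implementation:

```python
# Hypothetical sketch of the pipeline above. The docs layout, prompt
# wording, and scoring metric are assumptions, not the app's real API.
import re
from pathlib import Path

# Match fenced code blocks (```lang ... ```) in markdown docs.
FENCE_RE = re.compile(r"`{3}(\w*)\n(.*?)`{3}", re.DOTALL)

def extract_examples(docs_dir: str) -> list[dict]:
    """Pull every fenced code example out of a library's markdown docs."""
    examples = []
    for md in Path(docs_dir).rglob("*.md"):
        for lang, code in FENCE_RE.findall(md.read_text(encoding="utf-8")):
            examples.append({"source": str(md), "lang": lang or "text",
                             "code": code.strip()})
    return examples

def make_task(example: dict) -> str:
    """Turn a doc example into a generate-from-scratch task for an agent."""
    return (f"Using only the library's public API, write a {example['lang']} "
            f"snippet that does what the example in {example['source']} does, "
            f"without copying the docs verbatim.")

def score(reference: str, generated: str) -> float:
    """Crude proxy metric: share of reference lines the agent reproduced.
    A real harness would execute the snippet and check behavior instead."""
    ref = {l.strip() for l in reference.splitlines() if l.strip()}
    gen = {l.strip() for l in generated.splitlines() if l.strip()}
    return len(ref & gen) / max(len(ref), 1)

if __name__ == "__main__":
    for ex in extract_examples("docs/"):
        prompt = make_task(ex)
        # Hand `prompt` to a coding agent (e.g. Claude Code) here, capture
        # its output, then log score(ex["code"], agent_output) per example.
        print(ex["source"], "->", prompt[:60], "...")
```

The line-overlap score is deliberately crude; the "logs mistakes and analyzes performance" step would really want execution-based checks (does the generated snippet run, and does it call the library correctly?) rather than textual similarity.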
We’re testing libraries now, but it’s early days. If you’re interested, input your library, see what breaks, spot patterns, and share the results below.
We plan to expand to more coding agents, more library-specific tasks, and new metrics. Let us know what we should prioritize next.
bdhcuidbebe | 6 months ago
> I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.
Why should they even bother to answer such a loaded and hypothetical question?
grim_io | 6 months ago
I could be wrong.