Current code generation systems are evaluated on static benchmarks like HumanEval, which consist of isolated code snippets and lack real-world aspects of programming such as working within large codebases, managing dependencies, and using execution environments. While GitHub repositories provide a rich source of real-world codebases, evaluating code generation systems on them is challenging due to the lack of test harnesses associated with the code.
We present R2E, a scalable framework that turns any GitHub repository into an environment for programming agents. These environments can be used to benchmark programming agents that interact with interpreters on repository-level problems. The system is designed to be scalable and can be used to evaluate code generation, optimization, and refactoring on public and _private_ repos. Further, R2E also enables the collection of large-scale execution traces to improve LLMs themselves.
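To make the repository-as-environment idea concrete, here is a minimal sketch of what an agent's interaction loop with such an environment might look like. Everything below (the `RepoEnv` class, its `reset`/`step` methods, and the toy test harness) is a hypothetical illustration under assumed semantics, not R2E's actual API.

```python
# Hypothetical sketch of a repository-as-environment interaction loop.
# All names here are illustrative assumptions, not R2E's real interface.

class RepoEnv:
    """Wraps a checked-out repository plus a generated test harness."""

    def __init__(self, tests):
        # tests: mapping of test name -> callable(candidate) -> bool
        self.tests = tests

    def reset(self):
        # Return the problem specification the agent must solve.
        return "Implement `add(a, b)` returning the sum of two integers."

    def step(self, candidate):
        # Execute the agent's candidate against the harness and report
        # per-test pass/fail feedback (the agent's "observation").
        results = {name: test(candidate) for name, test in self.tests.items()}
        done = all(results.values())
        return results, done


# Toy harness: two tests for a hypothetical `add` function.
env = RepoEnv({
    "adds_positives": lambda f: f(2, 3) == 5,
    "adds_negatives": lambda f: f(-1, -4) == -5,
})

spec = env.reset()
results, solved = env.step(lambda a, b: a + b)  # agent's candidate solution
```

The key design point this sketch illustrates is that the environment, not the agent, owns the execution harness: the agent only sees a specification and pass/fail feedback, which is what makes benchmarking on arbitrary repositories (including private ones) tractable.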