frabonacci | 1 month ago

Fair point - we just open-sourced this, so benchmark results are coming. We're already working with labs on evals, focusing on tasks that are more realistic than those in OSWorld or Windows Agent Arena and that are curated with actual workers. If you want to run your agent on it, we'd love to include your results.