(no title)
bgirard|24 days ago
I wish they would share the full conversation, token counts, and more. I'd like a better sense of how they normalize these comparisons across versions. Is this a 3-prompt, 10M-token game? A 30-prompt, 100M-token game? Are both models using similar prompts and token counts?
I vibe coded a small Factorio web clone [1] that got pretty far using the models from last summer. I'd love to compare against this.
veb|24 days ago
bgirard|24 days ago
This was built using old versions of Codex, Gemini and Claude. I'll probably work on it more soon to try the latest models.