top | item 47002238

ramraj07 | 16 days ago

Great post indeed, but let me ask you to put yourself in the LLM's shoes. Instead of reading through coherent lines of code that are exclusively about solving the problem, you now have random characters before every line that mean something (the presence of the edit tool implies it) but have nothing to do with your actual problem. Do you reckon the LLM will be distracted a little bit? The benchmark deliberately sidesteps the model's actual intelligence on the task at hand, so while the author feels successful at their subtask, it's very possible they've lost the war. This seems to be the beauty of AI engineering: the smarter you think you are about something, the bigger the fall.
