recipe19|7 months ago
I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.
motorest|7 months ago
It's my understanding that LLMs change code to meet a goal: if you prompt them with vague instructions like "make tests pass" or "fix the tests", they generally apply the minimum changes that let the goal be met. Unless you explicitly instruct them otherwise, they can't and won't distinguish project code from test code, so they will happily change your project code to make the tests pass.
This is not a bug. Changing project code to make tests pass is a fundamental part of refactoring, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
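A toy illustration of that TDD loop, in Python (the function name and test are invented for the example): the test is written first and fails, then the project code gets the minimal change that makes it pass.

```python
# cart.py -- project code, written AFTER the test below existed.
def total(prices, discount=0.0):
    # Minimal change that makes the failing test pass (red -> green):
    # sum the prices, then apply the fractional discount.
    return sum(prices) * (1.0 - discount)


# test_cart.py -- the failing test came first (red).
def test_total_applies_discount():
    assert total([10.0, 20.0], discount=0.5) == 15.0


test_total_applies_discount()
```

The point of the comment above is that an agent told only to "make tests pass" is free to edit either file; TDD workflows assume the edits land in `cart.py`, not in `test_cart.py`.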
pygy_|7 months ago
It's a big mess.
0. https://github.com/isaacs/semicolons/blob/main/semicolons.js
remich|7 months ago
As a test recently I instructed an agent using Claude to create a new MCP server in Elixir based on some code I provided that was written in Python. I know that, relatively speaking, Python is over-represented in training data and Elixir is under-represented. So, when I asked the agent to begin by creating its plan, I told it to reference current Elixir/Phoenix/etc documentation using context7 and to search the web using Kagi Search MCP for best practices on implementing MCP servers in Elixir.
It was very interesting to watch how the initially generated plan evolved after using these tools, and how the model then identified an SDK I wasn't even aware of that perfectly fit the purpose (Hermes-mcp).
vineyardmike|7 months ago
Last night I tried to build a super basic "barely above hello world" project in Zig (a language where I don't know the syntax), and it took trying a few different LLMs to find one that could actually write anything that would compile (Gemini with search enabled). I really wasn't expecting that, considering how good my experience has been with mainstream languages.
Also, I think OP did rather well considering BASIC is hardly used anymore.
andsoitis|7 months ago
The models don’t have a model of the world. Hence they cannot reason about the world.
pygy_|7 months ago
They don't need a formal model, they need examples from which they can pilfer.
cmrdporcupine|7 months ago
I have had it writing LambdaMOO code, with my own custom extensions (https://github.com/rdaum/moor), and it's... not bad, considering.