(no title)
joefourier | 2 months ago
They’re capable of looking up documentation, correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
treespace8|2 months ago
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example a Tower Defence game I'm making I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but became harder and harder to follow as I went on. It's only after watching more tutorials I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice)
LLMs still suffer from Garbage in Garbage out.
jennyholzer3|2 months ago
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
memoriuaysj|2 months ago
after coding I ask it "review the code, do you see any for which there are common libraries implementing it? are there ways to make it more idiomatic?"
you can also ask it "this is an idea on how to solve it that somebody told me, what do you think about it, are there better ways?"
belter|2 months ago
"Write a chess engine where pawns move backward and kings can jump like nights"
It will keep slipping back into real chess rules. It learned chess, it did not understand the concept of "rules"
Or
Ask it to reverse a made up word like
"Reverse the string 'glorbix'"
It will get it wrong on the first try. You would not fail.
Or even better ask it to...
"Use the dxastgraphx library to build a DAG scheduler."
dxastgraphx is a non existing library...
Marvel at the results...tried in both Claude and ChatGPT....
manmal|2 months ago
bossyTeacher|2 months ago
Answer to second question:
"I can do that, but there’s a catch: dxastgraphx is not a known or standard Python (or JS) library as of now. I don’t have any public API or documentation for it to target directly.
To avoid guessing wrong, here are the two sensible options:"
somebodythere|2 months ago
criticalfault|2 months ago
baq|2 months ago