top | item 44510382

clord | 7 months ago

There is something deep in this observation. When I reflect on how I write code, sometimes it's backwards. Sometimes I start with the data and work back through to the outer functions, unnesting as I go. Sometimes I start with the final return and work back to the inputs. I've noticed that LLMs sometimes ought to work this way, but can't, so they end up rewriting from the start.

Makes me wonder if future LLMs will compose nonlinear things and be able to work in non-token-order spaces temporarily, or will have a way to map their output back to linear token order. I know nonlinear thinking is common while writing code, though; current LLMs might be hiding this deficit behind a large and near-perfect context window.

hnuser123456 | 7 months ago

Yes, there are already diffusion language models, which start with paragraphs of gibberish and evolve them into a refined response as a whole unit.
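A toy sketch of that whole-sequence refinement, in the MaskGIT-style discrete flavor (this is an illustrative simplification, not a real diffusion LM; the "model" here is a hardcoded target sentence, which is just enough to show how all positions resolve in parallel over a few steps):

```python
import random

TARGET = "the quick brown fox jumps over the lazy dog".split()
MASK = "_"

def refine(seq, steps=4, rng=random.Random(0)):
    """Iteratively unmask a fraction of positions per step until none remain."""
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Unmask a growing share of the remaining positions each step.
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, k):
            seq[i] = TARGET[i]  # stand-in for the model's per-position prediction
        print("step", step, ":", " ".join(seq))
    return seq

seq = refine([MASK] * len(TARGET))
assert seq == TARGET
```

The point is the shape of the process: the sequence is refined as a whole unit, rather than appended to left-to-right.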

altruios | 7 months ago

Right, but that smoothly(ish) resolves everything at the same time. That might be sufficient, but it isn't actually replicating the thought process described above. That non-linear thinking is different from diffuse thinking. Resolving in a web around a foundation seems like it would be useful for coding (and for other structured thinking in general).

saurik | 7 months ago

The process of developing software involves this kind of non-linear code editing. When you learn to do something (and the same should go for code, even if sometimes people don't get this critical level of instruction), you don't just look at the final result: you watch people construct the result. The process of constructing code involves a temporally linear sequence of operations on a text file, but your cursor is bouncing around as you issue commands that move it through the file. We don't have the same kind of copious training data for that, so what we really need is to train models not on code, but on all of the input that goes into a text editor. (If we concentrate on software developers who are used to doing work entirely in a terminal, this gets a bit easier, as we can then essentially train the model on all of the keystrokes they press.)

UltraSane | 7 months ago

I think that long term, LLMs should directly generate Abstract Syntax Trees. But this is hard now because all the training data is textual code.

saurik | 7 months ago

The training data is text code that can be compiled, though, so the training data can also easily be an Abstract Syntax Tree.
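For Python at least, the standard library makes this concrete: any source file that parses already has an AST "for free", and the tree round-trips back to equivalent source. A quick sketch:

```python
import ast

# Any parseable Python source already has an AST representation.
source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

# The tree (or a serialization of it) could itself serve as training data.
print(ast.dump(tree, indent=2))

# Round-trip: the AST can be turned back into equivalent source text.
print(ast.unparse(tree))
```

So converting a text-code corpus into an AST corpus is mostly a matter of running the existing parser over it, at least for code that compiles.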

undfined | 7 months ago

There's a fair amount of experimental work happening trying different parsing and resolution procedures such that the training data reflects an AST and/or predicts nodes in an AST as an in-filling capability.

kenjackson | 7 months ago

It's possible that LLMs build ASTs internally for programming. I have no first-hand data on this, but it would not surprise me at all.

lelanthran | 7 months ago

> Sometimes I start with the final return and work back to the inputs.

Shouldn't be hard to train a coding LLM to do this too by doubling the training time: train the LLM both forwards and backwards across the training data.
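A minimal sketch of that data-doubling idea (the word-level tokenization here is hypothetical and oversimplified; a real setup would work on the model's actual token sequences):

```python
# For each token sequence in the corpus, also emit its reversal, so the
# model sees code both left-to-right and right-to-left during training.

def bidirectional_corpus(sequences):
    for seq in sequences:
        yield seq                  # forward: inputs -> final return
        yield list(reversed(seq))  # backward: final return -> inputs

corpus = [["def", "f", "(", "x", ")", ":", "return", "x"]]
doubled = list(bidirectional_corpus(corpus))
assert len(doubled) == 2 * len(corpus)
assert doubled[1] == doubled[0][::-1]
```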

jdiff | 7 months ago

GP is talking about the nonlinear way that software engineers think, reason, and write down code. Simply doing the same thing but backwards provides no benefit.