(no title)
QueensGambit | 24 days ago
I've been arguing the same for code generation. LLMs flatten parse trees into token sequences, then burn compute reconstructing hierarchy as hidden states. Graph transformers could be a good solution for both: https://manidoraisamy.com/ai-mother-tongue.html
mkmccjr|16 days ago
In the physics/science cases I work with, the factorization is usually between the physical law (shared structure) and the experimental conditions (dataset-specific structure). If you don't separate them, the model wastes capacity trying to memorize the noise of the experimental conditions. (It's ineffective as well as wasteful.)
The analogy to code generation makes a lot of sense: flattening a tree into a sequence forces the model to infer syntax that was already explicit. Thank you for the link; I look forward to diving into it!