Author here. Majromax challenged me to test `i = 1 + i`, which broke my
theoretical framework. While setting up that experiment, I realized I hadn't
used chat templates in my original measurements (rookie mistake with an
Instruct model!).
Re-running with proper methodology completely flips the results - the terse
version actually wins. I'll add a correction note to the article once AWS/Medium
comes back online and will write a follow-up with the corrected experiments.
This is open science working as intended - community scrutiny improves the work.
Thank you all for the engagement, and especially to Majromax for the challenge
that led to discovering this!
This approach of solving a problem by building a low-perplexity path towards the solution reminds me of Grothendieck's approach to complex mathematical problems: you gradually build a theory that eventually makes the problem obvious.
What is striking to me is how far reasoning by analogy and generalization can get you. Some of the deepest theorems are about relating disparate things by analogy.
The bigger issue is that LLMs haven't had much training on Q, as there's little publicly available code. I recently had to try to hack some together, and LLMs couldn't string simple pieces of code together.
> I think the aesthetic preference for terseness should give way to the preference for LLM accuracy, which may mean more verbose code
From what I understand, the terseness of array languages (Q builds on K) serves a practical purpose: all the code is visible at once, without the reader having to scroll or jump around. When reviewing an LLM's output, this is a quality I'd appreciate.
I agree with you, though in the q world people tend to take it to the extreme, packing a whole function into a single line rather than onto a single screen. I personally find this density makes it harder to read; when reading such code, I paste it into my text editor and split the semicolon-separated statements onto separate lines. Here's a standard ticker plant script from KX themselves: https://github.com/KxSystems/kdb-tick/blob/master/tick.q
E.g. one challenge I've had was generating a magic square on a single line; for odd-size only, I wrote: ms:{{[(m;r;c);i]((.[m;(r;c);:;i],:),$[m[s:(r-1)mod n;d:(c+1) mod n:#:[m]];((r+1)mod n;c);(s;d)])}/[((x;x)#0;0;x div 2);1+!:[x*x]]0}; / but I don't think that's helping anyone
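For contrast, here is what I believe is the same odd-order construction (the classic "Siamese method" walk that the q code's `(r-1) mod n` / `(c+1) mod n` / `(r+1) mod n` moves suggest) spelled out in Python; the names and structure are mine, not a transliteration:

```python
def magic_square(n):
    """Odd-order magic square via the Siamese method: start in the middle of
    the top row, repeatedly step up-and-right (with wraparound), and drop
    down one cell instead whenever the target cell is already filled."""
    assert n % 2 == 1, "Siamese method works for odd orders only"
    m = [[0] * n for _ in range(n)]
    r, c = 0, n // 2
    for i in range(1, n * n + 1):
        m[r][c] = i
        nr, nc = (r - 1) % n, (c + 1) % n  # up-and-right, wrapping
        if m[nr][nc]:
            r = (r + 1) % n                # occupied: drop down instead
        else:
            r, c = nr, nc
    return m
```

Roughly the same algorithm, but each step is nameable and inspectable, which is exactly the trade-off being debated here.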
First pass on my local DeepSeek-V3.1-Terminus at Q4 answered it correctly. If anything, I think LLMs should write terse code (Q/J/APL/Forth/Prolog/Lisp); tokens are precious. It's insane to waste them generating Java, JavaScript, and other overly verbose code...
It did go back on itself 3 times, no? "Actually, let's trace for x=3:" (it had just computed x=3 the first time); then "Better to check actual q output:" - did it actually run it in a q session, or just pretend? And a third time: "That doesn't seem to align. Let's do it step by step:"
Unless the interpreter is capable of pattern-recognizing that whole pattern, that will be less efficient, e.g. having to work with 16-bit integers for x in the range 128..32767, whereas the direct version can construct the array directly (i.e. one byte or bit per element depending on whether kdb has bit booleans). Can't provide timings for kdb for licensing reasons, but here's Dyalog APL and CBQN doing the same thing, showing the fast version at 3.7x and 10.5x faster respectively: https://dzaima.github.io/paste/#0U1bmUlaOVncM8FGP5VIAg0e9cxV...
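For readers without an array language to hand, the "clever" construction being discussed can be sketched in NumPy (my analogy, not q): build the length-(x+1) vector [1, 0, ..., 0] and cyclically reshape it into an x-by-x array, so the wraparound lands the 1s on the diagonal.

```python
import numpy as np

def clever_eye(x):
    # Analogue of q's (2#x)#1,x#0: a 1 followed by x zeros, cycled into
    # x*x cells. Cell k holds 1 iff k mod (x+1) == 0, i.e. the diagonal.
    pattern = np.zeros(x + 1)
    pattern[0] = 1
    return np.resize(pattern, (x, x))  # np.resize repeats data cyclically

assert np.array_equal(clever_eye(5), np.eye(5))
```

The direct `np.eye(x)` is both clearer and free to fill the output without materializing and cycling an intermediate vector, which is the performance point being made above.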
The vibe I get from q/kdb in general is that its concision has passed the point of increasing clarity through brevity and is now about some kind of weird hazing or machismo thing. I've never seen even numpy's verbosity be an actual impediment to understanding an algorithm, so we're left speculating about social and psychological explanations for why someone would write (2#x)#1,x#0 and think it beautiful.
Some brief notations make sense. Consider, say, einsum: "ij->ji" elegantly expresses a whole operation in a way that exposes the underlying symmetry of the domain. I don't think q's line noise style (or APL for that matter) is similarly exposing any deeper structure.
> When I(short proof)=I(long proof), per-token average surprisal must be lower for the long proof than for the short proof. But since surprisal for a single token is simply -log P, that would mean that, on average, the shorter proof is made out of less probable tokens.
This assertion is intuitive, but it isn't true. Per-token entropy of the long proof can be larger if the long proof is not minimal.
For example, consider the "proof" of "list the natural numbers from 1 to 3, newline-delimited." The 'short proof' is:
"1\n2\n3\n" (Newlines escaped because of HN formatting)
Now, consider the alternative instruction to give a "long proof", "list the natural numbers from 1 to 3, newline-delimited using # for comments. Think carefully, and be verbose." Trying this just now with Gemini 2.5-pro (Google AI Studio) gives me:
"# This is the first and smallest natural number in the requested sequence.\n
1
# Following the first, this is the second natural number, representing the concept of a pair.\n
2
# This is the third natural number, concluding the specified range from 1 to 3.\n
3"
We don't have access to Gemini's per-token logits, but repeating the prompt gives different comments so we can conclude that there is 'information' in the irrelevant commentary.
The author's point regains its truth, however, if we consider the space of all possible long proofs. The trivial 'long' proof has higher perplexity than the short proof, but that's because there are so many more possible long proofs of approximately equal value. The shortest possible proof is a sharp minimum, while longer proofs sit in shallower, 'easier' basins.
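The accounting behind the quoted identity is easy to sanity-check with made-up token probabilities (illustrative numbers, not real logits): if two sequences carry equal total information, the longer one necessarily has lower average surprisal, hence lower perplexity.

```python
import math

def total_info(probs):
    # Total surprisal in nats: sum of -log P over the tokens
    return sum(-math.log(p) for p in probs)

def perplexity(probs):
    # Perplexity = exp of the average per-token surprisal
    return math.exp(total_info(probs) / len(probs))

short = [0.25, 0.25]          # 2 tokens, joint probability 1/16
long_ = [0.5, 0.5, 0.5, 0.5]  # 4 tokens, joint probability also 1/16

assert math.isclose(total_info(short), total_info(long_))  # equal information
assert perplexity(long_) < perplexity(short)               # 2.0 vs 4.0
```

Same total information, spread over more tokens, each individually more probable: exactly the intuition the quoted passage relies on, and exactly what the non-minimal "verbose comments" example above breaks.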
The author also misses a trick with:
> Prompted with “Respond only with code: How do you increment i by 1 in Python?”, I compared the two valid outputs: i += 1 has a perplexity of approximately 38.68, while i = i + 1 has a perplexity of approximately 10.88.
… in that they ignore the equally-valid 'i = 1 + i'.
Thanks so much for this challenge! I just ran the experiment with i = 1 + i
and you're absolutely right - it breaks my theoretical framework (same semantic
information, but much higher perplexity).
While setting this up, I realized I hadn't used chat templates in my original
measurements (rookie mistake with an Instruct model!). Re-running with proper
methodology completely flips the results - the terse version actually wins.
I'll add a correction note to the article once AWS/Medium comes back online,
and will write a proper follow-up with all the corrected experiments. Your
comment literally made the research better - thank you!
I was wondering recently: is fine-tuning an effective way to make this the default? If so, does fine-tuning this behavior on one language have a carry-over effect on other languages (maybe even non-programming languages?), or is the effect localized to the language of the fine-tuning dataset?
Disagree. If some small adjustments to your workflow or expectations enable you to use LLMs to produce good, working, high-quality code much faster than you could otherwise, at some point you should absolutely welcome this, not stubbornly refuse change.
I was kind of taken aback by the author's definition of 'terse'. I was expecting a discussion about architecture, not about syntax aesthetics.
Personally I don't like short variable names, short function names or overly fancy syntactical shortcuts... But I'm obsessed with minimizing the amount of logic.
I want my codebases to be as minimalist as possible. When I'm coding, I'm discovering the correct lines, not inventing them.
This is why I like using Claude Code on my personal projects. When Claude sees my code, it unlocks a part of its exclusive, ultra-elite, zero-bs code training set. Few can tap into this elite set. Your codebase is the key which can unlock ASI-like performance from your LLMs.
My friend was telling me about all the prompt engineering tricks he knows... And in a typical midwit meme moment; I told him, dude, relax, my codebase basically writes itself now. The coding experience is almost bug free. It just works first time.
I told my friend I'd consider letting him code on my codebase if he uses an LLM... And he took me up on the offer... I merged his first major PR directly without comment. It seems even his mediocre Co-pilot was capable of getting his PR to the standard.
I'd bet a lot of people are trying to optimize their codebases for LLMs. I'd be interested to see some examples of your ASI-unlocking codebase in action!
gabiteodoru|4 months ago
neprotivo|4 months ago
https://ncatlab.org/nlab/show/The+Rising+Sea
nextaccountic|4 months ago
Which incidentally is how programming in Haskell feels like
orbifold|4 months ago
benjaminwootton|4 months ago
It’s a bizarre language.
haolez|4 months ago
sanjayjc|4 months ago
dapperdrake|4 months ago
Human language has roughly, say, 36% encoding redundancy on purpose. (Or by Darwinian selection so ruthless we might as well call it "purpose".)
gabiteodoru|4 months ago
segmondy|4 months ago
https://pastebin.com/VVT74Rp9
gabiteodoru|4 months ago
Veedrac|4 months ago
Is this... just to be clever? Why not
a.k.a. the identity matrix is defined as having ones on the diagonal? Bonus points: AI will understand the code better.
sannysanoff|4 months ago
dzaima|4 months ago
quotemstr|4 months ago
Majromax|4 months ago
gabiteodoru|4 months ago
daxfohl|4 months ago
bArray|4 months ago
> Due to a global hosting outage, Medium is currently unavailable. We’re working to get you reading and writing again soon.
> — The Medium Team
Dang.
chmod775|4 months ago
Asking humans to change for the sake of LLMs is an utterly indefensible position. If humans want terse code, your LLM better cope or go home.
afc|4 months ago
mikkupikku|4 months ago
Use the tool according to how it works, not according to how you think it should work.
icsa|4 months ago
* LLMs don't understand the syntax of q (or any other programming language).
* LLMs don't understand the semantics of q (or any other programming language).
* Limited training data, as compared to languages like Python or JavaScript.
All of the above contribute to the failure modes when applying LLMs to the generation or "understanding" of source code in any programming language.
chewxy|4 months ago
I use my own APL to build neural networks. This is probably the correct answer, and in line with my experience as well.
I changed the semantics and definitions of a bunch of functions, and none of the coding LLMs out there can even approach writing semi-decent APL.
socketcluster|4 months ago
ahussain|4 months ago