Thing that I don't understand about LLMs at all, is that how it is possible to for it to "understand" and reply in hex (or any other encoding), if it is a statistical "machine"? Surely, hex-encoded dialogues is not something that is readily present in dataset? I can imagine that hex sequences "translate" to tokens, which are somewhat language-agnostic, but then why quality of replies drastically differ depending on which language you are trying to commuicate with it? How deep that level of indirection goes? What if it would be double-encoded to hex? Triple?If someone has insight, can you explain please?
armcat|1 year ago
timeattack|1 year ago
B1FF_PSUVM|1 year ago
That's intriguing, and would make a good discussion topic in itself. Although I doubt the "we have the same thing in [various languages]" bit.
unoti|1 year ago
It develops understanding because that's the best way for it to succeed at what it was trained to do. Yes, it's predicting the next token, but it's using its learned understanding of the world to do it. So this it's not terribly surprising if you acknowledge the possibility of real understanding by the machine.
As an aside, even GPT3 was able to do things like english -> french -> base64. So I'd ask a question, and ask it to translate its answer to french, and then base64 encode that. I figured there's like zero chance that this existed in the training data. I've also base64 encoded a question in spanish and asked it, in the base64 prompt, to respond in base64 encoded french. It's pretty smart and has a reasonable understanding of what it's talking about.
unknown|1 year ago
[deleted]
ImHereToVote|1 year ago
circuit10|1 year ago
(this is an opinion about how we use certain words and not an objective fact about how LLMs work)
timeattack|1 year ago
Distinctive part is hidden in the task: you, being presented with, say, triple-encoded hex message, would easily decode it. Apparently, LLM would not. o1-pro, at least, failed spectacularly, on the author's hex-encoded example question, which I passed through `od` twice. After "thinking" for 10 minutes it produced the answer: "42 - That is the hidden text in your hex dump!". You may say that CoT should do the trick, but for whatever reason it's not working.
timeattack|1 year ago
Like, say, `vim` is a complex and polished tool. I routinely use it to solve various problems. Even if I would give LLM full keyboard & screen access, would be able to solve those problems for me? I don't think so. There is something missing here. You can say, see, there are various `tools` API-level integrations and such, but is there any real demonstration of "intelligent" use of those tools by AI? No, because it would be the AGI. Look, I'm not saying that AI would never be able to do that or that "we" are somehow special.
You, even if given something as crude as `ed` from '73 and assembler, would be able to write an OS, given time. LLMs can't even figure out `diff` format properly using so much time and energy that none of us would ever have.
You can also say, that brains do some kind of biological level RL driven by utility function `survive_and_reproduce_score(state)`, and it might be true. However given that we as humankind at current stage do not needed to excert great effort to survive and reproduce, at least in Western world, some of us still invent and build new tools. So _something_ is missing here. Question is what.
chpatrick|1 year ago
rcdwealth|1 year ago
[deleted]
generalizations|1 year ago
cle|1 year ago
(Granted the definition of “statistical machine” is quite vague and different folks might define that differently…)
Weryj|1 year ago
godelski|1 year ago
The encoding puts the information into latent vector representations. Then the information is actually processed in this latent space. You are working on highly compressed data. Then there's decoding which brings it back to a representation we understand. This is the same reason you can highly train on one language and be good at translation.
This is over simplified as everything is coupled. But it can be difficult to censor because the fun nature of high dimensional spaces in addition to coupling effects (superposition)
donkeyboy|1 year ago
teruakohatu|1 year ago
quectophoton|1 year ago
Something like a first pass on the input to detect language or format, and try to do some adjustments based on that. I wouldn't be surprised if there's a hex or base64 detection and decoding pass being done as pre-processing, and maybe this would trigger a similar post-processing step.
And if this is the case, the censorship could be running at a step too late to be useful.
nurettin|1 year ago
f1shy|1 year ago
AutistiCoder|1 year ago
ustad|1 year ago
Kostchei|1 year ago
z3c0|1 year ago
fragmede|1 year ago
anxoo|1 year ago