(no title)
az09mugen | 18 days ago
For example, I would not want a zip of an encyclopedia that uncompresses to unverified, approximate, and sometimes even wrong text. According to this site: https://www.wikiwand.com/en/articles/Size%20of%20Wikipedia a compressed Wikipedia without media, just text, is ~24 GB. What's the typical size of an LLM, 10 GB? 50 GB? 100 GB? Even if it's less, it's not an accurate and deterministic way to compress text.
nagaiaida | 18 days ago
Yeah, pretty easy to calculate...
notpushkin | 18 days ago
With the usual interface it’s probably inefficient: giving just a prompt alone might not produce the output we need, or it might be larger than the thing we’re trying to compress. However, if we also steer the decisions along the way, we can probably give a small prompt that gets the LLM going, and tweak its decision process to get the tokens we want. We can then store those changes alongside the prompt. (This is a very hand-wavy concept, I know.)
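Very roughly, one way to make that concrete is to store, for each position, the rank of the true next token under the model's own prediction: if the model is good, most ranks are 0 or 1 and would compress extremely well. A hand-wavy sketch below (the model choice, the greedy ranking, and the absence of a real entropy coder are all my own simplifications):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    MODEL = "gpt2"  # any causal LM; picked arbitrarily for the sketch
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

    def compress(text):
        # Store the rank of each actual next token in the model's sorted predictions.
        ids = tok(text, return_tensors="pt").input_ids[0]
        with torch.no_grad():
            logits = model(ids.unsqueeze(0)).logits[0]        # (seq_len, vocab)
        ranks = []
        for i in range(len(ids) - 1):
            order = torch.argsort(logits[i], descending=True)
            ranks.append((order == ids[i + 1]).nonzero().item())
        return ids[0].item(), ranks   # a real scheme would entropy-code the ranks

    def decompress(first_id, ranks):
        # Replay the model and pick the token at the stored rank at each step.
        ids = [first_id]
        for r in ranks:
            with torch.no_grad():
                logits = model(torch.tensor([ids])).logits[0, -1]
            order = torch.argsort(logits, descending=True)
            ids.append(order[r].item())
        return tok.decode(ids)

This only round-trips if inference is deterministic, which ties into the point about batching further down the thread.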
duskwuff | 18 days ago
From what I understand, this is essentially how ts_zip (linked elsewhere) works.
program_whiz | 17 days ago
Of course, the optimal sequence to produce the output would be a series of word vectors (with hundreds of dimensions each). You could match each one to its closest word in any language (or make this a constraint during solving), or just use the vectors themselves as the compressed data.
Ultimately, neural nets of various kinds are used for compression in many contexts. There are examples where Gaussian-splatting-like 3D scenes are created by compressing all the data into the weights of a neural net, via a process similar to what I described, producing a fully explorable 3D color scene that can be rendered from any angle.
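As a toy version of that "data lives in the weights" idea (not the actual Gaussian-splatting pipeline, just the generic implicit-neural-representation mechanism), you can overfit a tiny MLP that maps (x, y) coordinates to pixel values, after which the weights are the stored representation:

    import torch
    import torch.nn as nn

    img = torch.rand(64, 64)                       # stand-in for a real image
    ys, xs = torch.meshgrid(torch.linspace(0, 1, 64),
                            torch.linspace(0, 1, 64), indexing="ij")
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=1)   # (4096, 2)
    targets = img.flatten().unsqueeze(1)                        # (4096, 1)

    net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):                          # overfit on purpose
        opt.zero_grad()
        loss = ((net(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()

    # "Decompression" is just evaluating the network at whatever coordinates
    # you want, including ones that were never stored.
    recon = net(coords).reshape(64, 64)

Whether this actually saves space depends on how compressible the data is and how small you can make the network; the sketch is about the mechanism, not the ratio.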
microtonal | 17 days ago
As temperature approaches zero, the probability of the most likely token approaches one (assuming no ties). So my guess is that LLM inference providers started treating temperature=0 as "disable sampling" because people would try to approximate greedy decoding by using teensy temperatures.
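A quick numerical check of that (nothing provider-specific, just softmax with a temperature):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    logits = np.array([2.0, 1.5, 0.3, -1.0])
    for T in (1.0, 0.5, 0.1, 0.01):
        print(T, np.round(softmax(logits / T), 4))
    # T = 1.0  -> mass spread across all tokens
    # T = 0.01 -> essentially one-hot on the argmax

which is why "temperature = 0" ends up meaning greedy decoding rather than a literal division by zero.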
D-Machine | 17 days ago
Ignoring GPU indeterminism, if you are running a local LLM and control batching, yes.
If you are computing via API / on the cloud, and so are being batched with other computations, then no (https://thinkingmachines.ai/blog/defeating-nondeterminism-in...).
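The underlying mechanism, as I understand that post, is that floating-point addition isn't associative, so the same numbers reduced in a different order (which is what different batch sizes and kernel configurations give you) can produce slightly different logits, and a tiny difference can flip an argmax near a tie. A toy illustration, not their code:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    a = x.sum()                                   # one reduction order
    b = x[::-1].sum()                             # same numbers, reversed
    c = x.reshape(1000, 1000).sum(axis=0).sum()   # blocked reduction

    print(a, b, c)           # typically three slightly different values
    print(a == b, a == c)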
But, yes, there is a lot of potential for semantic compression via AI models here, if we just make the effort.