angusturner | 8 months ago

There is an excellent talk by Jack Rae called "Compression for AGI", where he shows (what I believe to be) a little-known connection between transformers and compression.

On this view, LLMs are SOTA lossless compression algorithms, where the number of weights doesn't count towards the description length. Sounds crazy but it's true.
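The mechanism behind this claim is that a predictive model plus an arithmetic coder gives a lossless compressor: each symbol costs about -log2 p(symbol | context) bits, so a better predictor means a smaller compressed file, and the decoder recovers the text exactly by running the same model. A minimal sketch of the idea (illustrative names, and computing only the ideal code length rather than implementing a full arithmetic coder; the toy model is even "trained" on the text it compresses, which a real compressor would account for):

```python
import math
from collections import defaultdict

def code_length_bits(text, predict):
    """Ideal lossless code length of `text` under a model, where
    predict(context, symbol) returns p(symbol | context). An arithmetic
    coder achieves this total to within about 2 bits."""
    bits = 0.0
    for i, ch in enumerate(text):
        bits += -math.log2(predict(text[:i], ch))
    return bits

def uniform_predict(context, symbol):
    # Baseline with no learning: 256 equally likely byte values,
    # i.e. 8 bits per symbol.
    return 1 / 256

def make_bigram_predict(corpus):
    # Tiny learned predictor: add-one-smoothed bigram counts.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    def predict(context, symbol):
        prev = context[-1] if context else ""
        total = sum(counts[prev].values())
        return (counts[prev][symbol] + 1) / (total + 256)
    return predict

text = "the cat sat on the mat. the cat sat on the mat."
baseline = code_length_bits(text, uniform_predict)
learned = code_length_bits(text, make_bigram_predict(text))
print(f"uniform model: {baseline:.1f} bits")
print(f"bigram model:  {learned:.1f} bits")
```

The bigram model already compresses the text below 8 bits per character; swap in a transformer's next-token probabilities and the same arithmetic gives the compression rates Rae describes. The "weights don't count" point is that sender and receiver can share the model (or its training recipe) out of band, so only the arithmetic-coded bits travel.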

Workaccount2 | 8 months ago

A transformer that doesn't hallucinate (or knows when it is hallucinating) would be the ultimate compression algorithm. But right now that isn't a solved problem, and it leaves the output of LLMs too untrustworthy to use in place of what are colloquially known as compression algorithms.

Nevermark | 8 months ago

It is still task-dependent.

Compressing a comprehensive command line reference via model might introduce errors and drop some options.

But for many people, especially new users, looking up commands and getting examples via a model would deliver many times the value.

Lossy vs. lossless are fundamentally different, but so are use cases.