JoshCole | 6 months ago

> We are now getting an equivalent definition of what neural nets are being trained for! LLMs are trained to compress the internet as much as possible!

Nice payoff. Others have also called out the relationship to compression (https://www.youtube.com/watch?v=AKMuA_TVz3A).
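
A minimal sketch of why the training objective is literally a compression objective (the toy model and numbers below are my own illustration, not from the lecture): the cross-entropy loss in nats is exactly the code length an ideal entropy coder would need to encode the text using the model's predictions, so minimizing loss is minimizing the compressed size of the training data.

```python
import math

# Toy next-token model: p(token | context). The probabilities are made
# up purely for illustration; a real LLM would compute these.
def model_prob(token, context):
    return {"the": 0.5, "cat": 0.3, "sat": 0.2}[token]

sequence = ["the", "cat", "sat"]

# Cross-entropy training loss, summed over the sequence...
loss_nats = -sum(math.log(model_prob(tok, sequence[:i]))
                 for i, tok in enumerate(sequence))

# ...equals the code length an ideal entropy coder needs to encode the
# sequence under the model's predictions (convert nats to bits).
code_length_bits = loss_nats / math.log(2)

print(f"{loss_nats:.3f} nats = {code_length_bits:.3f} bits")
# Lower training loss <=> shorter encoding of the training data.
```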

CuriouslyC | 6 months ago

Framing it as compression is reductive (intentionally so). Yes, compression of information is a proxy measure of Kolmogorov complexity, but it's more accurate to say you're learning the conditional probability distribution: the result is a stochastic machine that produces samples from that distribution, not a literal compressed representation of anything. You have to do work to extract the information, and it doesn't come back intact in all cases -- see the sketch below.
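
A rough sketch of that extraction work (my own construction, not a real LLM API -- `next_token_dist` and its toy distribution are assumptions for illustration): the model by itself stores no particular text; to compress a specific sequence you still have to record, per position, where the true token fell in the model's predictions, and decoding requires re-running the model at every step.

```python
# Hypothetical interface: the model returns a probability distribution
# over the vocabulary given a context (toy values, fixed for simplicity).
def next_token_dist(context):
    return {"the": 0.5, "cat": 0.3, "sat": 0.2}

def encode(tokens):
    """Record, per position, the rank of the true token in the
    model's predicted distribution. This side information is what
    actually pins down the text -- the model alone does not."""
    ranks = []
    for i, tok in enumerate(tokens):
        dist = next_token_dist(tokens[:i])
        ordered = sorted(dist, key=dist.get, reverse=True)
        ranks.append(ordered.index(tok))
    return ranks

def decode(ranks):
    """Reconstruction needs a model pass per token: the 'work'."""
    tokens = []
    for r in ranks:
        dist = next_token_dist(tokens)
        ordered = sorted(dist, key=dist.get, reverse=True)
        tokens.append(ordered[r])
    return tokens

ranks = encode(["the", "cat", "sat"])   # -> [0, 1, 2]
assert decode(ranks) == ["the", "cat", "sat"]
# Good predictions -> mostly small ranks -> the ranks entropy-code
# cheaply. The compressed text is model + ranks, not the model alone.
```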

vasekrozhon | 5 months ago

Hi, the author here.

@JoshCole Thanks -- I also like that lecture by Ilya; I've added it to the resources now.

@_0ffh -- Yeah, I hope the page does not come across as me claiming some revolutionary new insight. People like Ilya have been talking about this for years; I am just trying to package it into a hopefully more accessible format.

_0ffh | 6 months ago

The relationship has been thought about for a long time. In 2006 it even led to the creation of the Hutter Prize, with around 38k€ paid out so far.