

RhythmFox | 1 month ago

This isn't strictly better to me. It captures some intuitions about how a neural network ends up encoding its inputs over time in a 'lossy' way (it doesn't store previous input states in an explicit form). Maybe saying 'probabilistic compression/decompression' makes it a bit more accurate? I don't really think calling it compression/decompression connects to your 'synthesize' claim at the very end, but I am curious whether you had a specific reason to use the term.


XenophileJKO | 1 month ago

It's really way more interesting than that.

The act of compression builds up behaviors/concepts of greater and greater abstraction. Another way to think about it is that the model learns to extract commonality, hence the compression. Because it is learning higher-level abstractions AND the relationships between those abstractions, it can ABSOLUTELY learn to infer or apply things way outside its training distribution.
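A very loose toy analogy for "extract commonality, hence the compression" (this is not how a real network is trained; the vectors, `compress`, `decompress`, and the `threshold` parameter are all made up for illustration): store the component every example shares once, keep only the per-example differences that are large enough to matter, and drop the rest. The shared part plays the role of the "abstraction", and dropping small residuals is what makes it lossy.

```python
def compress(examples, threshold=0.1):
    """Toy lossy compression: keep the shared mean once, plus residuals
    whose small entries (below the hypothetical `threshold`) are zeroed."""
    n = len(examples)
    dim = len(examples[0])
    # The "commonality": the component all examples share.
    shared = [sum(ex[i] for ex in examples) / n for i in range(dim)]
    # Per-example leftovers; tiny differences are thrown away (lossy step).
    # Zeroed entries could then be stored sparsely, which is where space is saved.
    residuals = [
        [d if abs(d) >= threshold else 0.0
         for d in (ex[i] - shared[i] for i in range(dim))]
        for ex in examples
    ]
    return shared, residuals


def decompress(shared, residual):
    """Rebuild one (approximate) example from the shared part and its residual."""
    return [s + r for s, r in zip(shared, residual)]


examples = [
    [1.0, 2.0, 3.05],
    [1.0, 2.0, 2.95],
    [1.5, 2.0, 3.0],
]
shared, residuals = compress(examples)
approx_first = decompress(shared, residuals[0])
```

The first example comes back as roughly `[1.0, 2.0, 3.0]`: the 0.05 wiggle in its last coordinate was below the threshold, so only the shared value survives.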

bhadass | 1 month ago

ya, exactly... i'd also say that when you compress large amounts of content into weights and then decompress via a novel prompt, you're also forcing interpolation between learned abstractions that may never have co-occurred in training.

that interpolation is where synthesis happens. whether it is coherent or not depends.
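A minimal sketch of what "interpolation between learned abstractions" means geometrically, assuming (as a big simplification) that two abstractions are just vectors in the same space. The 3-d vectors and the single blend weight `t` are hypothetical; real models have thousands of dimensions, and how a prompt mixes concepts is learned rather than being one linear knob.

```python
def lerp(a, b, t):
    """Linear interpolation between vectors a and b: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical "concept" vectors that each appeared in training on their own.
concept_a = [1.0, 0.0, 0.0]
concept_b = [0.0, 1.0, 0.0]
training_points = [concept_a, concept_b]

# A point halfway between them, which never appeared as a training point.
blend = lerp(concept_a, concept_b, 0.5)
print(blend)  # [0.5, 0.5, 0.0]
```

Nothing guarantees the midpoint is meaningful: in a real model some blends land somewhere sensible and some don't, which is the "whether it is coherent or not depends" part.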