top | item 47111557

nathan_compton | 7 days ago

I'm not sure, but I suspect that LLM weights don't compress all that well. The intuition here is that training an LLM is compression of the training data into the weights, so they are probably already very information-dense. A generic compressor can't squeeze them down much further.
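The intuition is easy to check on a stand-in: high-entropy float data (which trained weight tensors tend to resemble) barely shrinks under a general-purpose compressor, while redundant data shrinks enormously. A minimal sketch, using pseudo-random Gaussian floats as a hypothetical stand-in for real weights (not actual LLM weights):

```python
import random
import struct
import zlib

# Hypothetical stand-in for a weight tensor: 100k pseudo-random
# float32 values with a small standard deviation, mimicking the
# near-random bit patterns of trained weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
raw = struct.pack(f"{len(weights)}f", *weights)

# zlib at maximum effort barely dents the high-entropy mantissa bits.
compressed = zlib.compress(raw, level=9)
ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.2f}")

# For contrast, highly redundant data of the same size collapses
# to almost nothing.
zeros = bytes(len(raw))
print(f"all-zero buffer compresses to {len(zlib.compress(zeros, 9))} bytes")
```

The same experiment on real checkpoint files gives similar results, which is why practical "compression" of LLM weights usually means lossy quantization rather than lossless entropy coding.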

spwa4 | 6 days ago

I've often found this to be untrue when optimizing on the CPU. I wish someone would pay me to dive deep into this problem and into the scheduling problem; I'd be amazed if I couldn't squeeze a 50% speed increase out of both.