It's obviously a productive change and kudos for taking it on, but much of the enthusiasm being generated here was driven by the entirely unanticipated prospect of running a model at full speed using less memory than the model's own footprint, and by the notion that inference with a dense model somehow behaved in a sparse manner at runtime. Best to be a bit more grounded here, particularly with regard to claims that defy common understanding.
jart|2 years ago
unknown|2 years ago
[deleted]