Great question! The model can make more efficient use of existing GPU hardware: it performs more computation per unit of memory transferred (higher arithmetic intensity). Since classical LLM inference is typically limited by memory bandwidth rather than raw compute, this means that on older hardware one should be able to reach inference speeds similar to what a classical LLM achieves on recent hardware. This is also interesting commercially, since it opens up new ways of reducing AI inference costs.
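As a rough illustration of that arithmetic-intensity argument, here is a small roofline-style back-of-envelope calculation in Python. The GPU specs and intensity values are made-up placeholders for an "older" and a "recent" card, not measurements of any particular hardware or model; the point is only how the gap between the two shrinks once the workload stops being memory-bound.

```python
# Roofline sketch: attainable throughput is capped either by peak compute
# or by memory bandwidth times arithmetic intensity (FLOPs per byte moved).
# All figures below are illustrative placeholders, not real GPU specs.

def attainable_tflops(peak_tflops, bandwidth_tbs, arithmetic_intensity):
    return min(peak_tflops, bandwidth_tbs * arithmetic_intensity)

older  = dict(peak_tflops=100.0, bandwidth_tbs=0.9)   # hypothetical older GPU, ~0.9 TB/s
recent = dict(peak_tflops=150.0, bandwidth_tbs=3.0)   # hypothetical recent GPU, ~3 TB/s

# Classical autoregressive decoding moves lots of weights per token, so its
# arithmetic intensity is low; a model doing more compute per byte sits higher.
for name, ai in [("memory-bound decode", 10), ("compute-heavy model", 500)]:
    old = attainable_tflops(**older, arithmetic_intensity=ai)
    new = attainable_tflops(**recent, arithmetic_intensity=ai)
    print(f"{name:20s}  older GPU: {old:6.1f} TFLOP/s   recent GPU: {new:6.1f} TFLOP/s")
```

With these placeholder numbers, the recent GPU is roughly 3x faster on the memory-bound case but only about 1.5x faster on the compute-heavy one, which is the sense in which older hardware can stay competitive.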