top | item 39787015

sadhorse | 1 year ago

Does every token require a full model computation?

onedognight | 1 year ago

No. You can cache some of the work done while processing the previous tokens, so generating a new token does not mean redoing all of that work from scratch. This is one of the key optimization ideas designed into the architecture.
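
To make the caching idea concrete, here is a minimal sketch. It assumes the model is a transformer and that the cached work is the attention keys and values of earlier tokens (usually called a KV cache); the comment above does not name either. All names here (CachedAttention, full_recompute, the random weights) are made up for illustration, and this is a single attention head in plain Python, not how any real library does it. The new token itself still has to go through the computation; what gets skipped is recomputing the keys and values of the tokens before it.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention for one query over all stored keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def full_recompute(xs, Wq, Wk, Wv):
    # No cache: recompute keys and values for every token seen so far.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


class CachedAttention:
    # Hypothetical helper: keeps keys/values of earlier tokens between steps.
    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the new token's key and value are computed here.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), self.keys, self.values)


random.seed(0)
DIM = 4


def rand_matrix():
    return [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]


Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]

layer = CachedAttention(Wq, Wk, Wv)
outputs_match = True
for t, x in enumerate(tokens):
    cached = layer.step(x)
    full = full_recompute(tokens[: t + 1], Wq, Wk, Wv)
    if not all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for a, b in zip(cached, full)):
        outputs_match = False
```

In this sketch the cached path does one key and one value projection per new token, while the uncached path redoes them for every earlier token at every step, and both give the same output for the latest token.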