top | item 39787015

(no title)

sadhorse | 1 year ago

Does every token require a full model computation?

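No. You can cache some of the work done for the previous tokens, specifically their attention keys and values (the "KV cache"). Generating the next token then only requires running the model on that one new token, which attends over the cached entries, instead of reprocessing the whole sequence. This is one of the key optimization ideas designed into the architecture.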
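To make the caching idea concrete, here is a rough sketch in plain Python of a single attention head with a key/value cache. Everything in it is made up for illustration: the names `KVCache`, `full_recompute`, `attend`, and `matvec`, the random weights, and the tiny dimension are assumptions, not any real model's code. Real models also have many layers and heads, each with its own cache. The loop at the end checks that the cached step gives the same output as recomputing keys and values for the whole prefix.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Softmax attention of one query over all given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def full_recompute(xs, Wq, Wk, Wv):
    """Output for the last token, recomputing K and V for every token."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


class KVCache:
    """Hypothetical single-head cache: stores K and V of past tokens."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the new token is projected; old K/V are reused as-is.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), self.keys, self.values)


def rand_matrix(d):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]


random.seed(0)
d = 4  # toy embedding size, chosen arbitrarily
Wq, Wk, Wv = rand_matrix(d), rand_matrix(d), rand_matrix(d)
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(6)]

cache = KVCache(Wq, Wk, Wv)
for t in range(1, len(tokens) + 1):
    cached_out = cache.step(tokens[t - 1])
    full_out = full_recompute(tokens[:t], Wq, Wk, Wv)
    # Both paths should agree for every prefix length.
    assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for a, b in zip(cached_out, full_out))
```

Note that the cached step still does attention over all previous positions, so the cost per token grows with sequence length. What the cache saves is redoing the key and value projections for the old tokens, and in a real multi-layer model, rerunning every layer on them.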