sadhorse | 1 year ago
Does every token require a full model computation?
onedognight | 1 year ago
No, you can cache some of the work you did when processing the previous tokens. This is one of the key optimization ideas designed into the architecture.
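To make the caching idea concrete, here is a minimal sketch of what is usually called a key/value (KV) cache, assuming a single attention head with toy random weights. Names like `KVCache`, `full_recompute`, and `D` are made up for illustration and don't come from any particular library. The new token still does its own pass through the model, but the keys and values of earlier tokens are reused instead of recomputed.

```python
import math
import random

D = 4  # toy embedding size (assumption for illustration)

def matvec(W, x):
    """Multiply a D x D matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(D)]

rng = random.Random(0)
# Hypothetical projection weights; a real model learns these.
Wq, Wk, Wv = ([[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
              for _ in range(3))

def full_recompute(xs):
    """Output for the last token, recomputing K and V for every token."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

class KVCache:
    """Stores keys/values of past tokens so each step projects only one token."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        self.keys.append(matvec(Wk, x))    # only the new token is projected
        self.values.append(matvec(Wv, x))
        return attend(matvec(Wq, x), self.keys, self.values)

tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
cache = KVCache()
for t in range(len(tokens)):
    cached_out = cache.step(tokens[t])
    full_out = full_recompute(tokens[: t + 1])
    # Both paths should agree; the cached one just skips the repeated work.
    assert all(math.isclose(a, b) for a, b in zip(cached_out, full_out))
```

In this sketch the cached path does a constant number of projections per new token, while the recompute path redoes the projections for the whole prefix at every step. The attention over the stored keys still grows with sequence length in both cases.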