smpanaro | 10 months ago
I wrote about it here[0], but the gist is that you can have a fixed-size cache and slide it in chunks with each inference. It's not as efficient as a cache that grows by one each time, of course.
[0]: https://stephenpanaro.com/blog/inside-apples-2023-transforme...
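For anyone who wants to see the idea in code, here is a rough sketch of one way to read "fixed size, slide in chunks". It is not taken from the blog post or from Apple's model: the class name `SlidingChunkCache`, the `capacity` and `chunk` parameters, and the use of plain integers in place of key/value tensors are all assumptions made for illustration.

```python
from typing import List, Optional


class SlidingChunkCache:
    """Hypothetical fixed-size cache that frees room by dropping the oldest
    `chunk` entries at once, instead of growing by one entry per step."""

    def __init__(self, capacity: int, chunk: int) -> None:
        if not 0 < chunk <= capacity:
            raise ValueError("chunk must be between 1 and capacity")
        self.capacity = capacity
        self.chunk = chunk
        # The buffer length never changes; None marks an empty slot.
        self.slots: List[Optional[int]] = [None] * capacity
        self.used = 0

    def append(self, entry: int) -> None:
        if self.used == self.capacity:
            # Slide: discard the oldest `chunk` entries, shift the rest left,
            # and pad the tail so the buffer keeps its fixed length.
            self.slots = self.slots[self.chunk:] + [None] * self.chunk
            self.used -= self.chunk
        self.slots[self.used] = entry
        self.used += 1

    def valid(self) -> List[Optional[int]]:
        # Only the filled prefix holds real entries.
        return list(self.slots[:self.used])


cache = SlidingChunkCache(capacity=8, chunk=4)
for token in range(10):
    cache.append(token)
print(cache.valid())  # [4, 5, 6, 7, 8, 9]
```

In this sketch the buffer always has 8 slots, so its shape stays constant across steps. The cost is that right after a slide part of the buffer is empty and the oldest `chunk` entries are gone all at once.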
kamranjon | 10 months ago