smpanaro | 10 months ago
Maybe "cache" is the wrong word. It's a limit on how much can be mmap'd for the ANE at once. It's not too hard to hit on M1 if your model is in the GB range. Chunking the model into smaller pieces makes it more likely to "fit", but if it doesn't fit you have to unmap/remap in each forward pass, and that overhead will be noticeable.
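To make the fit-or-remap tradeoff concrete, here's a rough toy sketch in Python. It is not how the ANE runtime actually works: the limit value, the chunk sizes, the oldest-first eviction, and the function name `remaps_in_steady_state` are all made up for illustration. It only counts how many chunks would need to be mapped again on a forward pass after the first one.

```python
from collections import OrderedDict


def remaps_in_steady_state(chunk_sizes_mb, limit_mb):
    """Count chunk (re)maps on the second forward pass of a toy model.

    Assumptions (hypothetical, for illustration only):
    - chunks run in order, once per forward pass
    - at most `limit_mb` can be mapped at once
    - when space is needed, the least recently used chunk is unmapped
    """
    mapped = OrderedDict()  # chunk index -> size in MB, least recent first
    used = 0
    remaps = 0
    for _ in range(2):  # pass 1 warms up, pass 2 is the steady state
        remaps = 0
        for i, size in enumerate(chunk_sizes_mb):
            if size > limit_mb:
                raise ValueError(f"chunk {i} ({size} MB) can never fit")
            if i in mapped:
                mapped.move_to_end(i)  # already resident, nothing to do
                continue
            while used + size > limit_mb:
                _, evicted_size = mapped.popitem(last=False)  # unmap oldest
                used -= evicted_size
            mapped[i] = size
            used += size
            remaps += 1
    return remaps


# Hypothetical numbers: a 2000 MB limit and 500 MB chunks.
fits = remaps_in_steady_state([500, 500, 500], limit_mb=2000)
too_big = remaps_in_steady_state([500] * 5, limit_mb=2000)
```

In this toy model, the three chunks stay resident, so `fits` is 0. With five chunks the total exceeds the limit and every chunk gets evicted before it is needed again, so `too_big` is 5: each forward pass pays for remapping the whole model.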
Awesome to hear about ModernBERT! Big fan of your work as well :)
anemll | 10 months ago