(no title)
simedw | 2 months ago
I recently had some trouble converting a HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…
simedw | 2 months ago
I recently had some trouble converting a HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…
samwho|2 months ago
Yes, I recently wrote https://github.com/samwho/llmwalk and had a similar experience with cache vs no cache. It’s so impactful.
mrgaro|2 months ago