(no title)
bigattichouse | 1 year ago
It was when I realized that a single XNOR plus a population count on a 32-bit word could score 32 dimensions at a time.
While this isn't ANYTHING like an actual quantized LLM, I thought it was a really nice proof-of-concept, and it could be very useful for smaller machines running RAG applications.
My Code: https://github.com/bigattichouse/bitvector_research
NOTE: I'm not claiming this is 30X faster than GPUs, only that CPU implementations could be 30X faster.
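
For readers who want to see the XNOR-and-popcount idea concretely, here is a minimal sketch in Python. It is not taken from the repository linked above. The sign-based packing (bit = 1 when a component is non-negative) and the names `pack_signs`, `popcount`, and `xnor_score` are assumptions made for illustration. Each 32-bit word holds 32 dimensions, XNOR marks the positions where two vectors agree, and the population count tallies those agreements in one step.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1  # keeps results within 32 bits


def pack_signs(vec):
    """Pack a float vector into 32-bit words, one bit per dimension.

    Assumption for this sketch: a bit is 1 when the component is >= 0.
    """
    words = []
    for start in range(0, len(vec), WORD_BITS):
        word = 0
        for offset, x in enumerate(vec[start:start + WORD_BITS]):
            if x >= 0:
                word |= 1 << offset
        words.append(word)
    return words


def popcount(x):
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def xnor_score(a_words, b_words, dims):
    """Return how many of the `dims` dimensions have matching bits."""
    matches = 0
    for a, b in zip(a_words, b_words):
        # XNOR = NOT(XOR): a 1 wherever the two words agree.
        matches += popcount(~(a ^ b) & MASK)
    # Unused padding bits in the last word are 0 in both vectors,
    # so XNOR counts them as matches; remove them from the total.
    padding = len(a_words) * WORD_BITS - dims
    return matches - padding


if __name__ == "__main__":
    a = [1.0] * 40
    b = [1.0] * 20 + [-1.0] * 20
    print(xnor_score(pack_signs(a), pack_signs(b), 40))
```

If the bits are read as +1/-1 signs, `2 * matches - dims` gives the dot product of the two sign vectors, which is why the match count can serve as a rough similarity score. In C the inner loop would typically compile down to a few instructions per word, but that is outside what this sketch demonstrates.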