akie | 8 days ago
And if the hardware is real and functional, as you can independently verify by chatting with it, how much more effort would it take to etch more recent models?
The real question, of course, is: what about larger models? I'm assuming you could apply some of the existing LLM inference parallelization techniques, like tensor or pipeline parallelism, and split the workload across multiple cards. Some of the 32B models are plenty powerful.
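The splitting idea is roughly pipeline parallelism: partition the model's layers into contiguous stages, put each stage on its own card, and stream activations from one card to the next. A minimal sketch in plain Python (all names here are illustrative, not any real inference API):

```python
# Toy pipeline parallelism: a model's layers are partitioned across
# n "cards"; each stage's output feeds the next stage.
# Hypothetical helper names, not a real framework's API.

def partition(layers, n_cards):
    """Split a list of layers into n_cards contiguous stages,
    distributing any remainder to the earliest stages."""
    k, r = divmod(len(layers), n_cards)
    stages, start = [], 0
    for i in range(n_cards):
        end = start + k + (1 if i < r else 0)
        stages.append(layers[start:end])
        start = end
    return stages

def run_pipeline(stages, x):
    """Run each stage in order, passing activations along,
    as if each stage lived on a separate card."""
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x

# Stand-in "layers": each adds 1 (think: one transformer block each).
layers = [lambda v: v + 1 for _ in range(10)]
stages = partition(layers, 3)
print([len(s) for s in stages])  # → [4, 3, 3]
print(run_pipeline(stages, 0))   # → 10
```

Real multi-card setups also overlap stages with micro-batching to keep every card busy, but the partitioning logic is the core of it.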
It's a proof of concept, and a convincing one.