I'm surprised people are surprised. Of course this is possible, and of course this is the future. It's been demonstrated already: why do you think we even have GPUs at all? Because we made this exact same transition, from running in software to largely running in hardware, for all 2D and 3D computer graphics. And these LLMs are practically the same math. It's all obvious and inevitable if you're paying attention to what we have and how we got it.
the__alchemist|7 days ago
Generally, you use an ASIC to perform a specific task. In this case, I think the takeaway is that the LLM functionality here is performance-sensitive, and has enough utility as-is to justify an ASIC.
JKCalhoun|7 days ago
I think burning the weights into the gates is kinda new.
("Weights to gates." "Weighted gates"? "Gated weights"?)
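One way to see what "weights to gates" buys you: once a weight is a hardwired constant rather than a value fetched from memory, the multiplier itself can be specialized away. A minimal sketch of the idea (my own illustration, not from the article; the weight value 6 is arbitrary):

```python
# Generic MAC: the weight arrives at runtime, so a full multiplier
# and weight storage/bandwidth are needed.
def mac_runtime(weight: int, activation: int, acc: int) -> int:
    return acc + weight * activation

# "Burned-in" MAC: the weight (say, 6 = 0b110) is a hardwired constant,
# so the multiply reduces to fixed shifts and adds -- no multiplier,
# no weight memory, no weight-fetch traffic.
def mac_hardwired_w6(activation: int, acc: int) -> int:
    return acc + (activation << 2) + (activation << 1)  # 4x + 2x = 6x

assert mac_runtime(6, 7, 0) == mac_hardwired_w6(7, 0)
```

The trade-off is the one the thread circles around: the specialized circuit is smaller and cheaper per operation, but it computes exactly one model.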
dogma1138|7 days ago
It’s also not that different from how TPUs work, where the PEs (processing elements) have special registers for the weights.
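For contrast, the TPU-style weight-stationary dataflow the parent mentions: each PE pre-loads its weight into a local register once, then streams activations past it. A toy sketch (my own illustration, not actual TPU code):

```python
class PE:
    """Toy weight-stationary processing element."""

    def __init__(self):
        self.w = 0  # local weight register

    def load_weight(self, w: int) -> None:
        # Weights are written once per layer/tile...
        self.w = w

    def step(self, activation: int, partial_sum: int) -> int:
        # ...then each cycle multiplies the streaming activation
        # against the held weight and forwards the partial sum.
        return partial_sum + self.w * activation

pe = PE()
pe.load_weight(3)
acc = 0
for a in [1, 2, 4]:  # activations stream through
    acc = pe.step(a, acc)
# acc == 3*1 + 3*2 + 3*4 == 21
```

The key difference from burning weights into gates is that `load_weight` still exists: the same silicon can serve a new model by rewriting the registers.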
IshKebab|7 days ago
We transitioned from software on CPUs to fixed GPU hardware... But then we transitioned back to software running on GPUs! So there's no way you can say "of course this is the future".
iugtmkbdfil834|7 days ago
To your point, it's neat tech, but the limitations are obvious, since 'printing' only one LLM ensures further concentration of power. In other words, history repeats itself.
pwarner|7 days ago
I don't expect it's super commercially viable today, but things for sure need to trend toward radically more efficient AI solutions.
theptip|7 days ago
I think the interesting point is the transition time. When is it ROI-positive to tape out a chip for your new model? There’s a bunch of fun infra to build to make this process cheaper/faster, and I imagine MoE will bring some challenges.