Actual result: "This new process promises to increase the number of optical fibers that can be connected at the edge of a chip, a measure known as beachfront density, by six times."
Faster interconnects are always nice, but this is more like a routine improvement.
"In recent inference tests run on a 3-billion-parameter LLM developed from IBM’s Granite-8B-Code-Base model, NorthPole was 47 times faster than the next most energy-efficient GPU and was 73 times more energy efficient than the next lowest latency GPU."
It's also fascinating that they are experimenting with analog memory because it pairs so well with model weights
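As a toy sketch of why the pairing works (everything here is made up for illustration, not from the article): store each weight as a conductance, apply the input as voltages, and the multiply-accumulate falls out of Ohm's and Kirchhoff's laws, right where the weights live:

```python
# Toy simulation of an analog "crossbar" multiply-accumulate: weights are
# stored as conductances G[i][j], the input vector is applied as voltages
# V[j], and each output current I[i] = sum_j G[i][j] * V[j] is just physics.
# The matvec happens where the weights are stored -- no weight fetch at all.
import random

def crossbar_matvec(G, V, noise=0.01):
    """One analog matvec: per-cell currents summed along each row wire."""
    out = []
    for row in G:
        current = sum(g * v for g, v in zip(row, V))
        # Analog readout is imprecise: model it as small relative noise.
        out.append(current + random.gauss(0.0, noise * abs(current)))
    return out

random.seed(0)
G = [[0.2, -0.5, 0.1],
     [0.7,  0.3, -0.2]]   # "weights" held in memory cells
V = [1.0, 0.5, -1.0]      # input activations applied as voltages
print(crossbar_matvec(G, V))  # close to the exact [-0.15, 1.05], plus noise
```

The appeal is that nothing moves: the weight array *is* the compute unit, so the dominant cost of shuttling weights to an ALU disappears, at the price of analog imprecision.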
About 20 years ago the CS community was getting excited about optical memory. It promised to be huge, much faster than static RAM, and to hold its state without power. Tied directly to the CPU as a very large cache+RAM replacement, it would have revolutionized computing. There were other advantages besides speed. One was that you could just pause the CPU, put the computer to sleep, then wake it up later and everything would already be in RAM, with computation continuing where it left off. Instant boot. Launching apps would be instant too, since they'd already be in RAM and could run in place. Prototypes existed, but optical memory never happened commercially. I'm not sure I remember why; maybe it couldn't scale, or there were manufacturing problems. There was also the problem that code is never perfect, so what do you do when something stored becomes corrupted? Without a boot phase there would be no integrity checks.
Off topic, but does the STATEMENT + QUESTION MARK sentence structure have a name? It's pretty annoying in my opinion. Why not write "IS the von Neumann bottleneck impeding AI computing?" instead?
IBM initially leads with the more salient point (current architecture designs are hindering frontier computing concepts), then just kinda…relents into iterative improvement.
Which is fine! I am all for iterative improvements; it’s how we got to where we are today. I just wish more folks would openly admit that our current architecture designs are broadly based on the “low-hanging fruit” of early electronics and microprocessors, followed by a century of iterative improvements. With the easy improvements already done and universally integrated, we’re stuck at a crossroads:
* Improve our existing technologies iteratively and hope we break through some barrier to achieve rapid scaling again
OR
* Accept that we cannot achieve new civilizational uplifts with existing technologies, and invest more capital into frontier R&D (quantum processing, new compute substrates, etc)
I feel like our current addiction to the AI CAPEX bubble is a desperate Hail Mary to validate our current tech as the only way forward, when in fact we haven’t really sufficiently explored alternatives in the modern era. I could very well be wrong, but that’s the read I get from the hardware side of things and watching us backslide into the 90s era of custom chips to achieve basic efficiency gains again.
That's valid jargon but from the wrong layer of the stack. A Harvard bus is about the separation of the "instruction" memory from "data" memory so that (pipelined) instructions can fetch from both in parallel. And in practice it's implemented in the L1 (and sometimes L2) cache, where you have separate icache/dcache blocks in front of a conceptually unified[1] memory space.
The "Von Neumann architecture" is the more basic idea that all the computation state outside the processor exists as a linear range of memory addresses which can be accessed randomly.
And the (largely correct) argument in the linked article is that ML computation is a poor fit for Von Neumann machines, as all the work needed to present that unified picture of memory to all the individual devices is largely wasted since (1) very little computation is actually done on individual fetches and (2) the connections between all the neurons are highly structured in practice (specific tensor rows and columns always go to the same places), so a simpler architecture might be a better use of die space.
[1] Not actually unified, because there's a page translation, IO-MMUs, fabric mappings and security boundaries all over the place that prevents different pieces of hardware from actually seeing the same memory. But that's the idea anyway.
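Point (1) is easy to make concrete with a back-of-envelope sketch (the function name and sizes below are made up for illustration): a batch-1 matvec, the inner loop of LLM decoding, performs exactly one multiply-add per weight fetched, so the fetch machinery, not the math, is the bottleneck:

```python
# Back-of-envelope: for a batch-1 matvec, count FLOPs per byte of weight
# traffic. Each fp16 weight (2 bytes) is used for exactly one multiply-add
# (2 FLOPs), so arithmetic intensity is ~1 FLOP/byte -- far too low to keep
# the ALUs of a modern chip busy through a conventional memory hierarchy.
def arithmetic_intensity(rows, cols, bytes_per_weight=2):
    flops = 2 * rows * cols                   # one multiply + one add per weight
    traffic = rows * cols * bytes_per_weight  # each weight fetched exactly once
    return flops / traffic

print(arithmetic_intensity(4096, 4096))  # -> 1.0 FLOP per byte
```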
This is being done, with great results so far. As models get better, architecture search, creation, and refinement improve, driving a reinforcement loop. At some point in the near future the big labs will likely start seeing significant returns from methods like this, translating into better and faster AI for consumers.
Huh, I did not get that from the article. The main takeaway for me was that doing ALU operations in memory results in massive energy savings. There is still a von Neumann architecture running the show.
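A rough budget shows where those savings come from (the per-op energies below are illustrative, order-of-magnitude figures often quoted for older CMOS nodes, not measurements from the article):

```python
# Rough energy budget for one matvec: fetching a 32-bit word from DRAM costs
# roughly two orders of magnitude more than the multiply-add it feeds, so
# keeping weights where the compute is (in-memory ALUs) removes the dominant
# term -- even with a conventional core still sequencing the work.
DRAM_READ_PJ = 640.0   # ~pJ per 32-bit DRAM access (illustrative)
MAC_PJ = 4.6           # ~pJ per 32-bit multiply-add (illustrative)

def matvec_energy_pj(n_weights, fetch_from_dram=True):
    compute = n_weights * MAC_PJ
    traffic = n_weights * DRAM_READ_PJ if fetch_from_dram else 0.0
    return compute + traffic

n = 4096 * 4096
print(matvec_energy_pj(n, True) / matvec_energy_pj(n, False))  # ~140x
```

Under these assumed numbers, eliminating the DRAM round-trip per weight is worth about two orders of magnitude, which is the kind of headroom the in-memory approach is chasing.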
owyn|5 months ago
https://www.science.org/doi/full/10.1126/science.adh1174
Also they've been working on this for 10+ years so it's not exactly new news.
lawlessone|5 months ago
Maybe they're hoping someone else does it... and then pays IBM for using whatever patents they have on it.
Animats|5 months ago
bahmboo|5 months ago
UltraSane|5 months ago
rapjr9|5 months ago
abrookewood|5 months ago
jesuswasrasta|5 months ago
But you're right, I think it's not even grammatically correct.
Anyway, I always like to remember this about headlines phrased as questions: https://en.wikipedia.org/wiki/Betteridge's_law_of_headlines
stego-tech|5 months ago
yellowcake0|5 months ago
nyrikki|5 months ago
ARM processors primarily use a modified Harvard architecture, including the Raspberry Pi Pico.
ajross|5 months ago
NooneAtAll3|5 months ago
I think this post is more about... compute in memory? if I got it right?
bobmcnamara|5 months ago
Edit: see also ARM7TDMI, Cortex-m0/0+/1, and probably a few others. All the big stuff is modified Harvard or very rarely pure Harvard.
lomase|5 months ago
jedberg|5 months ago
observationist|5 months ago
greenchair|5 months ago
mwkaufma|5 months ago
bahmboo|5 months ago