In the networking world, Fulcrum built some very low-latency switch chips for switches and routers using asynchronous logic. The Alta switch chip was the last of that generation:
http://www.hotchips.org/wp-content/uploads/hc_archives/hc23/...
Intel acquired Fulcrum and has not shipped a new product since. One can speculate that they were acquired in part for their experience and tools for designing asynchronous pipelines.
In the DSP world, Octasic makes DSPs that use asynchronous designs:
http://www.octasic.com/technology/opus-dsp-architecture
>> Imagine what software would be like if subroutines could start and end only at preset time intervals. “My subroutines all start at 3.68 millisecond intervals; how often do yours start?”
Mine start at 50 microsecond intervals. I've worked on stuff with shorter and longer intervals. Sometimes we have lists of tasks that need to run at different rates, so scheduling becomes a real pain. Welcome to the world of real-time embedded software in high-performance systems. The same thing applies: we make sure the worst-case execution time is within the allowed interval and use a master clock to sync everything up.
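(A minimal C sketch of the kind of time-triggered executive being described. The task names, rates, and the tick simulation in main are invented for illustration; a real system would call the tick handler from a hardware timer interrupt and would have verified each task's worst-case execution time against its slot.)

    #include <stdio.h>
    #include <stdint.h>

    /* Every task runs at a fixed multiple of a 50 us base tick; each
     * task's worst-case execution time must fit within its slot. */
    static void read_sensors(void) { printf("read_sensors\n"); }
    static void control_loop(void) { printf("control_loop\n"); }
    static void telemetry(void)    { printf("telemetry\n"); }

    struct task {
        void (*run)(void);
        uint32_t period_ticks;              /* run every N base ticks */
    };

    static const struct task tasks[] = {
        { read_sensors, 1  },               /* every 50 us  */
        { control_loop, 4  },               /* every 200 us */
        { telemetry,    20 },               /* every 1 ms   */
    };

    /* In a real system this is the 50 us master-clock interrupt. */
    static void scheduler_tick(uint32_t tick)
    {
        for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
            if (tick % tasks[i].period_ticks == 0)
                tasks[i].run();
    }

    int main(void)
    {
        for (uint32_t tick = 1; tick <= 20; tick++)  /* simulate 1 ms */
            scheduler_tick(tick);
        return 0;
    }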
I've done quite a few of these embedded systems with real-time constraints, and your summary is quite accurate. The good part for me is that once you have things nailed down they (usually, and if not, you're really in for a long night) don't shift, and what works will continue to work reliably.
This is in contrast to non-real-time systems, which tend to just freeze for random periods of time (sometimes seconds or even minutes) without any apparent cause. That's something that really puzzles me about today's software and hardware: in theory it should all be faster than ever, but in practice I spend as much or even more time waiting for my computers than I ever did in the past.
Maybe I'm just more impatient, but I don't believe that's the reason here.
Real time should be the norm, not the exception, just like encrypted communications should be the norm, not the exception.
Computers should respond without noticeable latency to user input at all times.
Asynchronous logic is significantly more power efficient, so it may be one approach to "save Moore's Law" (for one generation perhaps). But it would probably require some company that really cares about power efficiency, doesn't care about industry best practices, and is willing to risk hundreds of millions in R&D.
Most of the money would be in development, not research. Elegant designs and methods exist, and the designs can be iterated very quickly. The very real problem has been interfacing with the rest of the world: not at the hardware level (see the switch designs mentioned elsewhere in this thread), but in the interoperability of design software and other tools. Any such company would need either its own design software or a massive hack job to make synchronous design software work without that bedrock assumption.
This reminded me of one of Gustafson's arguments for changing how numerical computations are done: the principles of hardware architecture in use today result in hardware wasting lots of energy and time, mostly in the process of getting numbers from RAM to the CPU and back. It seems more people realize this already, which is good. I hope to see some general-purpose hardware inspired by these ideas of efficient computation.
The inefficiency you are talking about is not due to the fact that there is a synchronous clock within each individual "block" (there already has to be some async logic between the different clock domains of DRAM and the processor). The waste in getting numbers from RAM to the register file is primarily due to the hardware-managed cache hierarchy, which we are addressing at REX Computing, with John Gustafson as an advisor.
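(To make that distinction concrete, here is a hedged C sketch of the difference between leaning on a hardware-managed cache and explicitly staging data through a small software-managed buffer, as scratchpad architectures do. The buffer size and function names are invented, and in real scratchpad designs the energy win comes from data reuse within each tile, which a simple sum does not show.)

    #include <string.h>

    #define TILE 256   /* invented scratchpad size, in doubles */

    /* Cache version: the hardware decides what to fetch and evict on
     * every access; the program never states which words it needs. */
    double sum_cached(const double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Scratchpad version: software copies one tile at a time into fast
     * local memory (modeled here as a plain array) and computes out of
     * it, so every transfer from DRAM is explicit and predictable. */
    double sum_scratchpad(const double *a, int n)
    {
        static double spad[TILE];
        double s = 0.0;
        for (int i = 0; i < n; i += TILE) {
            int len = (n - i < TILE) ? n - i : TILE;
            memcpy(spad, a + i, len * sizeof spad[0]);
            for (int j = 0; j < len; j++)
                s += spad[j];
        }
        return s;
    }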
Hm, I came up with this idea independently, 5 < years < 10 ago, after reading the first third of Code: The Hidden Language of Computer Hardware and Software.
Neat!
I just figured that you could redesign common ICs so that they had a new wire akin to the "carry" bit. I called it the 'done' wire, and I figured you could just tie it to the CLK of the next IC. Ya know? So 'doneness' would propagate across the surface of the motherboard (or SoC) in different ways depending on the operation it was performing. Rather than the CLK signal, which is broadcast to all points...
(I know that my idea is half baked and my description is worse. I'm glad I found this PDF!)
I knew the big advantage would be power savings. I called the idea 'slow computing', and I envisioned an 8-bit-style machine that would run on solar or a hand crank and be able to pause mid-calculation until enough power was available... just like an old capacitor-based flash camera that can flash more frequently when you have fresh batteries in it.
You'd just wire the power system up with the logic. Suppose an adder fires a "done" at some other IC. Now, put your power system inline, MITM-style... When it gets the "done", it charges that capacitor (a very small one? :) ) and only when enough power is available does it propagate the "done". ...Maybe the "done" powers the next IC. I dunno.
As I said, half baked. Glad to find out that I'm not the only one that dreamed of 'clockless', though!
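(A toy event-driven C model of the idea: each stage's "done" is wired to the next stage's clock, and a power gate inline on the done wire holds the signal until enough charge has been harvested. Stage counts, charge units, and costs are all made up; this is a sketch of the concept, not a circuit.)

    #include <stdio.h>

    #define NSTAGES 3

    static int charge = 0;                /* pretend capacitor, arbitrary units */
    static const int COST_PER_STAGE = 2;  /* charge consumed per firing */

    static void harvest(void) { charge += 1; }  /* one crank of the handle */

    static void fire_stage(int s);

    /* The inline power gate: hold the 'done' until enough charge is
     * available, then let it clock the next stage. */
    static void propagate_done(int from)
    {
        if (from + 1 >= NSTAGES) { printf("result ready\n"); return; }
        while (charge < COST_PER_STAGE)
            harvest();                    /* pause mid-calculation for power */
        charge -= COST_PER_STAGE;
        fire_stage(from + 1);
    }

    static void fire_stage(int s)
    {
        printf("stage %d computes\n", s);
        propagate_done(s);                /* 'done' doubles as the next clock */
    }

    int main(void)
    {
        fire_stage(0);
        return 0;
    }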
The big issue with the done signal you're referring to is how to generate it. In other words, how does the circuit "know" that it has finished executing?
There are several options. One is to simply add a delay element to each circuit, matched to that circuit's worst-case delay. Another is to use a circuit-level handshaking protocol, conceptually similar to the handshake used in TCP.
It's not an easy thing to tackle, and it leads to performance loss in the long run relative to a synchronous design.
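(For the handshaking option, here is a single-threaded C narration of a four-phase request/acknowledge exchange, the common circuit-level protocol; the channel layout and names are invented for the sketch, and the matched-delay alternative is noted in a comment.)

    #include <stdio.h>

    struct channel {
        int req;     /* driven by the sender   */
        int ack;     /* driven by the receiver */
        int data;    /* bundled data wires     */
    };

    static void transfer(struct channel *ch, int value)
    {
        ch->data = value;
        ch->req  = 1;    /* phase 1: sender asserts req once data is valid;
                          * in a matched-delay design, req instead passes
                          * through a delay sized to the worst-case datapath */
        int latched = ch->data;
        ch->ack  = 1;    /* phase 2: receiver latches data, asserts ack */
        ch->req  = 0;    /* phase 3: sender sees ack, releases req      */
        ch->ack  = 0;    /* phase 4: receiver releases ack, channel idle */
        printf("transferred %d\n", latched);
    }

    int main(void)
    {
        struct channel ch = {0, 0, 0};
        transfer(&ch, 42);
        transfer(&ch, 7);
        return 0;
    }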
There doesn't seem to be any sign of recent activity on the Asynchronous Research Center site affiliated with the article. Is anyone aware of currently active academic or industrial research groups in this field?
I would personally love to try designing some fancy asynchronous stuff, but I got the impression that current FPGAs would make this difficult.
B1FF_PSUVM | 9 years ago:
I'm chalking it up to "it's the future, and always will be ..."
peter_d_sherman | 9 years ago:
Warning: slightly commercial in nature, but there is some good information about how it works starting on page 4. Worth reading from there.
EdwardCoffin | 9 years ago:
Archive.org has some of their old FLEET architecture papers and slide decks: [2]
[1] https://news.ycombinator.com/item?id=11425533
[2] https://web.archive.org/web/20120227072220/http://fleet.cs.b...
Cyph0n | 9 years ago:
http://www.cs.columbia.edu/async/
lasermike026 | 9 years ago:
I'm for approaches that may be superior overall.
unixhero | 9 years ago:
Didn't understand it though.