Hell yes! Intel chips are about to get exciting again. SGI put FPGAs on nodes connected to its NUMA interconnect with great results. Intel will likely put them on its network-on-chip, with more bandwidth and tighter integration, while pushing latency down further. The '90s-era tools that automatically partitioned an application between a CPU and an FPGA can be revived once Intel knocks out the obstacles that held them back.
Combine that with the OSS synthesis work by Clifford Wolf and Synflow, which can be connected to OSS FPGA tools, and there's even more potential here. Exciting time in the HW field.
Even more exciting is the Omni-Path[1] stuff that came out of the InfiniBand acquisition. RDMA + Xeon Phi + the insane number of PCIe lanes[2] available for those new M.2 SSDs (which post absolutely insane numbers[3]), all of it supported by ICC[4], and you've got a really budget-friendly HPC setup. I'm really hoping IBM's OpenPOWER gains traction, because Intel is poised to capture the mid-market in dramatic fashion.
I'm excited about this too, but the article suggests this is an industry-wide thing: IBM is doing it with Xilinx, Qualcomm is experimenting with ARM parts (not sure how that differs from the Zynq), and AMD is also working with Xilinx. If something works here, I'm sure Intel won't be the only game in town.
Can you give any examples of how FPGAs helped SGI? I'm aware that a certain Voldemort-like .gov liked them at one time, but I never saw any uptake in the real world.
Intel is a volume player; this makes FPGAs a bit of a head-scratcher, since in the Grand Scheme, products might get prototyped as FPGAs, but they jump to high-volume, higher-performance ASICs ASAP.
As someone with basic knowledge of FPGA structure and how HDLs work, does anyone have a link on the limitations of FPGA-implemented processing vs traditional CPU/GPU architecture?
I get the sense there's a hole in my knowledge as to exactly what kinds of limits the structure of FPGAs places on the end result. And more importantly, why.
So I have this vague recollection that Intel had an FPGA division in the early '90s that they spun off. Was that what became Lattice? Sad that the Interwebs get really murky pre-1995.
I was thinking the same thing. I've always had a much easier time with the Xilinx/Mentor workflows, and I'd love to see competition in that space. But then I remembered the last time I tried to download my copy of Intel C++: over an hour lost in a maze of broken links, ending with three different support cases, and I stopped holding my breath.
I can't see how it would help - this kind of search involves almost no computation and a lot of memory/disk bandwidth.
People need to remember that FPGAs are not a magic bullet, especially not for throughput; they're better used for low-latency hardware interaction and things where you need cycle-deterministic behaviour.
The quickest way is to use what's called a high-level synthesis (HLS) tool, which converts a high-level description of an algorithm into a hardware description language. Synthagate, Handel-C, Catapult-C, Synflow's C-flow, C-to-Silicon... many tools claim to do it. Best to have someone with a hardware background help, though.
Intel CEO Brian Krzanich: "We will apply Moore's Law to grow today's FPGA business, and we'll invent new products that make amazing experiences of the future possible"
Not sure how they think FPGAs are going to reduce their "cloud workload". FPGAs are pretty power-hungry (aside from Lattice's) and only work well if you have some unique requirements.
Fast cores take disproportionately more energy than slow ones, so the solution is to use more slow, simple cores instead; we get more performance per watt that way. On PCs we can use GPUs to do computations in parallel. I guess this is like that, but for servers.
FPGAs are excellent at parallelizing IO, so if the application is IO-heavy, delegating those IO-intensive operations from the CPU to an FPGA coprocessor will likely reduce power consumption.
Pretty sure IBM and AMD are both partnering with Xilinx, they're not going anywhere any time soon. (Plus they also have more enterprise contracts than Altera does.)
FPGAs really only accelerate parallel workloads; sequential computation is easier, and just as fast, on a CPU.
The problem with massive parallelism becomes communication costs and spatial routing. Nothing is free.
I'm more excited about commodity chips with hundreds of cores. I'd rather have something that's easier to program, with a faster dev cycle, if I'm going to tackle parallelism.
iheartmemcache | 10 years ago:
[1] IntelOmniPath-WhitePaper_2015-08-26-Intel-OPA-FINAL.pdf (my copy is paywalled, sorry)
[2] http://www.anandtech.com/show/9802/supercomputing-15-intels-...
[3] http://www.anandtech.com/show/9702/samsung-950-pro-ssd-revie...
[4] https://software.intel.com/en-us/articles/distributed-memory... (this is for Fortran, but the same remote direct memory access concepts extend to the new Xeon architecture)
rgbrenner | 10 years ago:
The processors were the FLEXlogic line. They only released a few (looks like four total[1]). Here's an announcement for one: https://groups.google.com/forum/#!topic/comp.sys.intel/YBUtO...
0. http://www.embedded.com/electronics-blogs/max-unleashed-and-...
1. http://www.intel-vintage.info/timeline19901995.htm
pjc50 | 10 years ago:
Crypto is a far more interesting potential case.
cornholio | 10 years ago:
PHB, how you've grown!
tw04 | 10 years ago:
http://www.eetimes.com/document.asp?doc_id=1324372
nickpsecurity | 10 years ago:
I'd guess more so when the sequential part runs on a top-tier CPU and the accelerator sits on its NoC.
fizixer | 10 years ago:
- Neuromorphic.
- Bye bye Xilinx.