GRVI Phalanx joins The Kilocore Club

vvanders|9 years ago

Do I dare even ask how long the timing and routing pass took on 1680 cores?

jsgray|9 years ago

This build took 11 hours on an Intel Skulltrail NUC w/ 32 GB DRAM, but I believe there are ways to speed this up going forward (incremental and/or hierarchical builds, out-of-context synthesis). For example, the XCVU9P is a 3-die (3 "super logic region", or SLR) device, and by setting up a hierarchical design flow I think I can place and route each SLR separately, and concurrently, across more (x86) cores on my build box. The inter-SLR nets are just a few quite regular 300-bit-wide Hoplite NOC links, plus clock and reset.
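A rough sketch of the shape of that per-SLR flow, assuming the cores split evenly across the three SLRs. `place_and_route_slr` and `hierarchical_build` are hypothetical names; the stand-in function does no real work and does not call Vivado or any vendor tool. The point is only that independent SLR builds can run side by side, so wall-clock time tends toward the slowest SLR rather than the sum of all three.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an out-of-context place-and-route run on one SLR.
# A real flow would launch the vendor tools here (e.g. as a subprocess).
def place_and_route_slr(slr_index: int, cores: int) -> dict:
    return {"slr": slr_index, "cores": cores, "status": "routed"}

def hierarchical_build(total_cores: int, num_slrs: int) -> list[dict]:
    # Assumption: the design splits evenly, one identical partition per SLR.
    if total_cores % num_slrs != 0:
        raise ValueError("cores do not split evenly across SLRs")
    per_slr = total_cores // num_slrs
    # Threads suffice if each job is an external tool process.
    with ThreadPoolExecutor(max_workers=num_slrs) as pool:
        futures = [pool.submit(place_and_route_slr, i, per_slr)
                   for i in range(num_slrs)]
        # A final step (not shown) would stitch the SLRs over the inter-SLR links.
        return [f.result() for f in futures]

for result in hierarchical_build(1680, 3):
    print(result)
```

With 1680 cores over 3 SLRs, each partition gets 560 cores.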

nickpsecurity|9 years ago

Can't the tools do it relatively fast with a geometric method, if the individual cores already have area/timing data and are homogeneous? Especially on an FPGA rather than an ASIC?

Reading various papers on synthesis as a non-hardware guy, I got the impression this job should be easier than for SoCs, whose components vary considerably in their individual attributes.
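One way to picture the "geometric method": place and time one core (or cluster) once, then stamp copies onto a regular grid instead of re-placing every instance. This is a toy sketch in hypothetical placement units; `Tile` and `tile_origins` are illustrative names, not any tool's API. Real FPGA fabrics are columnar with BRAM/DSP columns, clock regions, and SLR boundaries, so a copy can only be relocated where the underlying site pattern actually repeats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    width: int   # footprint in placement columns (hypothetical units)
    height: int  # footprint in placement rows (hypothetical units)

def tile_origins(region_w: int, region_h: int, tile: Tile) -> list[tuple[int, int]]:
    """Lower-left origin of every copy of a pre-placed tile that fits in a
    rectangular region, stepping row by row on a regular grid."""
    cols = region_w // tile.width
    rows = region_h // tile.height
    return [(c * tile.width, r * tile.height)
            for r in range(rows) for c in range(cols)]

origins = tile_origins(100, 60, Tile(width=10, height=6))
print(len(origins))  # 10 columns x 10 rows = 100 copies
print(origins[:3])   # [(0, 0), (10, 0), (20, 0)]
```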

quickben|9 years ago

If I'm not mistaken, the eval kits for these devices go for $5-6k?

duskwuff|9 years ago

Yep. The Arty board (which ran 32 cores) is $99, though, which is actually a lot more interesting.

http://store.digilentinc.com/arty-artix-7-fpga-development-b...

geolqued|9 years ago

This one? USD7k https://www.xilinx.com/products/boards-and-kits/ek-u1-vcu118...

jsgray|9 years ago

The board used (VCU118) is $7000.

As noted, the Digilent Arty is $99 and hosts up to 32 cores. The XC7Z020 Zynq devices should host 80. That includes the Zedboard, the original Parallella Kickstarter edition, the forthcoming Snickerdoodle Black (?), and the Digilent Pynq, which is $65 Q1 for students. I intend to put out a version of GRVI Phalanx for 7020s, at least a bitstream and SDK, perhaps more, but there is much to do. Note that the 7 series devices (including the XC7A35T of the Arty and the XC7Z020) have BRAMs but not UltraRAMs, so those clusters have 4K instruction RAMs and 32K shared cluster RAMs. The clusters with 4-8K instruction RAMs and 128K cluster RAMs possible on the new UltraScale+ devices afford more breathing room for code and data per cluster.
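Some back-of-envelope arithmetic on the figures above, as a sketch. It assumes the "K" sizes are kilobytes and that 8 cores share each cluster's RAMs; the 8-cores-per-cluster figure is an assumption, not something stated in this comment. Prices and core counts are the ones quoted in the thread.

```python
# Assumption (not stated above): 8 cores share each cluster's RAMs.
CORES_PER_CLUSTER = 8

def shared_ram_per_core_kb(cluster_ram_kb: float,
                           cores_per_cluster: int = CORES_PER_CLUSTER) -> float:
    """Average slice of the shared cluster RAM available to each core."""
    return cluster_ram_kb / cores_per_cluster

def usd_per_core(board_usd: float, cores: int) -> float:
    """Board price divided by the number of cores it hosts."""
    return board_usd / cores

print(shared_ram_per_core_kb(32))          # 7 series cluster RAM: 4.0
print(shared_ram_per_core_kb(128))         # UltraScale+ cluster RAM: 16.0
print(round(usd_per_core(99, 32), 2))      # Arty: 3.09
print(round(usd_per_core(7000, 1680), 2))  # VCU118: 4.17
```

So, under these assumptions, the UltraScale+ clusters give each core roughly four times the shared RAM, while the $99 Arty is actually slightly cheaper per core than the VCU118.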

ethagknight|9 years ago

What does one do with such a cluster?

RandomOpinion|9 years ago

>What does one do with such a cluster?

They presented a short paper at FCCM '16: http://fpga.org/wp-content/uploads/2016/05/grvi_phalanx_fccm... Section VI lists possible applications.

frozenport|9 years ago

Now use it for a barrel processor!
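For anyone unfamiliar with the term: a barrel processor issues one instruction per cycle from its hardware threads in strict round-robin order, so consecutive instructions from the same thread are `num_threads` cycles apart and can't hazard on each other if the pipeline is no deeper than the thread count. A toy simulation of that issue order, assuming 4-byte instructions and no branches; `HwThread` and `run_barrel` are hypothetical names, and this says nothing about how GRVI itself is built.

```python
from dataclasses import dataclass

@dataclass
class HwThread:
    tid: int
    pc: int = 0

def run_barrel(num_threads: int, cycles: int) -> list[tuple[int, int, int]]:
    """Return (cycle, thread id, pc) for each issued instruction, with threads
    taken in fixed round-robin order and no scheduling logic."""
    threads = [HwThread(tid=t) for t in range(num_threads)]
    trace = []
    for cycle in range(cycles):
        t = threads[cycle % num_threads]  # fixed rotation through the threads
        trace.append((cycle, t.tid, t.pc))
        t.pc += 4  # assume 4-byte instructions and straight-line code
    return trace

for cycle, tid, pc in run_barrel(num_threads=4, cycles=8):
    print(f"cycle {cycle}: thread {tid} issues pc=0x{pc:02x}")
```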
