top | item 4635618

Parallella: A Supercomputer For Everyone

148 points | BobPalmer | 13 years ago | kickstarter.com

85 comments

[+] throwaway1979|13 years ago|reply
I picked up a Raspberry Pi a few days ago. Initially, I was blown away by the low price point. Since then, I've been reflecting on what makes a computer useful.

For personal computers - desktops and laptops - I think we don't have a shortage of processor cycles. The minimal specs of the Raspberry Pi make it usable - 256MB of RAM, a 700 MHz CPU, a few GB of storage, and enough Mbps to saturate a home broadband connection. What is compelling about the best contemporary personal computing devices is form factor: how easy it is to provide input; how nice the screen is; and, for a mobile device, how heavy it is and whether the battery lasts long enough.

Does a personal parallel computer really help me? At first blush, I am having a hard time seeing how. Clearly, there are CPU-intensive workloads that people have mentioned in this discussion - ray tracing is one. The video mentions robotics and algorithms. I have mixed feelings about that, since I personally believe the future of robotics lies in computation off the physical robot itself - aka cloud robotics. A use case I personally would find beneficial is the ability to run dozens of VMs on the same machine. Heck ... each of my 50 open browser tabs could run inside a separate VM. I know lightweight container technology has been around for a while, e.g. jails and LXC. But what about hypervisor-based virtualization - e.g. VMware, Xen, etc.? While the parallelization offered by this tech would be awesome, what seems to be missing is the ability to address lots and lots of memory.

[+] Qworg|13 years ago|reply
As the majority of robotics research in the US is paid for by the military, I think there's more of a market for "fast computation on board" than you'd think. Communication and networking is expensive and hard. As a practicing roboticist, I'd love to work with a few of these. =)
[+] gavanwoolery|13 years ago|reply
The real value is in pushing forward a general compute device with many cores. Overall our programs are still stuck in the 1-2 thread era, and there is a bit of a chicken/egg problem: without a very effective multicore processor, the payoff in writing parallel programs is small. GPGPU is still too expensive and not very practical due to memory constraints and the GPU/system memory bottleneck. This probably won't be the device to change all of that, but even failure is progress.
[+] vidarh|13 years ago|reply
It's a dev platform. A way for people to experiment with a new architecture. The roadmap is to eventually get to PCIe cards with thousands of cores.
[+] fghh45sdfhr3|13 years ago|reply
The Raspberry Pi + Tarsnap or Dropbox is the disposable computer I've wanted for a while.

That plus cheap access to a massively parallel computer could also be very interesting.

Except that the Raspberry Pi + online storage could be useful to many, many people, while massive parallelism is probably only interesting to folks like us.

[+] bdfh42|13 years ago|reply
This is an interesting project that deserves to reach its funding goal, but progress toward that is slow (I have been keeping an eye on it since it launched on Kickstarter).

I suspect the problem is that it has no compelling (and immediate) "use case". If they could communicate a set of application ideas, then I suspect a whole new raft of supporters would be happy to risk at least $99.

[+] imrehg|13 years ago|reply
Also, their video is just mediocre - very slow and very elevator-music-like. I really want it to succeed - I've backed it already and got some friends to do so too - but they have to do more as well. Fortunately, they have recently made some progress by opening up the specs and adding more reward options.

Also, the $3 million stretch goal is just waaaaay too far - too bad that the better design is reserved for just that level.

[+] compilercreator|13 years ago|reply
I have backed this project. This is an interesting startup, with some good solid technology behind it. They have managed to design and tape out a chip with just a $2M budget so far. The main draw of their architecture is not its peak performance, but rather its efficiency, both in terms of perf/watt and perf/die area. You can look at their manuals on the site.

Hoping their funding drive succeeds. I like the fact that the ISA is being fully documented and that we will have a fully open-source toolchain to work with the system.

(Disclaimer: Not associated with Adapteva in any way).

[+] tsmarsh|13 years ago|reply
I'm also a backer, and I've been completely surprised by the lack of interest. $99 to try what could represent the future of CPU design. I see it as a platform to find out whether the new wave of concurrent languages really makes a difference on hardware like this.
[+] mbenjaminsmith|13 years ago|reply
I don't really have any comment on the project itself (it's not something I would ever use, and I can't judge the value of what they're proposing).

But on purely geek terms this thing seems to warrant a "holy shit":

http://www.adapteva.com/products/silicon-devices/e64g401/

Again, I don't know how (un)common that sort of thing is, but I wasn't expecting to see 64 cores in such a tiny form factor. Does anyone here know how cutting-edge this thing is, if at all?

[Edit]

Also does anyone here want to address use cases for this thing?

[+] DeepDuh|13 years ago|reply
Well, NVIDIA's Kepler GPUs have 1536 cores on something like 320 mm^2. I can't really find the die size of that Adapteva product, but I'd say it comes out in a similar range.

Having looked at the data a bit more: I like their specs concerning system balance. 100 GFLOPS over 6.4 GB/s gives you a system balance of 15.625 FLOPS per byte of memory bandwidth - about the same balance as a Westmere Xeon, which is pretty good for real-world algorithms.

For comparison, NVIDIA Fermi has a system balance of about 20, meaning Fermi becomes memory-bandwidth bound sooner - and bandwidth is very often the limiting factor in real-world computations.

One thing though: high-performance computing is all about software/tooling support. If this company comes out with OpenCL support in C (or, even better, Fortran 90+), then we're talking.

Edit: By similar 'range' I meant the cores-per-mm^2 ratio.
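The "system balance" arithmetic above is just peak compute divided by peak memory bandwidth. A quick sketch in Python - the 100 GFLOPS and 6.4 GB/s figures are the ones quoted in the comment, and reading the result as FLOPS per byte is my interpretation:

```python
# Roofline-style "system balance": peak FLOPS divided by peak memory
# bandwidth, in FLOPS per byte transferred. A kernel whose arithmetic
# intensity falls below this number is memory-bandwidth bound.
def system_balance(peak_gflops: float, bandwidth_gb_s: float) -> float:
    return peak_gflops / bandwidth_gb_s

# Epiphany figures quoted above: 100 GFLOPS over 6.4 GB/s.
print(system_balance(100.0, 6.4))   # 15.625
```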

[+] stonemetal|13 years ago|reply
http://en.wikipedia.org/wiki/TILE64

Tilera did a very similar-looking 64 cores on a chip in 2007, which is the oldest instance I know of off the top of my head. Their devices cost (or at least used to cost) a few grand, though. Tilera has since bumped it up to around 100 cores per chip. I don't know anything about either architecture, so it's hard to say how 64 1 GHz Adapteva cores compare with 64 1.5 GHz Tilera cores.

So not quite cutting edge - just an under-explored side channel.

[+] wcchandler|13 years ago|reply
I haven't looked at the specs, yet, but this is what I've had in mind: Roll one out to help with deep packet inspection of some of my network traffic. Spam filtering might also be offloaded to one of these guys.

Dedicated machines to host backend applications -- SQL servers, Apache, nginx, etc.

[+] Geee|13 years ago|reply
You'll need this if you want energy-efficiency when solving parallelizable problems. Use one chip in energy-limited systems, like battery-powered robots. Use multiple chips in power/heat-limited systems, like supercomputers.
[+] Swizec|13 years ago|reply
Don't modern GPUs have essentially thousands of cores?
[+] jacques_chester|13 years ago|reply
I think folk need to stop abusing the term "supercomputer".

It is not really a performance designation. It doesn't define a certain architecture or design.

It is pretty clearly an economic designation.

[+] scott_s|13 years ago|reply
I agree that people tend to abuse the term, but I think it is a performance designation. It's just a sliding performance target. A supercomputer is a computer that can achieve the upper limits of what has been achieved in performance.
[+] nnq|13 years ago|reply
WTF: "45 GHz of equivalent CPU performance"

(though I see the more informative "50 GFLOPS/Watt" below... and I like the prospect of something that would make it cheap to play with large scale real time neural nets...)

[+] compilercreator|13 years ago|reply
Their attempts at marketing talk are indeed very bad, but their technology is pretty interesting.
[+] willvarfar|13 years ago|reply
I would enjoy making a ray-tracing GPU from one of these.

That the cores don't run in lockstep could be shader heaven! I'm imagining using the cores in a pipeline with zoning: some cores 'own' tiles of the screen and do z-buffering, other cores clip graphics primitives for each tile, and a sea of compute nodes between them chews through work and pushes it onward.

The cores could also act as a kind of spatial index, passing rays to other cores as they propagate beyond the AABB belonging to a core.

Doubtless it wouldn't work like that, and wouldn't work well. But it's fun thinking about it! :)
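Purely to make the idea concrete, here's a toy sketch of that pipeline using ordinary Python queues and threads as stand-ins for core-to-core links. The stage names (clipper, compute workers, tile owners) follow the comment above; none of this is actual Epiphany code, and the "primitives" are hypothetical (tile, pixel, depth) tuples:

```python
# Toy pipeline: a clipper stage routes primitives to a compute pool,
# and per-tile owner stages do the z-buffer merge. Queues stand in
# for the mesh links between cores.
from queue import Queue
from threading import Thread

TILES = 4
clip_q, work_q = Queue(), Queue()
tile_q = [Queue() for _ in range(TILES)]
zbuf = [{} for _ in range(TILES)]  # per-tile: pixel -> nearest depth

def clipper():
    # Route each primitive onward; None is the shutdown sentinel.
    while (prim := clip_q.get()) is not None:
        work_q.put(prim)
    work_q.put(None)

def worker():
    # "Shading" reduced to a pass-through: forward fragments to the
    # owner of their tile.
    while (prim := work_q.get()) is not None:
        tile, pixel, depth = prim
        tile_q[tile].put((pixel, depth))
    for q in tile_q:
        q.put(None)

def owner(tile):
    # Z-buffering: keep only the nearest fragment per pixel.
    while (frag := tile_q[tile].get()) is not None:
        pixel, depth = frag
        if depth < zbuf[tile].get(pixel, float("inf")):
            zbuf[tile][pixel] = depth

threads = [Thread(target=clipper), Thread(target=worker)] + \
          [Thread(target=owner, args=(t,)) for t in range(TILES)]
for t in threads:
    t.start()
for prim in [(0, (1, 1), 0.5), (0, (1, 1), 0.2), (3, (9, 9), 0.8)]:
    clip_q.put(prim)
clip_q.put(None)
for t in threads:
    t.join()
print(zbuf[0][(1, 1)])  # 0.2 - the nearer fragment wins
```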

[+] ricksta|13 years ago|reply
Parallel computing is limited by Amdahl's law. Having more cores does not mean you get more speed, because it's not easy to use all those cores. Most imperative languages are not designed for running code on multiple cores, and few programmers are taught how to design their algorithms to use even a handful of cores.

I can see this platform being a good tool for students and researchers to experiment with algorithmic speedups by making their sequential code parallel.

In my parallel programming class, our teacher had to rig together a computer lab, connecting 12 quad-core computers to simulate a 64-core cluster. Then again, a 64-core cluster of Parallellas would cost something like $7000. You could get the same 64-core setup by buying 8 x 8-core consumer desktops for under $3000, which would still be more cost-effective and probably have ten times the computing power because of the x86 architecture.

http://en.wikipedia.org/wiki/Amdahls_law
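Amdahl's bound is a one-liner; a quick illustration (the 90%/99% parallel fractions and the 64-core count are illustrative numbers, not from the comment):

```python
# Amdahl's law: speedup from parallelizing a fraction p of a program
# across n cores is capped by the serial remainder (1 - p).
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even with 64 cores, a 90%-parallelizable program speeds up less
# than 10x: the 10% serial part dominates.
print(round(amdahl_speedup(0.90, 64), 2))   # 8.77
print(round(amdahl_speedup(0.99, 64), 2))   # 39.26
```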

[+] AustinGibbons|13 years ago|reply
If you like Amdahl's law you may also like... http://en.wikipedia.org/wiki/Gustafsons_law

It is a more powerful expression of the benefit of scaling with parallelism. Principally, instead of scaling speed at a fixed problem size, you scale the problem size at a fixed execution time.

Having more cores means you (sometimes) can have more data. You still need those parallel programmers with their parallel algorithms though :-)
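For contrast, Gustafson's scaled speedup in the same style (again with illustrative numbers, not figures from the thread):

```python
# Gustafson's law: scaled speedup when the problem size grows with
# the core count, so the parallel part fills the extra cores while
# the serial fraction (1 - p) stays fixed in time.
def gustafson_speedup(p: float, n: int) -> float:
    return (1.0 - p) + p * n

# A 90%-parallel workload scaled across 64 cores - far better than
# the fixed-size Amdahl bound for the same fraction.
print(round(gustafson_speedup(0.90, 64), 1))   # 57.7
```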

[+] jamieb|13 years ago|reply
"Pledge $199 or more: 64-CORE: You get everything in the SUPPORTER reward and a 64-core Epiphany-IV based Parallella board"
[+] runako|13 years ago|reply
Without commenting on the merit of this project, I'm alarmed to see a VC-backed company making a Kickstarter pitch.
[+] driverdan|13 years ago|reply
I'm of the opposite opinion. Companies that already have financial backing, have already put significant time and effort into a project, and already have experience running their business are much more likely to follow through on their campaign than some kid who built a new chair in his bedroom and thinks he can deliver it a month after his $100k campaign finishes.
[+] deweerdt|13 years ago|reply
Could you expand?
[+] sspiff|13 years ago|reply
Does anyone know what kind of cores these RISC cores will be? Will it be some lower end ARM version, or MIPS? Will it be something for which a wide array of tooling already exists, or will this have its own custom architecture which only works with their toolchain?
[+] willvarfar|13 years ago|reply
They have the cores and they are custom, if I read the blurb right. They already have the cores!
[+] fuzzy|13 years ago|reply
According to the Kickstarter page, the RISC cores are ARM A9.
[+] unix-junkie|13 years ago|reply
What's the point of such a RAM/core ratio? Assigning 4 threads per core (which is fairly common for exploiting manycore architectures), you don't even have 4 MB of memory per thread.

I would totally agree that memory constraints are sort of tied to manycore architectures, but in this case I find it pushed to the limit.

[+] wmf|13 years ago|reply
They don't have multithreading.
[+] wtracy|13 years ago|reply
Since I thought I saw an Adapteva person posting here earlier:

If the Kickstarter falls through, what options could you still make available to hobbyists? Is there some version of your current prototype setup that you could sell, even if it's not one convenient board?

[+] adapteva|13 years ago|reply
We would rather not think about that option :-) If the KS project fails we'll do our best, but it seems unlikely that we could support selling kits to hobbyists, and they would certainly cost thousands of dollars each due to lack of volume.
[+] plextoria|13 years ago|reply
I'm so hoping this gets funded.
[+] jayhawk|13 years ago|reply
We got into a big discussion on supercomputers (the definition), the meaning of what a core is, and a whole bunch of other issues... but the low power requirements of this are being completely ignored. As for applications: portable and/or remote devices/sensors that need parallel computing capability, and where high energy usage is prohibitive, are possible applications. But its greatest asset would be to spark the next generation of app developers and programmers to fully embrace parallel programming and truly make software scalable...
[+] ksadeghi|13 years ago|reply
Yes, but can they mine Bitcoins? They come with OpenCL drivers, so in theory they could, as most Bitcoin miners have OpenCL interfaces to the GPU.
[+] perlpimp|13 years ago|reply
Is it just me, or would Erlang fit nicely into this core's ideology of data processing? It seems that LD is like setting a constant, there are external STR commands, and you can have data loaded into registers from the code with MOV. I'm not an expert in Erlang, but it seems the two ideologies could benefit from one another.

And if so, would expecting an Erlang compiler be out of the question? :)