
Russia now selling home-grown CPUs with Transmeta-like x86 emulation

152 points| lelf | 11 years ago |arstechnica.com | reply

79 comments

[+] huhtenberg|11 years ago|reply
Elbrus is a mature project with quite a bit of history, going all the way back to the mid-70s. It was state-funded, so it went through several periods of stagnation and was nearly scrapped at some point, but back when I was at uni many CS profs spoke with a great deal of reverence of both the project itself and those working on it.

http://en.wikipedia.org/wiki/Elbrus_%28computer%29

[+] thesz|11 years ago|reply
The difference between the Elbrus of the '70s and the contemporary one is quite significant.

The old Elbrus was stack-based on the outside and had a layer that translated stack-based ops into RISC commands for OoO execution. The stack-based instruction set was meant to reduce code size (and the complexity of code generation).

The new Elbruses are VLIWs, and I cannot agree with that architectural decision. They claim their VLIW and compiler solve frequent stalls (a hallmark of any VLIW arch, except in a DSP setting where memory access is quite predictable), but the benchmark numbers do not agree with that.

Consider this: http://www.7-cpu.com/

Elbrus with 4 threads is about 15 times slower in compression than an Intel i7 3770 (Ivy Bridge). The difference in clock speeds is about sevenfold.

7-zip compression is very memory-intensive, and it accesses memory in a rather unfriendly manner: going backwards in the dictionary search and forwards in the comparison.

This great discrepancy means that Elbrus stalls much more heavily than the i7. And rightfully so: OoO CPUs like the i7 are specifically designed to avoid stalls.
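
To make the arithmetic behind that claim explicit, here is a rough sketch. It uses only the two ratios quoted in this comment (about 15x throughput, about 7x clock); they are not independently measured, and the variable names are just for illustration.

```python
# Ratios as quoted in the comment above (approximate, not re-measured).
throughput_gap = 15.0  # i7 is ~15x faster at 7-zip compression
clock_gap = 7.0        # i7 clock is ~7x higher

# Dividing out the clock difference leaves the gap per clock cycle.
per_clock_gap = throughput_gap / clock_gap

print(f"per-clock gap: {per_clock_gap:.2f}x")
```

So clock speed alone would explain a sevenfold gap; the remaining factor of roughly two per cycle is what the comment attributes to stalls.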

Other than that CPU architecture decision, Elbrus as a SoC is very good.

[+] spatular|11 years ago|reply
Here is more information from the official site [1,4,5]:

- native "Elbrus" ISA or x86 ISA,

- Elbrus ISA is VLIW, can dispatch 23 operations per cycle (33 with SIMD), in-order execution,

- it's stated that x86 code translation + register allocation is done in HW, but later they write about a software translator and full-system emulator,

- 6 ALUs (all support integer operations, 4 can do FP),

- 256 x 84-bit register file,

- hardware support for loops, including pipelining,

- some kind of module for async mem preloading,

- speculative execution and branching predicates,

- "4S" model has 4 cores,

- 800 MHz core clock,

- 64 KB L1, 128 KB L2, 8 MB L3 (shared between cores),

- 3 DDR3-1600 interfaces, ECC support,

- 3 x 12 GB/s inter-CPU links, support for up to 4 sockets,

- 65nm process, 380 mm^2 die size, 986e6 transistors,

- software is based on Linux 2.6.33 and Debian 5.0 with more than 3000 packages.
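
As a back-of-the-envelope check on the memory and link figures in the list above, here is a small sketch. The 8 bytes per transfer is an assumption (a standard 64-bit DDR3 data bus per interface); the list itself does not state the bus width. These are theoretical peaks, not measured throughput.

```python
# DDR3-1600 performs 1600 mega-transfers per second per channel.
transfers_per_sec = 1600e6
bytes_per_transfer = 8   # ASSUMPTION: 64-bit data bus per interface (not stated in the list)
channels = 3             # "3 DDR3-1600 interfaces" from the list

per_channel_gbs = transfers_per_sec * bytes_per_transfer / 1e9
peak_gbs = per_channel_gbs * channels

# Inter-CPU links, taken directly from the list: 3 links at 12 GB/s each.
link_total_gbs = 3 * 12

print(f"per channel: {per_channel_gbs:.1f} GB/s, peak DRAM: {peak_gbs:.1f} GB/s")
print(f"inter-CPU links combined: {link_total_gbs} GB/s")
```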

There are some benchmarks for the older chip model, "2S" (overclocked to 500 MHz, 2 cores) [2,3]. FP performance is about 1-5x that of a 1 GHz Pentium M (1 core?) depending on the benchmark; integer performance is about 1x. The new CPU, "4S", should be 3 times faster than the "2S".

[1] http://www.elbrus.ru/arhitektura_elbrus

[2] http://www.elbrus.ru/files/535269/9f0cd8/50606f/000000/2014-...

[3] http://www.elbrus.ru/files/535269/0e0cd8/50586f/000000/2014-...

[4] http://www.mcst.ru/mikroprocessor-elbrus4s

[5] http://www.mcst.ru/mikroprocessor-elbrus4s-gotov-k-serijnomu...

--

Edit: loop pipelining, OS information

[+] agumonkey|11 years ago|reply
How many processors have hardware support for loops? I expect it to be a different, more efficient infrastructure than Comparison/Jump, maybe something similar to DisplayLists in old OpenGL?
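
For what "loop pipelining" generally means on an in-order machine, a toy sketch may help. This is generic software pipelining, not a description of the actual Elbrus loop hardware (nothing in this thread gives those details); the stage names and the function `pipelined_schedule` are made up for illustration. The point is that different stages of consecutive iterations are issued in the same step.

```python
def pipelined_schedule(n_iters, stages=("load", "compute", "store")):
    """Return, for each step, the (stage, iteration) pairs that run together."""
    depth = len(stages)
    schedule = []
    for step in range(n_iters + depth - 1):
        slot = []
        for s, name in enumerate(stages):
            it = step - s  # stage s works on the iteration started s steps earlier
            if 0 <= it < n_iters:
                slot.append((name, it))
        schedule.append(slot)
    return schedule


schedule = pipelined_schedule(4)
for step, slot in enumerate(schedule):
    print(step, slot)
```

With 4 iterations and 3 stages, the pipelined loop finishes in 6 steps, where running each stage of each iteration one after another would take 12. The short ramp-up and ramp-down steps at the ends are the prologue and epilogue that loop hardware or a compiler has to handle.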
[+] userbinator|11 years ago|reply
The most interesting thing about the benchmarks is that they show some pretty amazing IPC. The P6-based Pentium M was known for its high IPC, but this 500MHz Elbrus core is more than 50% of the speed of a 1GHz Pentium M. The floating-point results are even better, although perhaps not surprising due to having 4 FP ALUs.

Another set of benchmarks containing an Elbrus is at http://www.7-cpu.com/ and also shows extremely good IPC efficiency: it achieves 1.2 MIPS/MHz/core for compression, which is better than all the other non-x86 chips in that list, and somewhere between Haswell's 1.18 and Ivy Bridge's 1.24.

If they could scale this up to a newer process, they'd probably be equal to if not surpassing Intel's current x86 performance.

[+] return0|11 years ago|reply
The title is a little "Soviet": it's a Russian company, not "Russia".
[+] duaneb|11 years ago|reply
This is a poetic device named "metonymy". Obviously Russia ≠ the company, but the nationality is probably the more interesting aspect.
[+] ludamad|11 years ago|reply
As far as I know, this is new ground for Russia, so the title could be warranted
[+] mrbill|11 years ago|reply
Wasn't the NVIDIA "Denver" Tegra K1 originally intended to be an x86-compatible chip along these lines, then when they couldn't get the licensing right, it was turned into an ARM-compatible?

http://en.wikipedia.org/wiki/Project_Denver

[+] valarauca1|11 years ago|reply
Most x86_64 and "modern" ARM processors are actually much more like an '80s-esque RISC processor with parallel instructions, pipelines, etc. The actual hardware binary assembly is effectively a high-level language that the decoder/scheduler makes sense of.

Likely they developed a very fast, very efficient core die and just swapped decoders at the last minute. It's not much of a stretch that they'd develop the ARM set in parallel because of the high risk associated with doing anything with x86.

[+] rkuska|11 years ago|reply
It's not ARM as in the ARM architecture; it's just a Russian acronym.
[+] ChuckMcM|11 years ago|reply
Hah, what goes around comes around. It is notable that China is going gangbusters on building ARM variants rather than coming up with an entirely new architecture. Back in the USSR days when Sun was working with the ELVIS group they were required to have some Soviet designed machines in addition to the SparcStations that Sun provided. Those machines were not well liked by the researchers.
[+] dmitrygr|11 years ago|reply
I wonder if, unlike Transmeta, they will let you run native code on it too. Transmeta could not do it for technical reasons. As far as I know, the underlying arch was designed ONLY as a JIT target, and did not even have protected memory as such. Memory accesses were translated to different instructions (one privileged and one not) based on the context of the translated code.
[+] exDM69|11 years ago|reply
Not sure about this Russian chip but in the case of Transmeta and Nvidia Denver (and to a lesser degree, Intel x86 µops), writing the "native" code directly is not beneficial in any way.

The whole point is that the JIT compiler running in the CPU can make dynamic optimizations that are somewhat similar in nature to the branch prediction and other optimizations modern CPUs do.

The native code executed by these CPUs is a poor target for static compilation. Without runtime data about which branches are taken, which memory locations are touched, etc, it is not possible to generate code that outperforms the built-in JIT or can compete with more traditional CPUs.

And besides, the JIT frontend in these chips is rather cheap in terms of power and performance.
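
A toy sketch of the kind of runtime data mentioned above, for illustration only. This is not how Transmeta's or Nvidia's translators are actually built; the class `BranchProfile` and the branch addresses `0x40` and `0x80` are hypothetical. It only shows that a translator running alongside the code can measure branch behaviour, where a static compiler would have to guess.

```python
from collections import Counter


class BranchProfile:
    """Toy runtime profile: counts how often each branch is taken."""

    def __init__(self):
        self.seen = Counter()
        self.taken = Counter()

    def record(self, branch_pc, was_taken):
        self.seen[branch_pc] += 1
        if was_taken:
            self.taken[branch_pc] += 1

    def likely_taken(self, branch_pc):
        # True when the branch was taken in more than half of its executions.
        return self.taken[branch_pc] * 2 > self.seen[branch_pc]


profile = BranchProfile()
for i in range(100):
    profile.record(0x40, was_taken=(i % 10 != 0))  # taken 90 times out of 100
    profile.record(0x80, was_taken=(i % 10 == 0))  # taken 10 times out of 100

print(profile.likely_taken(0x40), profile.likely_taken(0x80))
```

A dynamic translator could use such counts to lay out the hot path as straight-line code and retranslate if the behaviour changes.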

[+] Elhana|11 years ago|reply
You can use native code. The problem is that you need a compiler for it, and so far optimization is their main issue.
[+] bhewes|11 years ago|reply
"Technology independence" is going to be our generation's equivalent of "energy independence".
[+] frozenport|11 years ago|reply
Nope. We need "energy independence" to avoid places with dictators and repugnant Middle Eastern societies. The only folk that care about "technology independence" have irrational, kleptocratic nationalist agendas (Russia, China). When's the last time you had a problem with a bastion of technology?
[+] cordite|11 years ago|reply
Why don't they do something similar but with ARM? We already have plenty of proven stuff working on that architecture.

Also, the properties that they are getting feel like being knocked back a decade--is it due to the fabrication facilities that they have?

[+] weland|11 years ago|reply
As much as I'd love to see x86 go the fuck away, we have a lot more proven stuff working on that architecture -- including a lot of legacy or closed-source software that won't get ported soon.

> Also, the properties that they are getting feel like being knocked back a decade--is it due to the fabrication facilities that they have?

Most likely. Top of the line stuff needs a lot of money and expertise -- Russian companies don't have that much of the former, and Russia exports or rent^H^H^H^H uses a lot of the latter for outsourcing.

[+] MichaelSalib|11 years ago|reply
It looks like they're using a 65nm process, so yeah, fab space appears to be limiting them. Modern designs are well under 20nm.
[+] CUViper|11 years ago|reply
Is that "Elbrus-compatible Linux distro" native? Or just x86 with some tweaks to make it compatible? (perhaps addressing emulation shortcomings or optimizations)

If they made kernel changes, and choose to respect GPL, I'd be interested to see those sources...

[+] spatular|11 years ago|reply
It's stated on the official site [1] that it's based on Linux 2.6.33, and it looks like the kernel and userspace are compiled for the Elbrus ISA and run natively, without x86 emulation.

There are no links to sources on their site, and they don't provide datasheets. To request sources under the GPL you need to get binaries first. I live in Russia, and I've never seen Elbrus in real use anywhere. It's not marketed or sold to the general public. I think the target market is the government's security agencies. Of course they get sources anyway for audit and have no incentive to publish them.

[1] (Russian) http://www.mcst.ru/os_elbrus

[+] Theodores|11 years ago|reply
You could have a CPU that updates itself to be able to emulate whatever new features come along in x86/AMD64 to therefore be 'future proof'. Even if the raw performance is not at the same level as the genuine Intel, does it matter if merely surfing the web? If performance is 'ample' then the CPU that just gets updated to include new instructions could make it so computers could last for decades doing things like showing web pages. How hard can that get?
[+] SCHiM|11 years ago|reply
Many instructions on your CPU are already microcoded instead of wired into the hardware. I'm not 100% sure how it works, but it sounds like what you're proposing could already be implemented in modern CPUs.

https://en.wikipedia.org/wiki/Microcode

[+] toast0|11 years ago|reply
Very little software is written that requires features from the latest CPU updates. There's not much of a need to address compatibility by emulating new features in the CPU itself. Windows 8 needs PAE, NX, and SSE2, which were available in CPUs produced 8 years before release (certainly not all CPUs produced in 2004 had these features, but many did).
[+] x0054|11 years ago|reply
Why on earth do you need the latest CPU architecture to browse the web? I have a laptop from 2002 and it runs Chrome and browses the web just fine. What on earth are you talking about?
[+] sliverstorm|11 years ago|reply
The few truly new features that have come along are not really emulate-able. Things like 64 bit, or new secure modes.

Most everything else has been bundles of hardware accelerated instructions for performance gains on popular data classes (see: MMX, AVX256, etc).

It's rare you run into software that doesn't have multiple codepaths, leveraging an extension if you have it and bypassing it if you don't.

Extensions that you are de facto required to have are so old, you almost certainly have them anyway. MMX was introduced in 1997 in the P5. MMX support will not be the first problem on your list if you're trying to run modern software on a P4.
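
The "multiple codepaths" pattern mentioned above can be sketched roughly as follows. All function names are hypothetical, and the "wide" path is only a stand-in for a real SIMD routine; the flag set is passed in as a parameter rather than read from the machine, so the sketch behaves the same everywhere.

```python
def sum_squares_scalar(values):
    """Baseline path that works on any CPU."""
    return sum(v * v for v in values)


def sum_squares_wide(values):
    """Stand-in for an accelerated path: handles 4 items per step."""
    total = 0
    for i in range(0, len(values), 4):
        chunk = values[i:i + 4]
        total += sum(v * v for v in chunk)
    return total


def pick_sum_squares(cpu_flags):
    """Choose an implementation once, based on the reported feature flags."""
    return sum_squares_wide if "avx" in cpu_flags else sum_squares_scalar


data = list(range(10))
fast = pick_sum_squares({"sse2", "avx"})
fallback = pick_sum_squares({"sse2"})
print(fast(data), fallback(data))
```

Both paths return the same result, so software written this way keeps running on CPUs that lack the extension, only more slowly.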

[+] ex3ndr|11 years ago|reply
The price is known. It is about 200,000 RUB, which equals about $3,900.
[+] tekni5|11 years ago|reply
I was doubting your comment, but after looking into it, it seems like all the Russian sources claim it to be around 200,000 rubles.

Does anyone have any idea why an average consumer would even consider buying this at this price?

[+] ido|11 years ago|reply
Why use names such as SPARC and ARM when the products are neither? Is it some sort of joke/wordplay that got lost in translation?
[+] ArtifTh|11 years ago|reply
МЦСТ (MCST) was an abbreviation for "Московский центр SPARC-технологий" ("Moscow Center of SPARC Technologies"); now it's just meaningless letters. ARM in the computer name probably means "Автоматизированное рабочее место" - "Automated workstation".
[+] RexRollman|11 years ago|reply
Transmeta! Now there's a name I haven't heard in a long time.
[+] ck2|11 years ago|reply
A little late to the party. Maybe they should have been working on ARM.