Elbrus is a mature project with quite a bit of history, going all the way back to the mid-70s. It was state-funded, so it went through several periods of stagnation and nearly got scrapped at some point, but back when I was at university many CS professors spoke with a great deal of reverence about both the project itself and the people working on it.
The difference between the Elbrus of the 70s and the contemporary one is quite significant.
The old Elbrus was stack-based on the outside and had a layer that translated the stack-based ops into RISC-like commands for OoO execution. The stack-based instruction set was meant to reduce code size (and the complexity of code generation).
The new Elbruses are VLIWs, and I cannot agree with that architectural decision. They claim their VLIW design and compiler solve the frequent stalls that are a hallmark of any VLIW arch (except in DSP settings, where memory access is quite predictable), but the benchmark numbers do not agree.
Consider the numbers at http://www.7-cpu.com/: Elbrus with 4 threads is about 15 times as slow at compression as an Intel i7 3770 (Ivy Bridge), while the difference in clock speeds is only about sevenfold.
7-zip compression is very memory-intensive, and it accesses memory in a rather unfriendly manner - going backwards in the dictionary search and forwards in the comparison.
This great discrepancy means that Elbrus stalls much more heavily than the i7. And rightfully so - OoO CPUs like the i7 are specifically designed to avoid stalls.
Apart from that CPU-architecture decision, Elbrus as a SoC is very good.
Here is more information from the official site [1,4,5]:
- native "Elbrus" ISA or x86 ISA,
- Elbrus ISA is VLIW; it can dispatch 23 operations per cycle (33 with SIMD), in-order execution,
- it's stated that x86 code translation + register allocation is done in HW, but later they write about a software translator and full-system emulator,
- 6 ALUs (all support integer operations, 4 can do FP),
- 256 x 84-bit register file,
- hardware support for loops, including pipelining,
- some kind of module for asynchronous memory preloading,
- speculative execution and branch predication,
- the "4S" model has 4 cores,
- 800 MHz core clock,
- 64 KB L1, 128 KB L2, 8 MB L3 (shared between cores),
- 3 DDR3-1600 interfaces with ECC support,
- 3 x 12 GB/s inter-CPU links, support for up to 4 sockets,
- 65nm process, 380 mm^2 die size, 986e6 transistors,
- software is based on Linux 2.6.33 and Debian 5.0 with more than 3000 packages.
There are some benchmarks for the older chip model "2S" (overclocked to 500 MHz, 2 cores) [2,3]. FP performance is about 1-5x that of a 1 GHz Pentium M (1 core?) depending on the benchmark; integer performance is about 1x. The new CPU, "4S", should be 3 times faster than "2S".
How many processors have hardware support for loops? I expect it to be a different, more efficient mechanism than compare/jump, maybe something similar to display lists in old OpenGL?
The most interesting thing about the benchmarks is that they show some pretty amazing IPC. The P6-based Pentium M was known for its high IPC, but this 500 MHz Elbrus core reaches more than 50% of the speed of a 1 GHz Pentium M. The floating-point results are even better, though perhaps not surprising given the 4 FP ALUs.
Another set of benchmarks containing an Elbrus is at http://www.7-cpu.com/ and also shows extremely good IPC efficiency - it achieves 1.2 MIPS/MHz/core for compression, which is better than all the other non-x86 chips in that list, and somewhere between Haswell's 1.18 and Ivy Bridge's 1.24.
If they could scale this up to a newer process, they'd probably equal, if not surpass, Intel's current x86 performance.
Wasn't the NVIDIA "Denver" Tegra K1 originally intended to be an x86-compatible chip along these lines, and then, when the licensing couldn't be worked out, it was turned into an ARM-compatible one?
Most x86_64 and "modern" ARM processors are actually much more 80s-esque RISC processors underneath, with parallel instructions, pipelines, etc. The actual hardware binary assembly is effectively a high-level language that the decoder/scheduler makes sense of.
Likely they developed a very fast, very efficient core die and just swapped decoders at the last minute. It's not much of a stretch that they'd develop the ARM version in parallel because of the high risk associated with doing anything x86.
Hah, what goes around comes around. It is notable that China is going gangbusters building ARM variants rather than coming up with an entirely new architecture. Back in the USSR days, when Sun was working with the ELVIS group, they were required to have some Soviet-designed machines in addition to the SparcStations that Sun provided. Those machines were not well liked by the researchers.
I wonder if, unlike Transmeta, they will let you run native code on it too. Transmeta could not, for technical reasons: as far as I know, the underlying arch was designed ONLY as a JIT target and did not even have protected memory as such. Memory accesses were translated to different instructions (one privileged and one not) based on the context of the translated code.
Not sure about this Russian chip but in the case of Transmeta and Nvidia Denver (and to a lesser degree, Intel x86 µops), writing the "native" code directly is not beneficial in any way.
The whole point is that the JIT compiler running on the CPU can make dynamic optimizations that are somewhat similar in nature to the branch prediction and other optimizations modern CPUs do.
The native code executed by these CPUs is a poor target for static compilation. Without runtime data about which branches are taken, which memory locations are touched, etc, it is not possible to generate code that outperforms the built-in JIT or can compete with more traditional CPUs.
And besides, the JIT frontend in these chips is rather cheap in terms of power and performance.
Nope. We need "energy independence" to avoid places with dictators and repugnant Middle Eastern societies. The only folks who care about "technology independence" have irrational, kleptocratic nationalist agendas (Russia, China) - when's the last time you had a problem with a bastion of technology?
As much as I'd love to see x86 go the fuck away, we have a lot more proven stuff working on that architecture -- including a lot of legacy or closed-source software that won't get ported soon.
> Also, the properties that they are getting feel like being knocked back a decade - is it due to the fabrication facilities that they have?
Most likely. Top of the line stuff needs a lot of money and expertise -- Russian companies don't have that much of the former, and Russia exports or rent^H^H^H^H uses a lot of the latter for outsourcing.
Is that "Elbrus-compatible Linux distro" native? Or just x86 with some tweaks to make it compatible? (perhaps addressing emulation shortcomings or optimizations)
If they made kernel changes and chose to respect the GPL, I'd be interested to see those sources...
It's stated on the official site [1] that it's based on Linux 2.6.33, and it looks like the kernel and userspace are compiled for the Elbrus ISA and run natively, without x86 emulation.
There are no links to sources on their site, and they don't provide datasheets. To request sources under the GPL you need to get the binaries first. I live in Russia, and I've never seen an Elbrus in real use anywhere. It's not marketed or sold to the general public. I think the target market is government security agencies. Of course they get the sources anyway for audit, and they have no incentive to publish them.
You could have a CPU that updates itself to emulate whatever new features come along in x86/AMD64, and would therefore be 'future-proof'. Even if the raw performance is not at the level of genuine Intel, does it matter if you're merely surfing the web? If performance is 'ample', then a CPU that just gets updated to include new instructions could let computers last for decades doing things like showing web pages. How hard can that get?
Many instructions on your CPU are already microcoded instead of wired into the hardware. I'm not 100% sure how it works, but it sounds like what you're proposing could already be implemented in modern CPUs.
Very little software is written that requires features from the latest CPUs, so there's not much need to address compatibility by emulating new features in the CPU itself. Windows 8 needs PAE, NX, and SSE2, which were available in CPUs produced 8 years before its release (certainly not all CPUs produced in 2004 had these features, but many did).
Why on earth do you need the latest CPU architecture to browse the web? I have a laptop from 2002 and it runs Chrome and browses the web just fine. What on earth are you talking about?
The few truly new features that have come along are not really emulatable - things like 64-bit mode, or new secure modes.
Almost everything else has been bundles of hardware-accelerated instructions for performance gains on popular data classes (see: MMX, AVX, etc).
It's rare to run into software that doesn't have multiple codepaths, leveraging an extension if you have it and bypassing it if you don't.
Extensions that you are de facto required to have are so old that you almost certainly have them anyway. MMX was introduced in 1997 with the P5. MMX support will not be the first problem on your list if you're trying to run modern software on a P4.
МЦСТ (MCST) was an abbreviation for "Московский центр SPARC-технологий" ("Moscow Center of SPARC Technologies"); now it's just meaningless letters. "ARM" in the computer's name probably means "Автоматизированное рабочее место" - "automated workstation".
huhtenberg | 11 years ago
http://en.wikipedia.org/wiki/Elbrus_%28computer%29
thesz | 11 years ago
spatular | 11 years ago
[1] http://www.elbrus.ru/arhitektura_elbrus
[2] http://www.elbrus.ru/files/535269/9f0cd8/50606f/000000/2014-...
[3] http://www.elbrus.ru/files/535269/0e0cd8/50586f/000000/2014-...
[4] http://www.mcst.ru/mikroprocessor-elbrus4s
[5] http://www.mcst.ru/mikroprocessor-elbrus4s-gotov-k-serijnomu...
--
Edit: loop pipelining, OS information
agumonkey | 11 years ago
userbinator | 11 years ago
return0 | 11 years ago
duaneb | 11 years ago
ludamad | 11 years ago
mrbill | 11 years ago
http://en.wikipedia.org/wiki/Project_Denver
valarauca1 | 11 years ago
rkuska | 11 years ago
ChuckMcM | 11 years ago
sedachv | 11 years ago
There was the Loongson, which had an extended MIPS64 instruction set: http://en.wikipedia.org/wiki/Loongson
dmitrygr | 11 years ago
exDM69 | 11 years ago
The whole point is that the JIT compiler running in the CPU can make dynamic optimizations that's somewhat similar in nature to doing branch prediction and other optimizations modern CPUs do.
The native code executed by these CPUs is a poor target for static compilation. Without runtime data about which branches are taken, which memory locations are touched, etc, it is not possible to generate code that outperforms the built-in JIT or can compete with more traditional CPUs.
And besides, the JIT frontend in these chips is rather cheap in terms of power and performance.
Elhana | 11 years ago
bhewes | 11 years ago
frozenport | 11 years ago
cordite | 11 years ago
Also, the properties that they are getting feel like being knocked back a decade - is it due to the fabrication facilities that they have?
weland | 11 years ago
MichaelSalib | 11 years ago
higherpurpose | 11 years ago
http://wccftech.com/arm-baikal-processor-8-cores-cleansweep-...
Not sure what MCST's ties are to the government.
CUViper | 11 years ago
spatular | 11 years ago
[1] (russian) http://www.mcst.ru/os_elbrus
Theodores | 11 years ago
SCHiM | 11 years ago
https://en.wikipedia.org/wiki/Microcode
toast0 | 11 years ago
x0054 | 11 years ago
sliverstorm | 11 years ago
ex3ndr | 11 years ago
tekni5 | 11 years ago
Does anyone have any idea why an average consumer would even consider buying this at this price?
ido | 11 years ago
ArtifTh | 11 years ago
RexRollman | 11 years ago
ck2 | 11 years ago