nortiero's comments

nortiero | 7 years ago | on: No Man’s Sky developer Sean Murray: ‘It was as bad as things can get’

Sorry, I don't buy it. The man kept overpromising (some would say "lying") even after release. Now he's on a publicity stunt for the latest and greatest product, addon, whatever, and cheaply plays the ever-popular victim card. Nobody mentions that maybe some angry commenters were "overpromising" too, with their enormous and impractical threats. The journalist, of course, is thrilled to expose weirdos -- the correct type of weirdos, so to speak -- to his small court of readers, for outrage and clicks. A deal made to mutual benefit.

nortiero | 8 years ago | on: Starbucks to Close All U.S. Stores for Racial-Bias Education

It is not a counterpoint, just a way to show the limits of an overblown analogy. You should appreciate the moral difference between Parks' and Gandhi's deliberate approach and the entitled arrogance of those two gentlemen, whose actions would be of no relevance if not for their skin color. Strange times. But wait! If I'm wrong and the guys were staging a protest against shady dealers of overpriced dairy products -- then they will have all the support my limited means can offer, and unlimited appreciation.

edit: gratitude, not appreciation, silly me.

nortiero | 9 years ago | on: Memory bandwidth

Hi, I've posted a few details and a sample program over here. As you correctly note, latency is a big issue on modern machines, proportionally far worse than it was in the late eighties.

But as soon as the wheels start turning, they spit out a lot of bytes, for sure!

nortiero | 9 years ago | on: Memory bandwidth

Hi! Memory is definitely an issue today, especially with big fat server processors...

Here is the small program I wrote: it first allocates an array, then writes a randomly placed linked list that touches every cell of the array exactly once (e.g. start->|4|6|2|5|3|end, so it goes to cell 1, then 4, then 5, then 3, 2, 6, and done).

Then it reads the same array from start to finish, just to sum the contents. This is fast, 6-8 GB/s depending on unrelated memory pressure from video and OS tasks, which is the advertised speed. So I don't think it's a swap issue (and my SSD is way faster :-). I've also tried with smaller and bigger arrays; just ensure it stays in real memory.

Moving around randomly in a big array is very hard on the memory controller:
- no prefetch is possible;
- the row and column change on (almost) every access, so commands have to be issued to terminate the burst, close the row, close the bank, precharge perhaps, open the bank, activate the row, fetch the address, and who knows what else;
- no advantage from interleaving;
- no advantage from a fat 64- or 128-bit bus.

It is the combined time needed for each and every memory read (due to the randomization) that causes performance to drop.

This is the code I've been using. I've modified it to work on unices (OS X has an issue with the timeval struct), but I haven't tested it on them. It compiles, though it may need to be altered a bit. Launch it with ./a.out <array_size> <random seed>. You will note that, as soon as the array grows out of the caches, all hell breaks loose. Works with gcc or clang.

https://godbolt.org/g/qX39tL

nortiero | 9 years ago | on: Memory bandwidth

The article is very optimistic about memory availability per cycle; reality is way worse.

As an example, on my MacBook Air 2011 with ~10 GB/s of maximum RAM bandwidth, a random access to memory can take 100 times longer than a sequential one.

This is in C, with full optimizations and a very low overhead read loop.

Using the same metric as the author:

best case: ~ 3 bytes per cycle

(around 6 gigabytes per second of achieved bandwidth)

worst case: ~ 0.024 bytes per cycle (scheduler, prefetch, and open-row optimizations mostly defeated)

Note that the worst case takes 10 seconds (!) to read and sum, in random order, all the cells of an array of 100,000,000 4-byte integers, each exactly once. The main loop is light enough not to influence the test.

That's about 40 megabytes per second out of ~6,000 available.

What can I say... CPU designers are truly wizards!

nortiero | 9 years ago | on: Changes I would make to Go

I feel the exact opposite... Go is a bold design, if not an elegant one, yet powerful. Rust is a design-by-committee language, built by accretion and removal. It originated as a managed language, then pivoted, and is now a second-system C++ replacement -- the only one, to be fair.

nortiero | 9 years ago | on: GCC optimized code gives strange floating point results

You are absolutely correct, "mandates" is too strong a word. Annex F, which is normative, has a way out by not setting __STDC_IEC_559__; GCC 6.3.0 sets a slightly different __GCC_IEC_559 instead. Yet I think my argument still holds:

floating point calculations can be executed at wider range and precision (§5.1.2.3.12);

assignments and casts are under an obligation to wipe out that extra precision (same paragraph);

I am no language lawyer, but... given the definition of observable program behavior (§5.1.2.3.12) and the as-if rule that governs optimizations, how could floating point equality possibly be stable over time?

nortiero | 9 years ago | on: GCC optimized code gives strange floating point results

I think this is a case of mismatch between two standards, not a bug. The C standard allows higher-precision values to be used in place of lesser ones (extended precision vs. double) and REQUIRES conversion to the correct precision on assignment (or cast). Also, the C abstract machine has the option to store those values when/if it deems opportune, since that won't change the observables as defined there.

On the other side, IEEE 754 allows extended precision to be used in place of double, and of course requires that whatever precision is chosen be kept consistently.

But the C standard points at IEEE 754, too!

It seems to me that modern C deliberately chose to ignore this kind of mismatch in the name of (substantial) performance gains. K&R C, good or bad, was way simpler!
