top | item 15709810

Fast software is a discipline, not a purpose

56 points | frostmatthew | 8 years ago | lemire.me | reply

64 comments

[+] jackjeff|8 years ago|reply
“Avoid multiple passes over the data when one would do.”

Totally disagree. Unless performance is an issue (i.e. I'm dealing with a non-trivial number of elements), I would rather use functional programming approaches to sort the data into shape (think map/filter/reduce). These approaches typically pass over the data multiple times, and often make multiple copies, but they produce readable, less error-prone code. Doing the performant thing means writing a for loop (or several, unless you fuse them), and you automatically end up with a large custom body of code performing multiple transformations on the data. It's just wasted mental effort when you write it and every time you happen to read it again.

Most collections/arrays I process have a trivial number of elements (<100), and you get virtually no benefit from writing optimized code.

I have also written a lot of performant code, and the lesson is always benchmark, benchmark, benchmark. The results do not always fit your mental model of what should be fastest. In particular, avoiding cache misses is far more important than you might think.
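To make the trade-off concrete, here is a minimal illustrative sketch (not from the thread): the multi-pass functional pipeline versus the fused single-pass loop, computing the same thing.

```python
# Illustrative sketch of the trade-off discussed above: a multi-pass
# functional pipeline vs. a single hand-written loop.
data = [3, -1, 4, -1, 5, 9, -2, 6]

# Functional style: two passes and an intermediate copy, but each
# step is trivially readable on its own.
nonneg = [x for x in data if x >= 0]      # pass 1: filter (makes a copy)
multi_pass = sum(x * x for x in nonneg)   # pass 2: map + reduce

# Single-pass loop: one traversal, no intermediate copies, but the
# filtering and transformation logic are fused into one custom body.
single_pass = 0
for x in data:
    if x >= 0:
        single_pass += x * x

assert multi_pass == single_pass  # same result either way
```

For a list of a few dozen elements the difference is unmeasurable, which is the parent's point; the loop only starts to pay for its extra reading effort at scale.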

[+] hvidgaard|8 years ago|reply
Generally, performance is something you worry about when it becomes an issue. Otherwise you spend waaaay too much time on "performance" that has zero benefit.

That said, while “Avoid multiple passes over the data when one would do" is true, profound performance gains often come from insight into the domain, where the first pass structures the data so that subsequent queries and transformations are simple and quick.

[+] ssijak|8 years ago|reply
Ever heard of streams/iterators? Data from the collection passes as a stream through all the transformations only once. The code is readable and functional, and it makes only one pass through your collection.
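A minimal sketch of that stream idea (illustrative, using Python generators, where chained map/filter stages are lazy and make a single pass with no intermediate lists):

```python
# Generators are lazy: chaining "map" and "filter" stages still makes
# exactly one pass over the source, with no intermediate collections.
def trace(xs, seen):
    for x in xs:
        seen.append(x)   # record each element as it flows through
        yield x

seen = []
stage1 = trace(range(5), seen)
stage2 = (x * 2 for x in stage1)         # "map" stage, still lazy
stage3 = (x for x in stage2 if x > 2)    # "filter" stage, still lazy

assert seen == []                # nothing has been traversed yet
result = list(stage3)            # the single eager pass happens here
assert seen == [0, 1, 2, 3, 4]   # each source element visited once
```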
[+] alkonaut|8 years ago|reply
I think the idea is: think about what you are doing, and choose the right level of simplicity vs. performance. What I find is that it's rarely massive amounts of data that kills you; it's the polynomial effects that do. For example, if you have 3 collections of <100 items and you loop over each class of students, then for each class check which rooms are available, and for each available room see if the course can be held there, you may run 100^3 tests even though your data was trivially small. People benchmark and it looks fine (because they took 10 items each for their O(MNK) algorithm, and perhaps even tried 100×10×10), but when things later hit 100^3 in production everything grinds to a halt. This is why benchmarking is not always helpful in the early phase: estimate the relations between the data sizes just slightly wrong and your estimate of performance might be 1000% off.

I work on a program with dozens of parts that are fundamentally quadratic and several that are even NP, and I'm perpetually angry at my fellow developers for tackling O(N^k) problems with O(N^(k+1)) solutions. Typically this happens through "subtly polynomial" constructs such as things.First(x => otherThings.Contains(...)) etc.

I'm all for the nice high-level constructs, I just think they need to be carefully considered. And I don't agree with "make the simple one first, and only optimize after benchmarking", because that keeps proving useless (pick too small a dataset and the benchmark looks OK; unless the algorithm is linear or better, it's trivial to choose an N for which performance is unacceptable, and whatever N you choose, customers will demand 2N tomorrow). I prefer a simpler policy, e.g. "in parts X and Y of the application we write the fastest thing possible from the start, while in the remaining 80% of the app we write the clearest thing possible and don't optimize until we are certain it's needed".
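The "subtly polynomial" pattern above (things.First(x => otherThings.Contains(...))) can be sketched in Python with hypothetical data; the names are illustrative, not from any real codebase:

```python
# Hypothetical sketch of the "subtly polynomial" construct: membership
# tests against a list cost O(M) each, so the query below is O(N*M).
things = list(range(1000))
other_things = list(range(500, 1500))

slow_hits = [x for x in things if x in other_things]   # O(N*M)

# One structuring pass over other_things makes each test O(1),
# turning the whole query into O(N + M).
other_set = set(other_things)                          # O(M), done once
fast_hits = [x for x in things if x in other_set]      # O(N)

assert slow_hits == fast_hits
```

Both versions look almost identical at the call site, which is exactly why the quadratic one slips through review and through small-N benchmarks.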

[+] AstralStorm|8 years ago|reply
Readable and less error prone? As opposed to proven correct, for example? Recursive functional code is a pain to prove right and to keep from blowing up the stack. Multiple passes (when interspersed with other accesses) can blow up the cache.

Even calling through a lambda is a cost compilers cannot easily optimize away unless you help them. (Even in C++.)

You do get a benefit from optimizing all the "trivial data size" code when every function is written in such a silly way: it's the difference between 10% in one function and 10% across the whole application.

Benchmarking is often even harder to do right than writing good tests. (In fact it is tied to it.) Instead of benchmarking, be a real computer scientist and prove bounds on time and memory allocations.

[+] elcritch|8 years ago|reply
Plus, some functional languages can combine those steps at compile time, giving you the best of both worlds. Rust, for example, elides away many lambdas in map operations. I don't have the link handy, but a post a few months back showed that Rust and Haskell functional patterns could almost match their hand-coded imperative-style implementations and their C/C++ cousins.
[+] taeric|8 years ago|reply
Performance is always an issue.

That said, I fully grant it may be an issue that is worth solving later in the process.

[+] dullgiulio|8 years ago|reply
Performant code vs readable code is a false dichotomy.
[+] srean|8 years ago|reply
> Totally disagree.

Hmm, I see....

> Most collections/array I process have trivial number of elements (<100)

Ah! ok that explains it.

> I would rather use functional programming approaches to sort data into shape (think map/filter/reduce). These approaches typically result in passing over the data multiple times, and often performing multiple copies,

That's a good recommendation. If you tend to write in a deforestable style (whether deforested automatically by the compiler or manually by you), you can mitigate some of the gratuitous copying and multiple passes.

[+] imtringued|8 years ago|reply
The vast majority of programming languages return an iterator when you use map, filter, etc. There is usually only one eager evaluation pass at the end that converts the iterator to a list.

Most languages are capable of inlining the "next" function of the iterator to generate almost the same code as a regular for loop.

[+] jlg23|8 years ago|reply
The advice you quoted is the most valuable of all, in my opinion.

You give "map/filter/reduce" as an example and claim that these approaches "typically result in passing over the data multiple times". No: since reduce can behave like map and filter, with a proper reduction function you only have to pass over the data once.

> Most collections/array I process have trivial number of elements (<100) and you get virtually no benefit from writing optimized code.

That is true for all the other advice given in the original article, but not for this: a reduction function only needs one input element and the aggregate/result so far. Such a function does not care whether it is called in the context of list or vector iteration; it will happily work with elements read from a stream or any other potentially destructive iterator. The benefit is obvious: write and debug once, document it, and never touch the code again until the actual algorithm implemented there changes.
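The parent's point can be sketched in a few lines (illustrative only): a single reduction function doing the filtering, mapping, and aggregation in one pass.

```python
# One reduction function subsumes filter + map + sum in a single pass.
from functools import reduce

data = [3, -1, 4, -1, 5]

def step(acc, x):
    # "Filter" (skip negatives) and "map" (square) live inside the reducer.
    return acc + x * x if x >= 0 else acc

one_pass = reduce(step, data, 0)

# Equivalent multi-stage formulation, for comparison.
reference = sum(x * x for x in data if x >= 0)
assert one_pass == reference
```

Because `step` only ever sees one element and the running aggregate, it would work unchanged over a file, a socket, or any other one-shot iterator.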

[+] ssijak|8 years ago|reply
"When people train, they usually don’t try to actually run faster or lift heavier weights. As a relatively healthy computer science professor, how fast I run or how much I can lift is of no practical relevance. "

This is really strange. If you don't try to lift heavier weights you will not make progress. It's called progressive overload. I don't want to go into the details of training, but if you do not track and improve your training (with whatever goal you have in mind; there are different kinds of progression), you are the same as the programmer who writes bad code, knows it is bad, but does not care.

[+] s_m_t|8 years ago|reply
I interpreted that as "When people train, their goal in and of itself is not usually to run faster or lift heavier weights". (The following might diverge from the actual content of the article; the author's English doesn't seem to be that great and I don't think we'll gain anything by taking the article literally.) The goal might be overall health, attractiveness, etc. However, that doesn't mean they won't take the training seriously if they think it will help their larger goal. If you saw someone at the gym dilly-dallying and doing a seriously half-assed rotation, you might think they don't really care about their end goal.

In the same vein if you saw a programmer coding something extremely slow just because they couldn't be bothered to put in the tiny bit of extra effort to get to a baseline of performance you might think they don't take their job or future skillset very seriously. Always thinking about the performance of your implementation is a form of training just like lifting weights is. Even if you don't care about making this particular section of code fast, or you don't care whether this particular dumbbell is lifted up and down 10 times in a row, you do those things in service of a greater goal.

[+] zimpenfish|8 years ago|reply
> If you don`t try to lift heavier weights you will not have progress.

That depends on what you're aiming for - if you're aiming for tone rather than bulk, you'd go for more reps of the same weight rather than stepping up the weight with the same reps, wouldn't you?

(Although I guess, technically, that does count as "heavier" since you're still moving more weight in your sets.)

[+] blux|8 years ago|reply
Regarding the point "Don’t use floating-point operations when integers will do": I agree that using integer arithmetic can result in cleaner, easier-to-reason-about code when applicable. It then follows from these traits that the software will be faster.

But integer arithmetic is not necessarily faster than floating point arithmetic in general; see for example: https://youtu.be/3K2LmnaLLF8?t=31m10s.

[+] AstralStorm|8 years ago|reply
Only if you have to pay the extra cost of handling overflow, underflow, saturation, or normalization. (The first three involve conditionals, which may be mispredicted; the last a division or multiplication.) Likewise correct rounding. (Multiple math ops, and sometimes a conditional too.)

Or when the platform has no integer vector math for your type of choice, which is still somewhat common.

[+] gopalv|8 years ago|reply
> But integer arithmetic is not necessarily faster than floating point arithmetic in general;

GPUs were the first place where I felt floats got a much better pathway than any other data type. Graphical pixel manipulations can also get away with much higher arithmetic error, because it's all going to get rounded down to a pixel eventually.

On the other hand, the only place where I've really worked heavily with fixed point was a graphical interface toolkit (the MIDP on ARM I was working with would've had to use softfp for floats, so fixed point was hugely superior).
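For readers who haven't seen it, here is a minimal fixed-point sketch in the spirit of the parent comment (illustrative; the 16-bit scale factor is an arbitrary choice, not something from that MIDP toolkit). Values are stored as integers scaled by 2^16, so every operation is integer-only:

```python
# Minimal Q16.16 fixed-point sketch: reals stored as integers scaled
# by 2**16, so all arithmetic is integer-only (no FPU / softfp needed).
SHIFT = 16
ONE = 1 << SHIFT          # 65536: the fixed-point representation of 1.0

def to_fixed(x):
    return int(round(x * ONE))

def from_fixed(f):
    return f / ONE

def fx_mul(a, b):
    # The raw product carries a factor of ONE**2; shift once to
    # bring it back down to a single factor of ONE.
    return (a * b) >> SHIFT

a = to_fixed(1.5)         # 98304
b = to_fixed(2.25)        # 147456
product = from_fixed(fx_mul(a, b))   # 1.5 * 2.25 = 3.375, exactly
```

Addition and subtraction need no adjustment at all; only multiplication and division need the shift, which is why this was so cheap on float-less hardware.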

[+] AstralStorm|8 years ago|reply
There is indeed a point where being fit and clean turns into an obsession. There is such a thing as premature optimization.

Commonly, though, the mistakes that cost performance are caused in part by bad design, typically arrived at by either not caring or not replacing a quick-and-dirty prototype with something reasonable. Or by "brute force" bug fixing in concurrent code.

If I got paid for every unnecessary or replaceable synchronized statement in Java, I'd be rich. Or for every fat data copy made using a queue.

[+] tluyben2|8 years ago|reply
> There is indeed a point where being fit and clean turns into an obsession. There is such a thing as premature optimization.

There is also such a thing as no optimization (at any point), and I see that far more often in the wild. I think more devs should run their app on a cheap Android phone or a cheap entry-level consumer laptop to see what happens when it's not running on a 32 GB octa-core i7. I recently got an entry-level machine from a friend to play around with; it has 4 GB and a flash drive, and it is eerie how slow the thing is. This is what is sold at Walmart etc. at the entry price level, so it's fair to say that if you are a B2C software dev, this is your audience.

For fun, I tried to do some development on it. Atom and VS Code both started snappy, but after devving on it a bit they rapidly became unusable (CPU at 100% constantly, and typed text appearing seconds after you typed it); emacs was OK; vim did well. You cannot run Electron on that, and yet that's what many people deliver. And my suspicion is that most don't optimize when using it (if you are fast-software minded, you wouldn't use it in the first place; not because it's inherently bad, but because there are too many moving parts you cannot influence). Please try it on these machines to know what you are doing to your users :)

[+] delta1|8 years ago|reply
1. Make it work

2. Make it right

3. Make it fast

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth

[+] microcolonel|8 years ago|reply
The problem is that these rules should be applied in sequence to a given feature or flow, but they're instead applied in sequence to the whole product. So what happens is: they make everything "work", then they spend the next two decades pecking at making it right, and making it fast takes a back seat.

Corollary: interaction latency, power consumption, and other performance characteristics should probably be part of "make it right", and "make it fast" might not even be a value at all.

[+] cjsuk|8 years ago|reply
1. Make it work

2. Make it ... err actually fuck it, let's build a hairball on top of it.

That's how it really rolls.

You have to think it fast, think it right, then make it work because if you do it the other way round, it never happens.

[+] Cthulhu_|8 years ago|reply
The challenge there is that a lot of developers are pushed by their managers to move on to the next feature after #1, postponing 2 and 3 for when there's time. Time to market is considered the most important thing. Of course, the debt accumulates after a year.
[+] hasenj|8 years ago|reply
This quote is so misunderstood and so abused all the time.

The point of the statement is to not micro-optimize this and that corner of the program without having any data to guide you on what you should optimize and how.

It's not a rallying call to forgo all concerns about application performance and efficiency.

The mis-application of this quote is the root of all evil in modern software IMO.

It's why a chat program takes 500MB of RAM and why many programs take 10 seconds to load even though there's no technical reason they cannot startup instantly.

You should make it right and fast from the very start. You should not make something that "works" but is full of bugs and slow as hell. Like, that makes no sense at all. If it's full of bugs then it doesn't work. If it's slow as hell then it doesn't work.

1. Make it work correctly and efficiently (to a reasonable degree)

2. Make it even faster

3. Make it better

[+] twic|8 years ago|reply
> But then, I dress cleanly every single day even if I stay at home. And you should too.

Nope.

It's good to know how to produce fast software, but also good to know when to do so. Lemire sounds like he has some psychological issues to work out.

[+] majikandy|8 years ago|reply
I’ve spent the last 20 years not caring about performance. Not once has performance ever been a real issue on any project I’ve been on. Imagine how much time I could have wasted prematurely optimising.
[+] speedplane|8 years ago|reply
You've never written any code to run in parallel? Or used a data structure or algorithm with O(logN) time instead of O(N) time?
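As a concrete instance of the O(log N) vs. O(N) choice mentioned above, here is an illustrative sketch using Python's standard `bisect` module:

```python
# Same membership query, two complexities: a linear scan is O(N),
# binary search on sorted data is O(log N).
import bisect

data = sorted(range(0, 1000, 2))   # even numbers 0..998, sorted

def linear_contains(xs, target):           # O(N)
    for x in xs:
        if x == target:
            return True
    return False

def binary_contains(xs, target):           # O(log N), requires sorted xs
    i = bisect.bisect_left(xs, target)
    return i < len(xs) and xs[i] == target

assert linear_contains(data, 424) and binary_contains(data, 424)
assert not linear_contains(data, 423) and not binary_contains(data, 423)
```

At 500 elements both are instant; the point is that picking the right structure is a design decision you make without ever "optimizing" anything.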
[+] mbrock|8 years ago|reply
For most people, "seeing inefficient code" is basically hallucinating. There's real truth to the mantra about premature optimization. I've seen people so many times worry about the performance of something, thinking that some code looks slow, when in actual reality it's a complete nonissue.

Software development is all about compromises, just like any kind of real world engineering. Building a stage for a weekend festival is different from building a bridge over a river. We're always on a budget, so if I can work much faster and safer and lose some marginal efficiency, I'll do it—unless marginal efficiency is what I'm competing on, which it usually isn't.

[+] a_imho|8 years ago|reply
> There is a reason we don’t tend to hire people who show up to work in dirty t-shirts. It is not that we particularly care about dirty t-shirts, it is that we want people who care about their work.

I don't quite agree that the cleanliness of clothing is a good proxy for one's craftsmanship or personality, even if t-shirts are acceptable garments at a particular workplace.

[+] agnivade|8 years ago|reply
It may be different in the US. But in some Asian countries, most people care about the money at the end of the day.

So yeah, they don't care. But does it matter? People are happy as long as they get paid at the end of the day. They don't care about the quality of the code they write.

Not saying it's a good thing or a bad thing. It's just the way it is.

[+] foreigner|8 years ago|reply
I think of myself as a software craftsman. Just like a master carpenter, the things I make should be not only functional but beautiful as well. Of course we should "care" about our work.

However there are other metrics besides performance that are worth caring about, e.g. readability, maintainability, etc...

[+] mercer|8 years ago|reply
Meeting deadlines, getting paid, whether anyone will ever care about the code again, etc.

I also try to be a 'craftsman', and I try to pick work where that's an option, but I find that in practice, more often than not I have to care about various 'other metrics' at the detriment of craftsmanship.

Although I suppose being skilled at managing these various metrics could in itself be considered craftsmanship. And perhaps that mindset helps a bit with not getting utterly depressed at some of the shit I'm forced to deliver...

[+] michaelg7x|8 years ago|reply
Code bloat could easily be sorted out by forcing people to read the generated assembly of their creations.
[+] reikonomusha|8 years ago|reply
This is easy and convenient in Common Lisp with the DISASSEMBLE built-in function.
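Python has a rough analogue of this, as an illustrative aside: the standard-library `dis` module shows the bytecode a function compiles to, which is one level above machine assembly but serves the same "look at what your creation actually does" purpose.

```python
# Python's rough analogue of Common Lisp's DISASSEMBLE: the standard
# `dis` module exposes the bytecode a function compiles to.
import dis

def square_all(xs):
    return [x * x for x in xs]

bytecode = dis.Bytecode(square_all)
opnames = [instr.opname for instr in bytecode]

# dis.dis(square_all) would print the full listing; here we just
# confirm the instruction stream is inspectable programmatically.
assert len(opnames) > 0
```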
[+] nnq|8 years ago|reply
Blah. You get a lot of performance "for free" by getting the overall architecture of a system right. By starting with optimizations, you usually hit the "limit of your brainpower" very soon, before you know enough to architect things right, so you mess up the architecture (think "shit, we ended up with this location-dependent time conversion logic in the backend instead of on the clients, so now it's un-cacheable, but it's too late to refactor across multiple teams now... shit").

Just avoid building un-optimizable systems (nowadays it's harder than ever to avoid this... all the tools and services lay traps for us), and then you can safely leave the optimizations for later...

If you can't avoid building an un-optimizable system, all your clever early optimizations are worthless anyway. If you can, they can be postponed anyway.

Now, what I'd really want to read would be an article/book about avoiding the traps that lead people to get stuck with un-optimizable systems! (Though I imagine it's in general an unsolvable problem, if natural evolution didn't solve it either: at some point your body accumulates enough de-optimizations that it basically stops working or being fixable/optimizable, so you die, and some of your memes and genes carry on in "rewritten forks", a.k.a. other newly born people.)