I think there is a growing rebellion against the kind of software development "best practices" that result in the problems noted in the article. I see senior developers in the game industry coming out against sacred principles like object orientation and function size limits. A few examples:
For ages now, I've been telling people that the best code, produced by the most experienced people, tends to look like novice code that happens to work --- no unnecessary abstractions, limited anticipated extensibility points, encapsulation only where it makes sense. "Best practices", blindly applied, need to die. The GoF book is a bestiary, not an example of sterling software design. IME, it's much more expensive to deal with unnecessary abstraction than to add abstractions as necessary.
People, think for yourselves! Don't just blindly do what some "Effective $Language" book tells you to do.
(For starters, stop blindly making getters and setters for data fields! Public access is okay! If you really need some kind of access logic, change the damn field name, and the compiler will tell you all the places you need to update.)
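A sketch of the same idea in Python (class names are made up for illustration): start with a plain public attribute, and only wrap it in a property if access logic ever becomes necessary, so call sites never change.

```python
class Sample:
    """Plain public field: no boilerplate accessors."""
    def __init__(self, value):
        self.value = value  # callers read/write sample.value directly


class CheckedSample:
    """Later, if access logic is needed, rename the backing field and
    wrap it in a property; call sites keep using `.value` unchanged."""
    def __init__(self, value):
        self._value = value  # renamed backing field

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        if v < 0:
            raise ValueError("value must be non-negative")
        self._value = v
```

In a compiled language the equivalent move is renaming the field, which makes the compiler flag every access site that needs updating.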
Not a disagreement, but it's interesting that at least 3 of the 4 people you mentioned are game programmers (not sure about Mike Acton because his name doesn't ring a bell). Some, like Carmack, are definitely brilliant programmers. But game programming has very specific constraints, doesn't it? Speed and size are comparatively more important than in business/enterprise software, and maintenance is comparatively less important.
That said, I welcome anyone trying to knock OOP off its pedestal.
A poor programmer uses the first abstractions and ideas to come into their head, and runs with it.
A mediocre programmer uses ideas and abstractions they've heard about being good ideas for this scenario, and just runs with it, occasionally rewriting as needed.
A good programmer carefully figures out what abstractions and ideas are appropriate for the job at hand, studying and rewriting until they're sure they've gotten them right, and uses them.
A master programmer uses the first abstractions and ideas to pop into their head: they've been at this long enough to know the right approach.
Ridiculously long functions are a maintainability problem but so is a ton of really small functions that do not provide a logical separation of concerns.
OO code can provide modularity which can greatly improve the ability to make changes without breaking other code. On the other hand, when applied poorly it can have the opposite effect.
Muratori's compression-oriented programming and Acton's data-oriented design have really helped me in writing HPC code. Carmack's arguments make sense for apps that have a clear main function, although they're less applicable to libraries.
I consider these "best practices" too; they are just better for performance than object-oriented practices applied to many small objects.
One other important thing to keep in mind when considering that crowd is that all of them are game programmers, who face a very different set of constraints than web developers (which is what I think a lot of people here are). That being said, I do like a lot of what they have said in the past.
Not to say that the above aren't all examples of skilled programmers, and likely much more practical than a lot of people; just that they have a very different experience of the world than, say, Uncle Bob or Martin Fowler (some of the more "best practices" developers).
I think an overarching trend is that programmers in general are realizing that "best practices" like the OOP design patterns (the flyweight or adapter patterns, say) are better when you don't have to go out of your way to accommodate them because they fit into the language well.
The movement of languages like Rust, Go and Elixir (what I've been able to investigate lately) away from class-based OOP by splitting it up into its various pieces (subtyping, polymorphism, code sharing, structured types) is a good trend for the programming industry IMO. I'm looking forward to more improvements in the ability to statically verify code a la Rust. Also exciting is the improvements that C# is getting from Joe Duffy's group to help it reduce allocations and GC pressure.
It's an exciting time to be in software development and to be following programming language development; some meaningful progress seems to be happening.
I do so agree with that. Me, I am an old-school programmer. Started with BASIC, Pascal, COBOL, Clipper, dBase, VB, C++, PHP, JavaScript, Java. Always created the frameworks and libraries I needed. Straightforward pyramid-structured software. Everything was functions (it's coming back). Hardly any testers other than the client and yourself. Lots of that stuff is still running. Now I'm lost a lot of the time in the complexity of the frameworks and the use of endless classes. Debugging takes ages because some class in a totally different environment is badly written. My advice: KEEP IT SIMPLE.
Casey Muratori's article really hits home for me. That's how I've felt for a long time, and I'm glad to have a coherent article to point to, to explain this to others.
I think a lot of this could be covered by two principles that are often quoted but overlooked. The first is KISS: keep it simple, stupid. The second is single responsibility.
When I design software, I apply both of these to every facet of the system (though I admit sometimes not as well as I should). The end result is I might not have a ton of interfaces and hierarchies. It might not handle curve balls as well as an abstract MachineFactoryFactory could. It does handle everything that we've thrown at it however.
This article is anecdotal and ranty but I will respond anyway. I've spent the last 15 years working on various projects involving cleaning up scientific code bases. Messy unengineered code is fine if only a very few people ever use it. However, if the code base is meant to evolve over time you need good software engineering or it will become fragile and unmaintainable.
That said, there are many "programmers" who apply design concepts willy-nilly without really understanding why. They often make a bigger mess of things. There is an art to quality software engineering which takes time to learn, and it is a skill which must be continually improved.
The claim in the article that programmers have too much free time on their hands because they aren't doing real work, like a scientist does, is obviously ridiculous. Any programmer worth their salt is busy as hell and spends a lot of thought on optimizing their time.
Conclusion, scientists should work with software engineers for projects that are meant to grow into something larger but hire programmers with a proven track record of creating maintainable software.
I've had similar experience with scientific software. When I'm told that the existing software is "OK because it works", I ask "how do you know it works?" because typically there are no unit tests or tests of any sort of individual stages for that matter.
I've found that scientists tend to assume "it works" when they like the results they see such as R^2 values high enough to publish.
Recently I converted some scientific software that was using correlation^2 (calling it R^2) as a measure of model predictions, as opposed to something more appropriate like a PRESS-derived R^2 (correlation is totally inappropriate for judging predictions because it is translation- and scale-independent on both the observed and predicted sides). Nobody went looking for the problem because the results seemed good and reasonable. After converting to a proper prediction R^2, some of the results are now negative, meaning the models are doing worse than a simple constant-mean function. Yikes.
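A toy illustration of that failure mode (hypothetical numbers, not the actual project data): predictions that are perfectly correlated with the observations but wildly mis-scaled get a squared correlation of 1 while the prediction R^2 is hugely negative.

```python
import numpy as np

y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 10.0 * y_obs  # perfectly correlated, wildly mis-scaled

# Squared correlation: blind to shifts and scaling of the predictions.
r2_corr = np.corrcoef(y_obs, y_pred)[0, 1] ** 2  # 1.0

# Prediction R^2 = 1 - SS_res / SS_tot: penalizes actual errors,
# and goes negative when the model is worse than predicting the mean.
ss_res = np.sum((y_obs - y_pred) ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2_pred = 1.0 - ss_res / ss_tot  # -444.5
```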
Yes, I work on a mixed team of physicists, engineers, and computer scientists, and the most frustrating part is trying to work with some of the physicists' code. For the most part it is fairly functional, but the problem is that it is almost unreadable. It is quite clear that they write it as fast as possible so they can do what the OP would call real work, without regard for the others who will need to work with and maintain that code later on.
What most people seem to forget is that "best practices" are not universal: Depending on the size and scope of the software project, some best practices are actually worst practices and can slow you down. For example, unit testing and extensive documentation might be irrelevant for a short term project / prototype while they will be indispensable for code that should be understood and used by other people. Also, for software projects that have an exploratory nature (which is often the case for scientific projects) it's usually no use trying to define a complete code architecture at the start of the project, as the assumptions about how the code should work and how to structure it will probably change during the project as you get a better understanding of the problem that you try to solve. Trying to follow a given paradigm here (e.g. OOP or MVC) can even lead to architecture-induced damage.
The size of the project is also a very important factor. From my own experience, most software engineering methods start to have a positive return on investment only as you go beyond 5,000-10,000 lines of code, as at this point the code base is usually too large to be understood by a single person (depending on the complexity, of course), so making changes will be much easier with a good suite of unit tests that makes sure you don't break anything when you change code (this is especially true for dynamically typed languages).
So I'd say that instead of memorizing best practices you need to develop a good feeling for how code bases behave at different sizes and complexities (including how they react to changes), as this will allow you to make a good decision on which "best practices" to adopt.
Also, scientists are, from my own experience, not always the worst software developers, as they are less hindered by most of the paradigms / cargo cults that the modern programmer has to put up with (being test-driven, agile, always separating concerns, doing MVP, using OOP [or not], being scalable, ...). They therefore tend to approach projects in a more naive and playful way, which is not always a bad thing.
Disclosure: I'm a recent astronomy grad who specialized in computational astrophysics. Definitely biased.
The issue is that at least for many scientists and mathematicians, mathematical abstraction and code abstraction are topics that oftentimes run orthogonal to each other.
Mathematical abstractions (integration, mathematical vernacular, etc) are abstractions hundreds of years old, with an extremely precise, austere, and well defined domain, meant to manage complexity in a mathematical manner. Code abstractions are recent, flexible, and much more prone to wiggly definitions, meant to manage complexity in an architectural manner.
Scientists oftentimes have already solved a problem using mathematical abstractions, e.g. each step of the Runge-Kutta [1] method. The integrations and function values for each step are well defined, which results in scientists wanting to map these steps one-to-one with their code, oftentimes resulting in blobs of code with if/else statements strewn about. This is awful by software engineering standards, but in the view of the scientist, the code simply follows the abstraction laid out by the mathematics itself. This is also why it's often correct to trust results derived from spaghetti code, since the methods the code implements have themselves often been verified.
Software engineers see this complexity as something that's malleable, something that should be able to handle future changes. This is why code abstractions play bumper cars with mathematical abstractions: mathematical abstractions are meant to be unchanging by default, which makes tools like inheritance, templates, and even naming standards poorly suited for scientific applications. It's extremely unlikely I'll ever rewrite a step of symplectic integrators [2], meaning that I won't need to worry about whether this code is future-proof against architectural changes or not. Functions, by and large in mathematics, are meant to be immutable.
Tl; dr: Scientists want to play with Hot Wheels tracks while software engineers want to play with Lego blocks.
>mathematical abstractions are meant to be unchanging by default
Let's say today I am doing RK2, and tomorrow I want RK4; how do I easily make that change? In my codes it's a change of a single line, and I get higher-order convergence, etc. It is not a week- or month-long project, as it would be for many codes because of some of those abstractions you deride.
Also, computational math is an active area of research; the method you mentioned is not hundreds of years old, although yes, it was developed in the early 1900s. To this day, people are developing new methods that give higher-order accuracy (orders above O(err^10), to abuse notation)... but as you can guess, no one uses them because changing the current codes is so difficult that people just don't.[0] Of course, I agree O(err^4) is often enough, so the motivation to change codes now isn't that overpowering, but it is again something we lose; learning things a little bit outside our field could be helpful.
[0] Instead, we choose smaller and smaller mesh sizes and timesteps to deal with low-order error, and request millions of CPU hours, use electricity, kill trees, and contribute to global warming.
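The RK2-to-RK4 one-line switch mentioned above is easy when the stepper is just a parameter. A minimal fixed-step sketch (not the commenter's actual code):

```python
def rk2_step(f, t, y, h):
    """One midpoint (RK2) step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    return y + h * k2

def rk4_step(f, t, y, h):
    """One classical RK4 step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, t0, y0, t1, n, step=rk2_step):
    """Integrate from t0 to t1 in n steps; swap step=rk4_step
    to change methods -- the one-line change in question."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y
```

For example, `integrate(lambda t, y: y, 0.0, 1.0, 1.0, 100, step=rk4_step)` approximates e, with a much smaller error than the same call using `rk2_step`.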
It sounds like you want a language like Haskell. Abstractions are based on mathematical (algebraic and category-theoretic) abstractions with well-defined laws. The language has immutable semantics and admits equational reasoning. Using libraries like Dimensional has made me better at physics; many fields of physics play fast and loose with units and dimensions and aren't even aware of it.
meeeeh, come on. You can't say the sloppy code can be trusted because the clean math it is based on is verified. The sloppiness of the code prevents validation that it properly implements that precious math of yours.
The problem is that you want to treat the code as not your "real" job. Your real job is getting correct answers into published papers, and providing a proof of that correctness. If your code, on which your results rely, is too sloppy for anyone else to understand (and note that "anyone else" can include "you, in 6 months"), then you've not proven correctness at all.
> The issue is that at least for many scientists and mathematicians, mathematical abstraction and code abstraction are topics that oftentimes run orthogonal to each other.
Excellent observation. I'm an ex-physicist and on the few occasions that I had to use computers the only thing I cared about was how computer functions mapped into the mathematical abstractions that I cared about. Everything else was just noise.
"Crashes (null pointers, bounds errors), largely mitigated by valgrind/massive testing"
Once upon a time I had lunch with a friend-of-a-friend whose entire job, as a contractor for NASA, was running one program, a launch vehicle simulation. People would contact her, give her the parameters (payload, etc.) and she would provide the results, including launch parameters for how to get the launch to work. Now, you may be thinking, that seems a little suboptimal. Why couldn't they run the program themselves; they're rocket scientists, after all?
Unfortunately, running the program was a dark art. The knowledge of initial parameter settings had to be learned before the back end would provide, well, reasonable results. One example: she had to tell the simulation to "turn off" the atmosphere above a certain altitude or the simulation would simply crash. She had one funny story about a group at Georgia Tech who wanted to use the program, so they dutifully packed off a copy to them. The grad students came back wondering why they couldn't match the results she was getting. It turned out that they had been sent a later version of the program than the one she was using.
Here's the thing that grinds my gears. Let's see scientists apply that same attitude toward papers. Let them label a bunch of equations poorly, and not label a few, have them explain concepts out of turn in different places in the document, have them produce shitty, unreadable figures, let's see how that turns out.
The issue is that the code which eventually leads to their results isn't public, and their reputation isn't riding on it, so they can pretend they understand what they're talking about when it comes to publishing; one or two looks at their code lets you know how much of it is bullshit. But when it comes to a paper, well, they will be judged on that, so they can't be messy there.
It's okay if it's one-off code for one group; that's fine. But when a code is vital for so many people, for it to be that terrible and inaccessible?
Simple solution: if you are funded by the taxpayer, what you produce should be accessible to the taxpayer (absent defense restrictions). Demanding accessibility for gov't-funded papers is good, but I feel the same restriction should apply to code.
That's incredibly common in enterprises. As the fad is to move enterprises to an ITIL model where work is done by a mix of matrixed in-house and outsourced teams, the overhead of "best practices" becomes important.
His first list really, really hand-waves the problems that style of coding can cause. Just use better tools or run valgrind? It never is that simple.
One aspect of scientific coding is that it can have very long lifetimes. I sometimes work on some code > 20 years old. Technology can change a lot in that time frame. For example, using global data (common back then) can completely destroy parallel capability.
The 'old' style also makes the code sensitive to small changes in theory. Need to support a new theory that is basically the same as the old one with a few tweaks? Copy and paste, change a few things, and get working on that paper! Who cares if you just copied a whole bunch of global data - you successfully avoided the conflict by putting "2" at the end of every variable. You've got better things to do than proper coding.
Obviously, over-engineering is a problem. But science does need a bit of "engineering" to begin with.
Anecdote: A friend of mine wanted my help with parsing some outputs and replacing some text in input files. Simple stuff. He showed me what he had. It was written in Fortran because that's what his advisor knew :(
Note: I'm currently part of a group trying to help with best practices in computational chemistry. We'll see how it goes, but the field seems kind of open to the idea (ie, there is starting to be funding for software maintenance, etc).
I think some of the author's criticisms are misplaced.
Long functions — Yes, functions in scientific programming tend to be longer than your usual ones, but that's often because they cannot be split into smaller functions that are meaningful on their own. In other words, there's simply nothing to "refactor". Splitting them into smaller chunks would simply result in a lot of small functions with unclear purposes. Every function should be made as small as possible, but not smaller.
Bad names — The author gives 'm' and 'k' as examples of bad variable names. I think this is a very misplaced criticism. Unless we are talking about a scientific library, many scientific programs are just implementations of some algorithms that appear in published papers. For such programs, the main documentation is not the comments but the published papers themselves. The correct way to name the variables is to use exactly the symbols in the paper, not your favourite Hungarian or Utopian notations. (Some programming languages such as Rust or Ruby are by design very inconvenient in this respect.) As for long variable names, I think they are rather infrequent (unless in Java code); the author was perhaps unlucky enough to meet many.
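A sketch of that naming convention: keep the paper's single-letter symbols and let the docstring point at the source. The function and the equation reference here are illustrative placeholders, but the physical constants and formula (the electron Debye length) are standard.

```python
import math

def debye_length(n, T):
    """Electron Debye length, Eq. (2.4) of <reference paper>.

    Variable names deliberately match the paper's symbols;
    the paper, not the comments, is the primary documentation.

    n : electron number density [m^-3]
    T : electron temperature [K]
    """
    eps0 = 8.8541878128e-12  # vacuum permittivity [F/m]
    kB = 1.380649e-23        # Boltzmann constant [J/K]
    e = 1.602176634e-19      # elementary charge [C]
    return math.sqrt(eps0 * kB * T / (n * e * e))
```

Renaming `n` to `electron_number_density` here would make the code harder, not easier, to check against the paper.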
Mostly I agree: bad naive code is better than bad sophisticated code.
Also, science very frequently only requires small programs that are used for one analysis and then thrown away. It's OK to have a snarl of bad Fortran or NumPy if it's only 400 lines long.
BUT: scientific projects are often (in my old field, usually) also engineering projects. Such experiments are complex automated data-gathering machines (hardware and software) that take roughly similar data runs tens of thousands of times.
There should be some engineering professionalism at the start to design and plan such a machine. Especially the software, since it is mostly a question of integrating off-the-shelf hardware.
But PIs think:
(A) engineering is done most cheaply by PhD students -- a penny pinching fallacy.
(B) that their needs will grow unpredictably over time.
B is true, but it is actually a reason to have a good custom platform designed at the start, so that changes are less costly. Your part-time programmer is going to develop many thousands of lines of code no one can understand or extend. (I've done it, I should know.)
I believe this post is fundamentally misguided, but I can see how the author got there. In fact I see it as a sort of category error. When you talk about a style of programming being "good" or "bad", I always want to ask "for what?". I wonder if the author has thought about what would happen if everyone adopted the "scientific" style they are alluding to.
Most of what the author describes as the problems of code generated by scientists are what I would call symptoms. The real problems are things like incorrect abstractions, deep coupling, and overly clever approaches with unclear implicit assumptions. Of course this causes maintenance and debugging to be more difficult than they should be, but the real problem is that such code does not scale well and is poor at managing the complexity of the code base.
So long as your code (if not necessarily its domain) is simple, you are fine. Luckily this describes a huge swath of scientific code. However, system complexity is largely limited by the tools and approaches you use... all systems eventually grow to become almost unmodifiable.
The point is, this will happen to you faster if you follow the "scientific coder" approaches the author describes. Now it turns out that programmers have come up with architectural approaches that help manage complexity over the last several decades. The bad news for scientific coders is that to be successful with these techniques you actually have to dedicate some significant amount of time to learning to become a better programmer and designer, and learning how to use these techniques. It also often has a cost in terms of the amount of time needed to introduce a small change. And sometimes you make design choices that don't help your development at all. They help your ability to release, or audit for regulatory purposes, or build cross-platform, or ... you get the idea. So these approaches absolutely have costs. You have to ask yourself what you are buying with this cost, and do you need it for your project.
The real pain comes when you have people who only understand the "scientific" style already bumping up against their systems ability to handle complexity, but doubling down on the approach and just doing it harder. Those systems really aren't any fun to repair.
It's an interesting discussion, and as the article points out, "Software Engineer" code has some issues as well.
There's also an issue that code ends up reflecting the initial process of the scientific calculation needed, which might not be a good idea (but if you depart from that, it causes other problems as well)
Also, I'm going to be honest, a lot of software engineers are bad at math (or just don't care). In theory a/b + c/b is the same as (a+c)/b; in practice you might be near some precision edge that you can't deal with directly, and hence you need to compute this another way.
It's worse than you say (and think?). For example: in general, tolerance-based floating point comparison isn't transitive, and addition isn't even associative.
Not only do those "bad at math" software engineers get this wrong, most of the scientists do too. These two groups often make different types of errors, true - but nearly everybody who hasn't studied numerical computation with some care is just bad at it.
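Both failures are easy to demonstrate in a few lines of Python (standard IEEE 754 double precision assumed):

```python
import math

# Addition is not associative in floating point:
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
assert a != b

# And a/b + c/b == (a+c)/b can fail outright, e.g. via overflow:
x, z, d = 1e308, 1e308, 2.0
safe = x / d + z / d   # 1e308
naive = (x + z) / d    # inf: x + z overflows before the divide
assert math.isinf(naive) and not math.isinf(safe)
```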
I'm 80% "software engineer" and 20% "researcher" and have to play both roles to write supercomputer code (I'm the minority, most peers are more researchers). These issues are important right now, as the govt is investing in software engineering due to recent hardware changes that require porting efforts. We recognize the pitfalls of naive software engineering applied to scientific code, and would like to do things more carefully. I don't think we should have to choose one or the other; with proper communication we can achieve a better balance.
In his excellent book [1], Andy Hunt explains what expertise is with a multi-level model [2], where a novice needs rules that describe what to do (to get started) while an expert chooses patterns according to his goal.
So, "best practices" are patterns that work in most situations, and an expert can adapt to several (and new) situations.
The title of this article should really be "Why bad scientific code beats bad software engineer code."
It contrasts a bunch of bad things scientific coders do, and a bunch of bad things bad software engineers do. There's no "best practices" to be seen on either side.
The article overlooks a massive source of problems: the problems he describes in engineers' code usually start to become annoying at larger scale, while the problems he describes in scientists' code rarely happen at scale, because such code can't be extended significantly. I feel it's weird to compare codebases that probably count in the thousands of lines with codebases that count in the hundreds of thousands or millions of lines of code.
Also, it is worth noting that every single problem he has with engineers' code is described at length in the literature (Working Effectively with Legacy Code, the DDD blue book, etc.). Of course these problems exist. But this is linked to the fact that hiring bad programmers still yields benefits. I believe this is not something that we can change, but if the author is interested in reducing his pain with crappy code, there are solutions out there.
> Long functions
This isn't the worst thing, as long as it gets refactored when there is a need for parts of that function to be used in multiple places.
> Bad names (m, k, longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)
I can live with long-winded names; while slightly annoying, they at least still help with figuring out what's going on.
What I can't stand are one or two letter variable names. They're just so unnecessary. Be mildly descriptive and your code becomes so much easier to follow, compared to alphabet soup.
What annoys me about stuff like this is that it just feels like pure laziness and disregard for others. Having done code reviews of data scientists' code, I find they just don't want to hear it. They adamantly don't care, compared to my software engineer compatriots, who would at least sit there and consider it.
As a poster above pointed out, a lot of scientific code is an implementation of a mathematical device. And the scientist is trying to make their equations come to life. And in math, many equations are simplified to their variables in order to avoid insane complexity. Many of the scientists actually are thinking in terms of 'S', 't' and 'v', etc. What's the particle's x, y, t coordinates, and how does that get me v, p and l? So that they can write out:
The latter is AWFUL mathematics, and very real code. (And that is an easy equation. I've had to implement very, very complicated calculus in Objective-C code and it is absolutely horrid what comes out as 'code', as clean as that code might be. It in no way whatsoever resembles the elegance of the math that birthed it.)
When I first started, I naively tried to write math code with the natural Objective-C objects and ended up on the very wrong side of the language. I realize the mistake now, but it's very awkward to ask the (scientist) programmer to go along programming with the language's tutorialed objects, then to tell them, "btw, that 'NSNumber' you have, can't be used as an exponent, along with that 'float' over there. And you can't add NSNumbers and 'integers'. Oh, you want to multiply two NSNumbers together? You want to write an equation with NSNumbers on one line!? Go for it. Oh, and you want to do a cross-product on a matrix? Ha!".
It is tradition in mathematics (and physics, and maybe other sciences) to use single letter names. A function is f, a variable is x, a parameter is a. These short names are intuitive for the scientist who wrote the code, even if programmers have different conventions.
> (In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)
It seems the author has been plagued with programmers who avoid taking responsibility. One strategy for creating job security is to build a system too complex for anyone else to maintain it. Perhaps the author's colleagues are using this strategy.
It's hard to take complaints about "best practices" seriously when the practices described are not best.
> Simple-minded, care-free near-incompetence can be
> better than industrial-strength good intentions
> paving a superhighway to hell.
Love this line.
I think the thing about bad scientific code that makes it good is that you can often get really good walls around what goes in and what comes out, to the point that you can confine the danger of the bad code to just that component.
Software architects, on the other hand, often try to pull everything in to the single "program" so that, in the end, you sum all of the weak parts. All too often, I have seen workflows where people used to postprocess output data get pulled into doing it in the same run as the generation of the data.
As always, the right way is somewhere down the middle.
I recently inherited a blob of "scientific code" with basically no abstraction. Need to indicate the sampling period? Just type .0001; that'll never change. Need to read some files? Just blindly open a hardcoded list of filenames and assume it's okay; it'll always be like that, right? And of course, these files are in that format and there's no need to check. Of course, after this code was written, we bought new hardware. It gathers similar data, but samples at a completely different frequency, has a different number of channels, and records the data in a totally different way.
We could fork the code, find-and-replace the sampling rates, and all that, and maintain a version for each device we buy. Or we could write a DataReader interface, some derived versions for each data source, and maybe even the dreaded DataReaderFactory to automatically detect the filetypes.
Guess which approach will work better in a few years?
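A sketch of the interface-based version described above (`DataReader` is the commenter's own name; the concrete readers, rates, and the file-sniffing rule are invented for illustration):

```python
from abc import ABC, abstractmethod


class DataReader(ABC):
    """Common interface; each acquisition device gets its own subclass."""

    @abstractmethod
    def sampling_rate(self) -> float: ...

    @abstractmethod
    def read(self, path): ...


class LegacyReader(DataReader):
    def sampling_rate(self):
        return 10_000.0  # the old hardcoded .0001 s period, made explicit

    def read(self, path):
        ...  # parse the old file format


class NewDeviceReader(DataReader):
    def sampling_rate(self):
        return 40_000.0  # new hardware, different rate

    def read(self, path):
        ...  # parse the new format


def reader_for(path) -> DataReader:
    """Poor man's DataReaderFactory: sniff the file type."""
    return LegacyReader() if str(path).endswith(".dat") else NewDeviceReader()
```

Adding a third device then means one new subclass, not a forked copy of the whole pipeline.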
I recently followed a course on "Principles of Programming for Econometrics" and although I knew a lot about programming already, I learned a lot about being structured and about documentation. The professor ran some example code which he wrote 10 years ago! He wasn't really sure anymore what the function did, and BAM, it was there in the documentation (i.e. the comment header of the function).
I used to just hack stuff together in either R or Python, but that course really got me thinking about what I want to accomplish first. Write that down on paper. And then, and only then, after you have the whole program outlined in your head, start writing functions with well-defined inputs and outputs.
Why not use the computer to help you define and understand the problem? It will be much faster to iterate quickly at a REPL and then write the cleaned-up version later than to try to model the whole thing in your head first.
I know a lot of math majors thrown into C++ jobs who write unreadable code, almost forgetting they are allowed to use words and not just single letters (though they would probably be fine in the functional programming scene). There's a learning curve either way; write like your co-workers unless you have the experience to know your co-workers suck.
[+] [-] modeless|9 years ago|reply
Casey Muratori on "Compression Oriented Programming": https://mollyrocket.com/casey/stream_0019.html
John Carmack on inlined code: http://number-none.com/blow/john_carmack_on_inlined_code.htm...
Mike Acton on "Data-Oriented Design and C++" [video]: https://www.youtube.com/watch?v=rX0ItVEVjHc
Jonathan Blow on Software Quality [video]: https://www.youtube.com/watch?v=k56wra39lwA
[+] [-] quotemstr|9 years ago|reply
People, think for yourselves! Don't just blindly do what some "Effective $Language" book tells you to do.
(For starters, stop blindly making getters and setters for data fields! Public access is okay! If you really need some kind of access logic, change the damn field name, and the compiler will tell you all the places you need to update.)
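The rename trick is a static-language idiom, but the same spirit holds in Python, where a plain public attribute can be retrofitted with a property later without touching any call sites (names below are illustrative):

```python
# Start with a plain public field: no getter/setter boilerplate.
class Sample:
    def __init__(self, rate: float):
        self.rate = rate

# Later, if access logic is genuinely needed, swap the field for a
# property; callers still just read and write `.rate`.
class ValidatedSample:
    def __init__(self, rate: float):
        self.rate = rate  # goes through the setter below

    @property
    def rate(self) -> float:
        return self._rate

    @rate.setter
    def rate(self, value: float) -> None:
        if value <= 0:
            raise ValueError("rate must be positive")
        self._rate = value
```

Either way, the speculative getter/setter pair written "just in case" buys nothing.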
[+] [-] the_af|9 years ago|reply
That said, I welcome anyone trying to knock OOP off its pedestal.
[+] [-] qwertyuiop924|9 years ago|reply
A mediocre programmer uses ideas and abstractions they've heard are good for this scenario, and just runs with it, occasionally rewriting as needed.
A good programmer carefully figures out what abstractions and ideas are appropriate for the job at hand, studying and rewriting until they're sure they've gotten them right, and uses them.
A master programmer uses the first abstractions and ideas to pop into their head: they've been at this long enough to know the right approach.
[+] [-] jcoffland|9 years ago|reply
OO code can provide modularity, which can greatly improve the ability to make changes without breaking other code. On the other hand, when applied poorly it can have the opposite effect.
It's not the concepts, it's how they are applied.
[+] [-] dibanez|9 years ago|reply
I consider these also "best practices", they are just better for performance than object-oriented practices applied to many small objects.
[+] [-] yumaikas|9 years ago|reply
Not to say that the above aren't all examples of skilled programmers, and likely much more practical than a lot of people; just that they have a very different experience of the world than, say, Uncle Bob or Martin Fowler (some of the more "best practices"-oriented developers).
I think an overarching trend is that programmers in general are realizing that "best practices" like the OOP design patterns (the flyweight or adapter patterns, say) work better when you don't have to go out of your way to accommodate them because they fit into the language well.
The movement of languages like Rust, Go and Elixir (what I've been able to investigate lately) away from class-based OOP by splitting it up into its various pieces (subtyping, polymorphism, code sharing, structured types) is a good trend for the programming industry IMO. I'm looking forward to more improvements in the ability to statically verify code a la Rust. Also exciting is the improvements that C# is getting from Joe Duffy's group to help it reduce allocations and GC pressure.
It's an exciting time to be in software development and to be following PLT development; some meaningful progress seems to be happening.
[+] [-] GrumpyNl|9 years ago|reply
[+] [-] gens|9 years ago|reply
[+] [-] alanbernstein|9 years ago|reply
[+] [-] kazinator|9 years ago|reply
"I don't understand the code" isn't ... quite the same type of problem.
[+] [-] virmundi|9 years ago|reply
When I design software, I apply both of these to every facet of the system (though I admit sometimes not as well as I should). The end result is I might not have a ton of interfaces and hierarchies. It might not handle curve balls as well as an abstract MachineFactoryFactory could. It does handle everything that we've thrown at it however.
[+] [-] jcoffland|9 years ago|reply
That said, there are many "programmers" who apply design concepts willy-nilly without really understanding why. They often make a bigger mess of things. There is an art to quality software engineering which takes time to learn, and it is a skill which must be continually improved.
The claim in the article that programmers have too much free time on their hands because they aren't doing real work, like a scientist does, is obviously ridiculous. Any programmer worth their salt is busy as hell and spends a lot of thought on optimizing their time.
Conclusion: scientists should work with software engineers on projects that are meant to grow into something larger, and hire programmers with a proven track record of creating maintainable software.
[+] [-] sixbrx|9 years ago|reply
I've found that scientists tend to assume "it works" when they like the results they see, such as R^2 values high enough to publish.
Recently I converted some scientific software that was using correlation^2 (calling it R^2) as a measure of model predictions, as opposed to something more appropriate like a PRESS-derived R^2 (correlation is totally inappropriate for judging predictions because it's translation- and scale-independent on both the observed and predicted sides). Nobody went looking for the problem because the results seemed good and reasonable. After converting to a proper prediction R^2, some of the results are now negative, meaning the models are doing worse than a simple constant-mean function. Yikes.
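To make the failure mode concrete, here's an illustrative sketch (pure Python, synthetic data) of predictions that are highly correlated with the observations but badly scaled and shifted: squared correlation looks excellent while the prediction R^2 goes negative.

```python
import math
import random

# Synthetic data for illustration only: predictions track the observations
# (high correlation) but are wildly biased and mis-scaled.
random.seed(0)
obs = [random.gauss(10.0, 2.0) for _ in range(200)]
pred = [5.0 * o + 100.0 + random.gauss(0.0, 1.0) for o in obs]

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# "R^2" as squared correlation: blind to any shift/scale of pred.
r2_corr = corr(obs, pred) ** 2

# Prediction R^2 = 1 - SS_res/SS_tot: negative means the model predicts
# worse than just always guessing the mean of the observations.
ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
ss_tot = sum((o - mean(obs)) ** 2 for o in obs)
r2_pred = 1.0 - ss_res / ss_tot
```

Here `r2_corr` comes out near 1 while `r2_pred` is deeply negative, which is exactly the trap: the publishable-looking number and the honest number disagree.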
[+] [-] wsha|9 years ago|reply
[+] [-] ThePhysicist|9 years ago|reply
The size of the project is also a very important factor. From my own experience, most software engineering methods start to have a positive return on investment only as you go beyond 5,000-10,000 lines of code. At that point the code base is usually too large to be understood by a single person (depending on the complexity, of course), so making changes will be much easier with a good suite of unit tests that makes sure you don't break anything when you change code (this is especially true for dynamically typed languages).
So I'd say that instead of memorizing best practices you need to develop a good feeling for how code bases behave at different sizes and complexities (including how they react to changes), as this will allow you to make a good decision on which "best practices" to adopt.
Also, scientists are (from my own experience) not always the worst software developers, as they are less hindered by most of the paradigms / cargo cults that the modern programmer has to put up with (being test-driven, agile, always separating concerns, doing MVP, using OOP [or not], being scalable, ...). They therefore tend to approach projects in a more naive and playful way, which is not always a bad thing.
[+] [-] ben_jones|9 years ago|reply
[1]: https://www.youtube.com/watch?v=kHI7RTKhlz0
[+] [-] ktamiola|9 years ago|reply
[+] [-] whorleater|9 years ago|reply
The issue is that, at least for many scientists and mathematicians, mathematical abstraction and code abstraction are topics that often run orthogonal to each other.
Mathematical abstractions (integration, mathematical vernacular, etc) are abstractions hundreds of years old, with an extremely precise, austere, and well defined domain, meant to manage complexity in a mathematical manner. Code abstractions are recent, flexible, and much more prone to wiggly definitions, meant to manage complexity in an architectural manner.
Scientists have often already solved a problem using mathematical abstractions, e.g. each step of the Runge-Kutta [1] method. The integrations and function values for each step are well defined, which leads scientists to map these steps one-to-one onto their code, often resulting in blobs of code with if/else statements strewn about. This is awful by software engineering standards, but in the scientist's view, the code simply follows the abstraction laid out by the mathematics itself. This is also why it's often correct to trust results derived from spaghetti code: the methods the code implements have themselves often been verified.
Software engineers see this complexity as something malleable, something that should be able to handle future changes. This is why code abstractions play bumper cars with mathematical abstractions: mathematical abstractions are meant to be unchanging by default, which makes tools like inheritance, templates, and even naming standards poorly suited for scientific applications. It's extremely unlikely I'll ever rewrite a step of a symplectic integrator [2], meaning I won't need to worry about whether this code is future-proof against architectural changes. Functions, by and large in mathematics, are meant to be immutable.
Tl; dr: Scientists want to play with Hot Wheels tracks while software engineers want to play with Lego blocks.
[1]: https://en.wikipedia.org/wiki/Runge–Kutta_methods
[2]: https://en.wikipedia.org/wiki/Symplectic_integrator
[+] [-] noobermin|9 years ago|reply
Let's say today I am doing RK2, and tomorrow I want RK4; how do I easily make my change? In my codes, it's a change of a single line and I get higher-order convergence, etc. It is not a week- or month-long project, as it would be for many codes, precisely because of some of those abstractions you deride.
Also, computational math is an active area of research; the method you mentioned is not hundreds of years old, although yes, it was developed in the early 1900s. To this day, people are developing new methods that give higher-order accuracy (orders above O(err^10), to abuse notation)...but as you can guess, no one uses them, because changing the current codes is so difficult that they just don't.[0] Of course, I agree O(err^4) is often enough, so the motivation to change codes now isn't that overpowering, but it is again something we lose by not learning things a little bit outside our field that could be helpful.
[0] Instead, we choose smaller and smaller mesh sizes and timesteps to deal with low-order error, and request millions of CPU hours, use electricity, kill trees, and contribute to global warming.
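A minimal sketch of the kind of abstraction that makes this a one-line change: the stepper is just a function passed to the driver, so swapping RK2 for RK4 happens at the call site. All names here are illustrative, not from any particular code.

```python
import math

def rk2_step(f, t, y, h):
    # Heun's method (2nd order)
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    return y + h * (k1 + k2) / 2.0

def rk4_step(f, t, y, h):
    # classical 4th-order Runge-Kutta
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def integrate(f, t0, y0, n_steps, h, step=rk2_step):
    # Fixed-step driver; the stepper is just a parameter.
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: -y  # dy/dt = -y, exact solution y(t) = exp(-t)
y_rk2 = integrate(f, 0.0, 1.0, 1000, 1e-3)
y_rk4 = integrate(f, 0.0, 1.0, 1000, 1e-3, step=rk4_step)  # the one-line change
```

The same single-line swap buys the higher-order convergence; nothing else in the driver or the problem definition needs to move.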
[+] [-] wyager|9 years ago|reply
[+] [-] moron4hire|9 years ago|reply
The problem is that you want to treat the code as not your "real" job. Your real job is getting correct answers into published papers, and providing a proof of that correctness. If your code, on which your results rely, is too sloppy for anyone else to understand (and note that "anyone else" can include "you, in 6 months"), then you've not proven correctness at all.
[+] [-] jimbokun|9 years ago|reply
[+] [-] oneloop|9 years ago|reply
Excellent observation. I'm an ex-physicist and on the few occasions that I had to use computers the only thing I cared about was how computer functions mapped into the mathematical abstractions that I cared about. Everything else was just noise.
[+] [-] mcguire|9 years ago|reply
Once upon a time I had lunch with a friend-of-a-friend whose entire job, as a contractor for NASA, was running one program, a launch vehicle simulation. People would contact her, give her the parameters (payload, etc.) and she would provide the results, including launch parameters for how to get the launch to work. Now, you may be thinking, that seems a little suboptimal. Why couldn't they run the program themselves; they're rocket scientists, after all?
Unfortunately, running the program was a dark art. The initial parameter settings needed to get reasonable results out of the back end had to be learned before it would produce, well, reasonable results. One example: she had to tell the simulation to "turn off" the atmosphere above a certain altitude or the simulation would simply crash. She had one funny story about a group at Georgia Tech who wanted to use the program, so a copy was dutifully packed off to them. They came back wondering why they couldn't match the results she was getting. It turned out the grad students had been sent a later version of the program than the one she was using.
Anyway, who's up for a trip to Mars?
[+] [-] noobermin|9 years ago|reply
The issue is that the code which eventually leads to their results isn't public, so their reputation isn't riding on it, and they can pretend they understand what they're talking about when it comes to publishing; one or two looks at their code lets you know how much they're bullshitting. But when it comes to a paper, they will be judged on it, so they can't be messy there.
It's okay if it's a one off code for one group, that's fine. But when a code is vital for so many people, for it to be that terrible and inaccessible?
Simple solution: if you are funded by the taxpayer, what you produce should be accessible to the taxpayer (absent defense restrictions). Demanding accessibility for government-funded papers is good, but I feel the same requirement should apply to code.
[+] [-] Spooky23|9 years ago|reply
[+] [-] sseagull|9 years ago|reply
One aspect of scientific coding is that it can have very long lifetimes. I sometimes work on some code > 20 years old. Technology can change a lot in that time frame. For example, using global data (common back then) can completely destroy parallel capability.
The 'old' style also makes the code sensitive to small changes in theory. Need to support a new theory that is basically the same as the old one with a few tweaks? Copy and paste, change a few things, and get working on that paper! Who cares if you just copied a whole bunch of global data - you successfully avoided the conflict by putting "2" at the end of every variable. You've got better things to do than proper coding.
Obviously, over-engineering is a problem. But science does need a bit of "engineering" to begin with.
Anecdote: A friend of mine wanted my help with parsing some outputs and replacing some text in input files. Simple stuff. He showed me what he had. It was written in fortran because that's what his advisor knew :(
Note: I'm currently part of a group trying to help with best practices in computational chemistry. We'll see how it goes, but the field seems kind of open to the idea (i.e., there is starting to be funding for software maintenance, etc.).
[+] [-] luthaf|9 years ago|reply
Any reference concerning this point? I am interested!
[+] [-] The_suffocated|9 years ago|reply
Long functions — Yes, functions in scientific programming tend to be longer than your usual ones, but that's often because they cannot be split into smaller functions that are meaningful on their own. In other words, there's simply nothing to "refactor". Splitting them into smaller chunks would simply result in a lot of small functions with unclear purposes. Every function should be made as small as possible, but not smaller.
Bad names — The author gives 'm' and 'k' as examples of bad variable names. I think this is a very misplaced criticism. Unless we are talking about a scientific library, many scientific programs are just implementations of some algorithm that appears in a published paper. For such programs, the MAIN documentation is not in the comments but in the published papers themselves. The correct way to name the variables is to use exactly the symbols in the paper, not your favourite Hungarian or Utopian notations. (Some programming languages such as Rust or Ruby are by design very inconvenient in this respect.) As for long variable names, I think they are rather infrequent (unless in Java code); the author was perhaps unlucky enough to encounter many.
[+] [-] adrianratnapala|9 years ago|reply
Also, science very frequently only requires small programs that are used for one analysis and then thrown away. It's OK to have a snarl of bad Fortran or Numpy if it's only 400 lines long.
BUT: scientific projects are often (in my old field, usually) also engineering projects. Such experiments are complex automated data-gathering machines, hardware and software, that take roughly similar data runs tens of thousands of times.
There should be some engineering professionalism at the start to design and plan such a machine. Especially the software, since it is mostly a question of integrating off-the-shelf hardware.
But PIs think:
(A) engineering is done most cheaply by PhD students -- a penny-pinching fallacy.
(B) that their needs will grow unpredictably over time.
B is true, but it is actually a reason to have a good custom platform designed at the start, so that changes are less costly. Your part-time programmer is going to develop many thousands of lines of code no one can understand or extend. (I've done it, I should know.)
[+] [-] shitgoose|9 years ago|reply
[+] [-] ska|9 years ago|reply
Most of what the author describes as the problems of code generated by scientists are what I would call symptoms. The real problems are things like incorrect abstractions, deep coupling, and overly clever approaches with unclear implicit assumptions. Of course this makes maintenance and debugging more difficult than they should be, but the real problem is that such code does not scale well and is poor at managing the complexity of the code base.
So long as your code (if not necessarily its domain) is simple, you are fine. Luckily this describes a huge swath of scientific code. However, system complexity is largely limited by the tools and approaches you use; all systems eventually grow to become almost unmodifiable.
The point is, this will happen to you faster if you follow the "scientific coder" approaches the author describes. It turns out that over the last several decades programmers have come up with architectural approaches that help manage complexity. The bad news for scientific coders is that to be successful with these techniques you have to dedicate a significant amount of time to becoming a better programmer and designer, and to learning how to use them. They also often carry a cost in the amount of time needed to introduce a small change. And sometimes you make design choices that don't help your development at all: they help your ability to release, or audit for regulatory purposes, or build cross-platform, or ... you get the idea. So these approaches absolutely have costs. You have to ask yourself what you are buying with this cost, and whether you need it for your project.
The real pain comes when you have people who only understand the "scientific" style bumping up against their system's ability to handle complexity, but doubling down on the approach and just doing it harder. Those systems really aren't any fun to repair.
[+] [-] raverbashing|9 years ago|reply
There's also the issue that code ends up reflecting the initial process of the scientific calculation, which might not be a good idea (but departing from that causes other problems as well).
Also, I'm going to be honest, a lot of software engineers are bad at math (or just don't care). In theory a/b + c/b is the same as (a+c)/b; in practice you might be near some precision edge that you can't handle directly, and hence you need to calculate it another way.
Try solving a PDE in C/C++ for extra fun
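A small illustration of the a/b + c/b point (values contrived for effect): the two forms are equal on paper, but with IEEE doubles one of them overflows.

```python
import math

# Contrived values near the top of the double range.
a = c = 1e308
b = 4.0

naive = (a + c) / b   # a + c overflows to inf before the divide
safe = a / b + c / b  # mathematically identical, but stays finite
```

The same kind of rearrangement (dividing before summing, sorting before accumulating, using log-space) is routine in numerical code and invisible to anyone who only checks the algebra.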
[+] [-] ska|9 years ago|reply
Not only do those "bad at math" software engineers get this wrong, most of the scientists do too. These two groups often make different types of errors, true, but nearly everybody who hasn't studied numerical computation with some care is just bad at it.
[+] [-] mlvljr|9 years ago|reply
[deleted]
[+] [-] dibanez|9 years ago|reply
[+] [-] joseraul|9 years ago|reply
So, "best practices" are patterns that work in most situations, and an expert can adapt to several (and new) situations.
[1] https://pragprog.com/book/ahptl/pragmatic-thinking-and-learn...
[2] https://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisi...
[+] [-] nolemurs|9 years ago|reply
It contrasts a bunch of bad things scientific coders do, and a bunch of bad things bad software engineers do. There's no "best practices" to be seen on either side.
[+] [-] pyrale|9 years ago|reply
Also it is worth noting that every single problem he has with engineers' code is described at length in the literature (Working Effectively with Legacy Code, the DDD blue book, etc.). Of course these problems exist, but that is linked to the fact that hiring bad programmers still yields benefits. I believe this is not something we can change, but if the author is interested in reducing his pain with crappy code, there are solutions out there.
[+] [-] lilbobbytables|9 years ago|reply
> Bad names (m, k, longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)
I can live with long-winded names; while slightly annoying, they at least still help with figuring out what's going on.
What I can't stand are one or two letter variable names. They're just so unnecessary. Be mildly descriptive and your code becomes so much easier to follow, compared to alphabet soup.
What annoys me about stuff like this is that it feels like pure laziness and disregard for others. Having done code reviews of data scientists' work, they just don't want to hear it. They adamantly don't care, compared to my software engineer compatriots, who would at least sit there and consider it.
But this is just my own anecdotal experience.
[+] [-] toufka|9 years ago|reply
v = ((x2 - x1)^2 + (y2 - y1)^2)^(1/2) / t
rather than:
velocity = sqrt(pow((locationX2 - locationX1),2) + pow((locationY2 - locationY1),2)) / duration
The latter is AWFUL mathematics, and very real code. (and that is an easy equation. I've had to implement very very complicated calculus into objective-c code and it is absolutely horrid what comes out as 'code', as clean as that code might be. It in no way whatsoever resembles the elegance of the math that birthed it.)
When I first started, I naively tried to write math code with the natural Objective-C objects and ended up on the very wrong side of the language. I realize the mistake now, but it's very awkward to ask the (scientist) programmer to go along programming with the language's tutorialed objects, then to tell them, "btw, that 'NSNumber' you have, can't be used as an exponent, along with that 'float' over there. And you can't add NSNumbers and 'integers'. Oh, you want to multiply two NSNumbers together? You want to write an equation with NSNumbers on one line!? Go for it. Oh, and you want to do a cross-product on a matrix? Ha!".
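For contrast, in a language where the numeric types support operators directly, the code can stay close to the math (a hedged sketch; the function name and parameters are illustrative):

```python
import math

def velocity(x1, y1, x2, y2, t):
    # Distance over duration, written essentially as the formula reads.
    return math.hypot(x2 - x1, y2 - y1) / t
```

The complaint above is less about the math being hard and more about the language forcing boxed number objects between the scientist and the formula.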
[+] [-] occamrazor|9 years ago|reply
[+] [-] xapata|9 years ago|reply
> (In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)
It seems the author has been plagued with programmers who avoid taking responsibility. One strategy for creating job security is to build a system too complex for anyone else to maintain it. Perhaps the author's colleagues are using this strategy.
It's hard to take complaints about "best practices" seriously when the practices described are not best.
[+] [-] thearn4|9 years ago|reply
1) lack of version control
2) lack of testing
Everything else (including the occasional bad language fit) is usually a distant 3rd.
[+] [-] taeric|9 years ago|reply
[+] [-] mattkrause|9 years ago|reply
[+] [-] Rainymood|9 years ago|reply
[+] [-] wintermute42|9 years ago|reply
[+] [-] cdevs|9 years ago|reply