top | item 13805754

Resources about programming practices for writing safety-critical software

208 points | AlexDenisov | 9 years ago | github.com

84 comments

[+] ctz|9 years ago|reply
The obsession with C/C++ here is really weird. Like, take the MCO failure. That's a classic, textbook problem that can be structurally guaranteed not to happen with use of even a basic type system. It should be literally impossible to confuse values of different types/units/dimensions like this in something described as "safety-critical".

It seems like all the resources here are concerned with trying to whittle C/C++ into an appropriate choice of tool, rather than choosing a different tool. It seems like a 1980s-1990s mindset.

[+] greenhouse_gas|9 years ago|reply
As I understand the MCO failure, a better type system wouldn't have helped: the issue was that one program output US units while another program expected metric input.

Units can help verify that a formula within a program is correct (velocity v = 0.5 * (metric_acceleration)9.8 * t * t; cout << v.to_metric(); won't compile, for example).

But it won't help with Program 1:

velocity_imperial v = 0.5 * (metric_acceleration)9.8 * t * t; cout << v;

Program 2:

velocity_metric v; cin >> v; BurnFor(doSomeRocketScienceToCalculateEngineBurn(v));

[+] Qworg|9 years ago|reply
C/C++ is almost the only choice on most embedded systems, which is where most safety critical code lives.
[+] endorphone|9 years ago|reply
How would a basic type system protect against incorrectly interpreting an imperial floating point value as a metric floating point value? That seems like an especially weak example, and fundamentally falls into the realm of logic faults endemic to every possible programming language.

There are legitimate gripes about C/C++, especially in a space with hostile actors and unknown inputs, but that example was particularly weak.

[+] btilly|9 years ago|reply
Anecdotally I personally have talked to several people in the last few years who do things like write guidance systems for rockets. My limited sample frequently either worked in C or (a limited subset of) C++. Albeit with a variety of tooling on top to automatically test and catch a variety of kinds of common flaws.

So as weird as this may seem to you, that mindset is applicable much more recently than you might expect.

[+] AlexDenisov|9 years ago|reply
I have seen so many comments and references like this one, so I went and read the whole investigation report.

1. There was a spacecraft (MCO) and a module that was sending some data from the Earth.

2. The module was delivered late, when MCO had already been en route for 4 (!) months; before that, staff calculated the needed data manually.

3. Some teams switched into "defensive mode", unwilling to communicate and fix the problem even when it was clear.

[+] Jtsummers|9 years ago|reply
I don't have all my resources on hand right now, but off the top of my head this book should be added:

https://mitpress.mit.edu/books/engineering-safer-world

This list is barely scratching the surface of safety-critical system engineering, but it's a start.

[+] jonahx|9 years ago|reply
I'm halfway through this, and not only is the theory insightful and often unexpected, but it's also incredibly engaging, remarkably so for such an academic work.
[+] stanislaw|9 years ago|reply
Thanks for the link. The book has been added.
[+] swah|9 years ago|reply
Other than the latest MISRA, I really enjoyed "Better Embedded System Software" by Phil Koopman.

Ideally you should read it before starting your project, since it deals with the product specification/requirements-gathering phase, which is your starting point in safety-critical systems.

[1] https://betterembsw.blogspot.com.br/2010/05/test-post.html

[+] phelmig|9 years ago|reply
Does anyone know how software quality is handled in complex supply chains, e.g. automotive? From my point of view, software is a second-class citizen in areas dominated by manufacturing and classical engineering.

I guess testing an over-the-air update for a car that was built by an OEM and thousands of suppliers must be quite a task.

[+] Jtsummers|9 years ago|reply
It's getting better, but hardware companies tend to view software as second-class. They think it's "easy", though they're finally accepting that it's not. It's taken decades of fatalities, cost overruns, and missed deadlines for them to realize this, but they're realizing it.
[+] adrianN|9 years ago|reply
Typically the software has to be developed according to some ISO standard like https://en.wikipedia.org/wiki/ISO_26262 and the supplier has to have some attestation, e.g. from UL or the German TÜV, that they followed the procedures.
[+] GoToRO|9 years ago|reply
MISRA, tons of testing, sometimes manual, and just a lot of people working on a somewhat simple problem if you ignore the safety requirement.

Also people don't realize, but by using a linter you basically don't write the code in C, but in "safe C". It's like a different language.

[+] yeslibertarian|9 years ago|reply
hopefully in a future not so far away, most safety-critical code will be formally verified, like http://sel4.systems/ for example
[+] kevinr|9 years ago|reply
Code like the Boeing 787's avionics package goes one better: the spec specifies what the register values should be after each step of execution, and there's a company which takes the code, puts the processor in single-step mode, and checks.
[+] danaliv|9 years ago|reply
DO-178B has been replaced by DO-178C.
[+] RaiO|9 years ago|reply
Is there anything like this that specifically addresses reliability in a critical (but not "safety-critical") system?
[+] partycoder|9 years ago|reply
I have read the JSF standard. I learned a lot from reading it.

However, the JSF project has been reported to have lots of software defects.

[+] hackuser|9 years ago|reply
> the JSF project has been reported to have lots of software defects

I haven't read anything that differentiates between these two possible scenarios:

1) Poor engineering, execution, etc.

2) The level of bugs to be expected in a software project like this. When I think of it this way, I'm amazed it was ever completed (but maybe I'm thinking about it the wrong way):

* Meet the specifications of not only three U.S. military services but also militaries and other entities in multiple national governments (with all the politics, compromise and complexity that involves).

* Invent and implement technologies to provide capabilities so bleeding edge that few people will imagine some of them for years, if not decades. There are no prior designs; nothing like it has ever been done. Part of the point is to exceed competitors' engineering capabilities by as much as possible.

* Integrate these technologies into a massive system of systems, arguably the most complex system in the history of humankind.

* The system is human-rated.

* Performance is the highest priority; there is no easy trading of performance away for safety: human lives, the outcomes of battles, the fates of nations, and the course of history may depend on performance.

* Accomplish this in secret, greatly restricting your access to outside resources. Will this work? You can't publish a paper and get feedback, or make a presentation at a conference.

* Accomplish this in coordination with thousands of suppliers in many countries.

* Because it's hardware and very expensive, your ability to iterate is limited. My completely amateur guess based on the above is that it's a massive, decades-long waterfall-style project.

[+] vonmoltke|9 years ago|reply
Don't blame the tools, blame the carpenters (and the customers, and the customer's bosses).
[+] watwut|9 years ago|reply
That is awesome, thank you.
[+] throwme_1980|9 years ago|reply
C++ is not considered safe for any RTOS system; in fact you won't find it used in aviation embedded devices (referring to the big 3). For tools, yes, you can use higher-level languages to your heart's content.
[+] vonmoltke|9 years ago|reply
Huh? The F-22A, F-35, P-8, and P-3 are all flying C++ code. Those are just the programs I have personally touched (not necessarily the code, though). Where did you get the idea that it "is not considered safe for any [real-time] system"?