top | item 28894751

Software developers have stopped caring about reliability

94 points | ciprian_craciun | 4 years ago | drewdevault.com | reply

109 comments

[+] wokwokwok|4 years ago|reply
Well, it's easy to complain.

The reality is that money dictates decisions and behaviour; and if consumers wanted reliable stuff, they'd buy it.

They don't.

They want free stuff, and if it doesn't work well, they'll complain and suck it up because it's free.

The idea that developers are somehow empowered to a) not do as they're told by their bosses, and b) should just go off on their own and do whatever the hell they want (eg. making things reliable), when it is not what they have been asked or instructed to do...

...well, it's a big ask.

If you want change, you need to do probably one of these things:

- Empower developers to somehow write more reliable software at no additional effort.

- Convince consumers to buy more expensive stuff.

- Provide a legal mandate to punish companies that fail to adhere to specific levels of quality control.

- Provide a meta-framework that punishes individuals who fail to adhere to specific levels of quality control (eg. IEEE and other professional organisations which can expel you).

I can't imagine how you do the first two. Those are Hard Problems.

The last two have worked in the past in some circumstances in other professions... but, you know, many folk don't like the idea that there are codes of behaviour, ethics and conduct they should be forced to adhere to; but they want people to take personal responsibility for things not being perfect.

It's a bit of a joke that everyone has to 'do the right thing' with no rules to enforce it... but hey, it's easy to ask for, because you don't have to do anything except complain people aren't doing the right thing.

That's why you can't have nice things like software that works.

[+] DaiPlusPlus|4 years ago|reply
> Provide a legal mandate to punish companies that fail to adhere to specific levels of quality control.

This is the only workable solution - fortunately it fits right into established regulatory frameworks.

-------------

Controversial opinion:

I do think we need a widely-recognized (i.e. not just in Texas) professional licensing system for SEs/SWEs, to enable some kind of accountability for bad software products:

...When a building collapses and kills people, the civil engineer-of-record is held accountable or serves as a witness. Similarly, when avionics software kills passengers (MCAS, etc.), there should be a software-engineer-of-record to be held accountable or to serve as a witness. Change my opinion.

I know there's a big difference between safety-critical software systems and consumer-hostile inkjet printer cartridges ( https://news.ycombinator.com/item?id=28888214 ) - but more and more jurisdictions (especially outside the US) are going to call out US and international software and hardware companies on their business practices, which means they'll probably end up instituting something like this eventually.

[+] tchalla|4 years ago|reply
"Show me the incentives and I will show you the outcome." ~ Charlie Munger
[+] chrismorgan|4 years ago|reply
> if consumers wanted reliable stuff, they'd buy it.

Trouble is, often I can't buy it when I want to, because no one's making it: the industry (whatever industry - you find this in many of them) has accepted mediocrity.

For software specifically, my observation is that attempts to retrofit reliability, performance and the like almost always fail: these are properties that you have to deliberately bake in from the start, and then maintain, lest you find them difficult to claw back.

[+] spacemanmatt|4 years ago|reply
> The idea that developers are somehow empowered to a) not do as they're told by their bosses, and b) should just go off on their own and do whatever the hell they want (eg. making things reliable), when it is not what they have been asked or instructed to do...
>
> ...well, it's a big ask.

A while back, my brother's band had a song called "Money Goes to the Man". It captured this issue pretty well.

[+] native_samples|4 years ago|reply
It seems much easier to find examples of the first two than the last two, really. I've been programming now for 30 years. Here are just ten of the major improvements the industry made to software reliability in that time, on its own, without government intervention of any kind:

1. Unit testing.

2. Garbage collection (no more use-after-free bugs).

3. Stronger type systems.

4. Then stronger type systems with usability close to that of dynamic type systems.

5. Widespread usage of encryption, signing and sandboxing (hacking being a subset of "stuff that makes software unreliable").

6. Exceptions with useful stack traces, that in some cases can be caught allowing the program to proceed. For example, my IDE doesn't totally die if a plugin hits an assertion whilst analyzing code, it just means that plugin doesn't get to contribute to my editor session.

7. Tightly specified memory models like the JMM.

8. Excellent monitoring platforms.

9. State machine replication like Raft, allowing for entire datacenter outages to be made nearly invisible to end users. Result: Google servers are now more reliable than the internet itself.

10. Ultra-stable operating systems like Linux. In 1995 the most popular OS couldn't manage an uptime of more than ~45 days. Today you can hotpatch the kernel without a reboot.
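Item 6 in the list above is easy to sketch. Here's a hedged toy example (the plugin names and `analyze` interface are invented for illustration, not any real IDE API) of a host catching a plugin's exception so one bad plugin doesn't kill the whole session:

```javascript
// A host application runs each plugin in its own try/catch, so a
// plugin that hits an assertion is dropped from the session instead
// of crashing the editor. All names here are hypothetical.
function runPlugins(plugins, source) {
  const results = [];
  for (const plugin of plugins) {
    try {
      results.push(plugin.analyze(source));
    } catch (err) {
      // The plugin failed; log it and carry on rather than letting
      // the exception propagate and kill the whole program.
      console.error(`plugin ${plugin.name} failed: ${err.message}`);
    }
  }
  return results;
}

const plugins = [
  { name: "linter", analyze: (src) => `${src.length} chars linted` },
  { name: "flaky", analyze: () => { throw new Error("assertion failed"); } },
];

console.log(runPlugins(plugins, "let x = 1;"));
```

The session survives with whatever the healthy plugins produced; only the flaky plugin's contribution is lost.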

There are loads more like this. Software today is drastically more reliable than it once was. Meanwhile, what has the government done? I'm trying to think of successes, but everything that comes to mind has either been a failure or actively made things worse. In the security realm, we can thank governments for things like FIPS cryptography, which is hardly relevant outside places where it's mandated because it takes so long to approve what are clearly upgrades. Governments can't even get reliable software out of their own contractors - they're in no position to lecture others on how to do it right when they can't even get their own house in order.

But it's not enough to only consider what governments do for reliability. We must also consider what they do against it. There unfortunately the karmic balance is deeply in the red, because governments routinely stockpile exploits then lose control of them, undermine cryptographic standards and so on. They actually like software being unreliable because they see bugs as weapons. A big part of Google's Project Zero is about imposing software reliability against the will of governments.

[+] lytefm|4 years ago|reply
In some industries where the consequences of failing software are fatal, e.g. medical devices or vehicles, 3. and 4. are in place and ensure that (usually) nothing bad happens.

2. isn't that important if everyone within an industry has to adhere to the same standards.

For example, GDPR levels the playing field for any company that operates in Europe, and ISO 27001 etc. do the same for many industries. But sure, companies will usually make the minimum required effort - and only if the threat of fines and audits is high enough.

[+] phendrenad2|4 years ago|reply
Ah I remember when I thought like this. I was so young and naive. Good times.

Fact of the matter is, software is NOT like building a bridge, or any other engineering discipline, it's much, much more complex. We'll never achieve high reliability for anything but the simplest software (reactor control rooms come to mind). Accept the fact that software is hopelessly broken, and find tactics to deal with it.

[+] resonious|4 years ago|reply
I pretty much agree with this and I also think the reality is people don't care. The stakes are so low that there's no money in well-built software (most of the time). The average user isn't paying out of their pocket and wouldn't even if you offered a more stable experience in exchange for cash.
[+] slx26|4 years ago|reply
Agree and disagree. On the one hand, I do agree a lot with the idea that software is a moving target: it's not enough to build it, you have to maintain it. The whole ecosystem is continually changing, needs are changing, the best ways to do things are changing. Formats change, protocols change, etc. I believe acknowledging that is vital to becoming an effective software developer (I mean, if you are working with a microcontroller or a very well defined and fixed environment, it's another story, of course).

But on the other hand, I don't want to accept that we will never achieve high reliability. I've heard and read multiple times that software goes through an expansion phase (an exploratory phase), until we eventually manage to gain enough insight and clarity as to be able to go back and simplify to something that we can wrap our head around again. Certainly, while developing, I often find myself in this situation. For many problems, you don't understand them until you have written the solution, come across new problems, and are able to rewrite again with much greater understanding... a clean solution.

Now, there are many very complex problems. Some solutions will never be clean. But we definitely have light-years of room for improvement. Networking is messy. OpenGL / Vulkan / low-level graphics APIs are unwieldy and inelegant. Same for most databases and SQL. Audio APIs are not much more pleasant. Real-time systems are mythical creatures for the average developer. Filesystems are not as homogeneous and standardized as we would like. Browsers are monsters. We can't even agree on what units to use for UI elements. We don't really have much idea of how to handle screens of different sizes. Common file formats shouldn't even be a software concern most of the time. We talk so much about type systems but still define most protocol specs in plain text. Interoperability is mostly C or die. Unicode is many things besides a standard for symbols. Too many. Permissions and privacy are... well, maybe it's better to stop for today.

And that's indeed the reason why software must be a moving target. Because we can't accept these foundations to stay as our original, unerasable sin. But I do believe there is a real possibility to do better. One day devs will unite in a mystical place outside the realm of "pressure to deliver", we will agree on what are the main priorities and opportunities for improvement, and we will coordinate to start changing things and make our asylum a saner place. Coordination might be both the biggest technical and non-technical problem we will ever have to face, but I find no reason to believe we can't handle it.

[+] wruza|4 years ago|reply
That’s a good rant but it hits the wrong dudes. We haven’t stopped, we were overwhelmed.

> Our industry is characterized by institutional recklessness and a callous lack of empathy for our users. It’s time for a come-to-jesus moment. This is our fault, and yes, dear reader, you are included in that statement. We are personally responsible for this disaster, and we must do our part to correct it.

No, the examples the author provides are the fault of stubborn people who continue to say that a winword-like text renderer with left-right-justify-float-border-margin is the right tool for applications. There is javascript in the button because otherwise there would be javascript at the serverside, or even more javascript to reload the current state into the new tab. Controls have javascript in them because native behavior is so full of bullshit. And why is it javascript? Because we’re given no choice.

I tried to bring up the topic of “web” “applications” so many times, only to hear that I’m wrong and I don’t get it - while having around 15 years of desktop development experience and a couple of ui frameworks that I wrote myself, and clearly seeing the insanity of the modern state of things and what it gave birth to. Back in the day I was able to hack together a gta:sa vehicle editor or a simple accounting system in pygtk in an hour; now I don’t even know if I want to begin to map state to props and do other bs unrelated to the business itself.

> This is what you must do. You must prioritize simplicity. You and I are not smart enough to be clever, so don’t try.

I stopped trying already; let this crap burn in flames, and maybe I will still not be dead when springtime comes.

> Design your data, identify error classes

It’s hard to do through the layers of stuff you need just to be able to build a decent app, but we try hard, really. The author comes from the web1 perspective and misses its simplicity, but, bad news, it was already a stupid way to make apps even back then.

[+] jpgvm|4 years ago|reply
Well to be fair I think Drew and I are cut from the same cloth. We write software tons of people use every day but almost none of it is for the web (at least that isn't the core interface generally).

The whole web2 and now web3 stuff mostly passed us by and now we are looking around trying to work out what went so awfully wrong with software development.

The reality is that guys like us still write highly reliable software, some of it even powers the backends of these awfully buggy frontends people rant about.

But you hit the nail on the head. We got outnumbered, overwhelmed and too tired to put up the good fight every day.

When your coworkers want to build the latest app on Node.js + AWS Lambda + other hipster nonsense, there isn't much you can do other than say "I told you so" when you are the one stuck migrating and cleaning it up onto something actually reasonable 6-12 months later.

You could say that we should do a better job at hiring, but oftentimes that is out of our hands: the company hires what is available, and increasingly often that is green engineers who don't want to learn old-school reliable stuff like Java/C#. They want to be using blog-post/talk-worthy tech stacks, and they haven't yet gained the experience that teaches you why that is a poor idea.

So yeah, senior devs that care are out there but we are tired and we aren't able to fix the system from the inside sadly.

[+] blippage|4 years ago|reply
I heard an anecdote about when Unix was being written. They decided that they needed an editor, and figured it would take one person a week to do it.

Clearly, there was a whole different mindset then than there is now.

[+] kvgr|4 years ago|reply
Maybe the managers and product owners decided it is not a priority for this sprint, as it does not fit this quarter's OKRs?
[+] aloisdg|4 years ago|reply
Maybe we should not work for this kind of place.
[+] kryptiskt|4 years ago|reply
I was there back in the 90s and they cared even less back then. If anything, things have improved over time, I don't even know what the Windows 10 crash screen looks like.
[+] spacemanmatt|4 years ago|reply
THIS.

The quality of software I can churn out on a hungover weekend today far exceeds what I could realistically produce in a professional week using terrible 90s tools, even accounting for how much my skills have advanced since then.

[+] ram_rar|4 years ago|reply
I don't agree with OP. On the contrary, much of the backend software out there is becoming a lot more reliable than it ever was before. Lately most backend dev leverages various cloud services like DynamoDB, Bigtable, etc., which have north of 99.99% uptime. Back when I was at Yahoo!, reaching those SLIs was a monumental goal. Nowadays it's expected.

Any microservice built leveraging these cloud services is already far more reliable than building things from scratch. Obviously, there is lots of room to shoot oneself in the foot. But overall, I am far more optimistic about backend core services than OP is.

[+] invalidname|4 years ago|reply
I agree. The author seems to be hung up on the way things used to be but the market shifted completely.

Containers completely changed the market and made redundancy cheap. As a result you might have a fault but overall the system will be far more stable. Staged rollouts used to be reserved for top tier enterprises. They are now accessible to everyone.

His main complaint is about clients, but I think his memory of the good old days is seriously lacking or tainted. Software used to suck, and web-based software has been a huge boost in stability and reliability.

[+] calpaterson|4 years ago|reply
These managed services are of less benefit when the websites running on them, like the ones the author describes, are flaky and unreliable. Like the OP, I open the JS debugger often - including for relatives.
[+] TacticalCoder|4 years ago|reply
There have been quite a few major Internet disruptions in recent times and all those "lot more reliable" services proved to not be that reliable after all. We're talking about hours of downtime here. And all those who decided to depend on them were suddenly down, at the mercy of their overlords.

It may get better over time but so far it hasn't exactly been a panacea.

[+] bdavis__|4 years ago|reply
Reliability is giving the right answer. Availability is being ready to give the right answer.

Cloud services have poor reliability and good availability.
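A toy model of that distinction, with invented services (not any real cloud API): both services answer every request, so both are fully available, but one sometimes answers wrongly, so it is less reliable.

```javascript
// "flaky" is available (it always returns something) but unreliable:
// every third answer is wrong. "reliable" always answers correctly.
function makeFlaky() {
  let calls = 0;
  return (x) => (++calls % 3 === 0 ? 0 : x * 2); // every third answer is wrong
}

const reliable = (x) => x * 2;
const flaky = makeFlaky();

const asks = [1, 2, 3, 4, 5, 6];
const correct = asks.filter((x) => flaky(x) === reliable(x)).length;
console.log(`${correct}/${asks.length} correct answers`); // availability 6/6, reliability 4/6
```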

[+] i_have_an_idea|4 years ago|reply
I'd love to work on reliability. Unfortunately, my TODO list contains months of new features that need to be shipped, asap. My clients push aggressively to get the new stuff done; reliability is almost never something they're interested in, as long as a feature is not embarrassingly, obviously broken.

I do what I can, given the constraints imposed on me (since I also hate buggy software), but it's very tough when it's not a key customer priority.

[+] xiphias2|4 years ago|reply
> Finally, stop letting economics decide everything you do.

Money is one of the best inventions of humans, and a great organizer in the world - a better organizer, I would say, than the writer of this article. It's not perfect (especially nowadays with 0% interest rates), but it's still good guidance on what value the customer wants.

[+] aloisdg|4 years ago|reply
What about ads? Are they "still a good guidance on what value the customer wants", or just blatant manipulation to coerce the customer into buying what you want?
[+] krisrm|4 years ago|reply
While I agree with much of this, it also reeks of confirmation bias. It's very difficult to notice when software is reliable, but poor reliability can often (but not always) be quite apparent.
[+] zokier|4 years ago|reply
I feel this is taking an extremely rose-tinted view of the past. I don't think software was ever particularly reliable.
[+] RantyDave|4 years ago|reply
Some of it was. My first three computers never crashed at all.

But then hard drives were invented, and suddenly computers could hold several orders of magnitude more mistakes. Demands on feature set grew. We started expecting to run more than one program at once and have them play nice with each other. Then we connected them to the Internet and a whole new world of mistakes became possible - and this is where we are.

But stuff like ... Borland TurboPascal. Was rock solid and fast.

[+] codebolt|4 years ago|reply
Yeah, that was exactly what I was thinking. Getting PC games to run in the 90s was an engineering discipline in and of itself.
[+] DaiPlusPlus|4 years ago|reply
> I don't think software was ever particularly reliable

Agreed - but software was also markedly less user-hostile and performance-wasteful: though that was due to limitations of the time: e.g. you can't spy on users or require phone-home DRM without a good internet connection (not that DRM dongles were much better).

If there's one good thing that might come from the trend towards computers locking users out of kernel-mode it's that games and media systems will stop being able to install their own rootkits.

[+] ChrisMarshallNY|4 years ago|reply
Ah. A good old-fashioned Jerry Maguire rant.

While I totally feel for the author, as someone else commented, money talks. People pay for dross, and companies will build their infrastructure around creating and shipping dross.

For myself, I am totally focused on writing and shipping ultra-high-Quality software. It’s difficult, frustrating, time-consuming, and very, very rewarding (but mostly on an emotional level).

I work for free. Honestly, I can’t see commercial entities paying the very substantial premium required to ship software at the Quality bar that I set. It’s orders of magnitude costlier than junk-grade software. Just yesterday, I spent almost all day, “tuning” a popover presentation controller to manage and present flexible content within a 1-pixel tolerance. It required that I rip out almost all my previous work, and realign my fundamental design.

I can think of no corporate entity that would tolerate that kind of anal-retentive level of detail. I would have been forced to ship the crap I had working the day before. People really, really like using my software, but they also don’t pay for it.

Calls for licensing are quite reasonable for things like medical devices, transportation, educational, infrastructure, children’s software, etc., but would be ridiculous for a significant swath of SV’s products.

As someone who has worked in a highly-structured, Quality-focused environment, I can report that the structure can easily crush innovation and development velocity. I call it “concrete galoshes”[0].

I feel that the industry is currently run by folks that use money as the only measure of success, and they aren’t actually mistaken. Money brings power and influence. It drives culture. The people who make the money, make the rules (the classic “golden rule”).

Until a reward structure (as opposed to a regulatory structure) is in place to encourage (as opposed to enforce) high Quality, I suspect we won’t be seeing significant improvement.

[0] https://littlegreenviper.com/miscellany/concrete-galoshes/

[+] systemvoltage|4 years ago|reply
This can be said about pretty much any field these days. Graphic design is shit, product design sucks, home interiors are all the same with Chinese-made single-purpose kitchen tools, unrepairable gadgets surround you, IoT apps last updated in 2016, a complete assault on user-centric design, broken touchscreen displays in cars, and feature-creep-driven software that will eventually become unusable. No one cares. Everything is fine. Nod along, pour a glass and sip some plastic-bottle moonshine.
[+] ngc248|4 years ago|reply
"Market efficiencies" have led to a drop in quality. All items/services have become "consumable".
[+] timdaub|4 years ago|reply
I completely agree with this post and I also feel burned out by ~~assholes~~ making economical decisions to build software. I've decided to fight it by writing.

My motivation was: Can I flip the engineering mind towards understanding that well-built software can be a reasonable economic bet too?

My article is called "On Technical Yield (and Technical Debt)": https://timdaub.github.io/2021/09/06/technical-debt-and-yiel...

I've also been annoyed about scope creep as it's been my manager colleagues favorite micromanagement argument. Scope creep can be a great thing: https://timdaub.github.io/2021/06/18/when-scope-blows-up/

Let's just do things differently.

[+] krisoft|4 years ago|reply
The one concrete example he uses is a broken form to pay for electricity.

I’m pretty sure that the form works for most people most of the time. Why? Because otherwise the electricity utility in question would be hemorrhaging money. If it were completely borked, the beancounters would be kicking the developers' door in in no time.

I cannot say if this is what is happening here, but I have seen it many times: an expert user gets hung up on some process because they choose options only an expert user would even know about, while masses of regular people pass through uneventfully.

[+] mmarq|4 years ago|reply
Sometimes all it takes is an adblocker that breaks some tracking functionality that interacts with the payment form. An event handler throws, and the whole thing falls apart.

Sometimes it’s a bug, sometimes it happens by chance and the PO decides to keep it because it forces users to be tracked.
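The failure mode described above can be sketched in a few lines. This is a hedged toy example (the `tracker` and `submit` names are invented, not any real tracking API): if the tracking call is allowed to throw, the click handler dies before the payment is ever submitted, so tracking has to be treated as best-effort.

```javascript
// A payment button handler that guards the third-party tracking call.
// Without the try/catch, a blocked tracker would abort the whole
// handler and the form would never submit.
function makeHandler(tracker, submit) {
  return function onPayClick() {
    try {
      tracker.record("pay-clicked"); // may throw when an adblocker stubs it out
    } catch (err) {
      // Tracking is best-effort; never let it block the payment itself.
    }
    return submit();
  };
}

const blockedTracker = { record: () => { throw new Error("blocked"); } };
const handler = makeHandler(blockedTracker, () => "payment submitted");
console.log(handler()); // the form still submits despite the broken tracker
```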

[+] yoit|4 years ago|reply
Longtime lurker, I just created an account to respond to this thread.

Why not consider how much software has improved our lives collectively, instead of only finding fault with it?

A developer is someone who knows how to instruct machines. They are not the owner of the product, so it might not be their decision to disable right-click.

The era the author talks about, when everything was achievable with an "HTML form" - we should also consider how many people had access to the internet at that time. Most of them were technologists or early adopters. The use cases have changed in the last "28 years", and "HTML forms" are no longer enough. What they solved in 1993 was good for that time, but it does not always fit current standards and requirements.

There are 1.7 billion websites[1]. If we apply "Six Sigma" to that, there would still be 5,780 defective websites. What are the chances that the author is browsing only those sites? /s

I'd like to close with this - we have made huge advancements as humankind, and many of them seem impossible to imagine without software developers. I'm grateful to all the technologists before me for their contributions, even if it's just a "JAVASCRIPT-POWERED CUSTOM TEXTBOX WIDGET".

[+] rtall10|4 years ago|reply
That is a consequence of corporations taking over "open" source projects. They hire too many people, many of them clueless. Because of "equity" no criticism is allowed and all "contributions" are merged.

We have a gigantic unstable churnfest with the most ridiculous unreliable half baked features getting you a promotion. Bonus points if you "grow the team"!

[+] l0b0|4 years ago|reply

  > Grab a notepad and make a note every time you encounter some software bug in production (be it yours or someone else’s), or need to rely on your knowledge as a computer expert to get a non-expert system to work. Email me your list in a week.
Sorry, writing that list would take longer than I spend actually using software. Evolution telling me I'm "offline" because the DNS resolution isn't working because of some resolv.conf DNS extension gobbledygook. Community members seriously suggesting to blow away configuration which took a long time to establish in order to maybe fix some completely broken desktop application. CLI tools doing their own hard line splitting because they don't understand how terminals work. Web sites broken in any browser other than Chrome (I could have sworn the IE6 days were over). I can find about 10x more bugs than I have energy to even report, considering how completely broken most bug reporting processes are.
[+] eterevsky|4 years ago|reply
I think different categories of software require different mentality. It's ok for a user-facing innovative software to be built in the "move fast and break things" way. On the other hand, it is totally inappropriate for a filesystem or almost any firmware.

You need to strike the right balance for your use case between reliability and being able to deliver new things.

[+] rasz|4 years ago|reply
Windows 10's Settings menu has no error checking. If you disable a "critical" service like StateRepository, you won't be able to start Settings. It will just crash!

    Faulting application name: SystemSettings.exe, version: 10.0.19041.789, time stamp: 0x4aa1ce82
    Faulting module name: KERNELBASE.dll, version: 10.0.19041.906, time stamp: 0x2f2f77bf
    Exception code: 0xc000027b
    Fault offset: 0x000000000010b2dc

If you disable something more subtle, you can even get Settings to crash with a stack buffer overflow :o - and it won't be logged as an Error! Just a friendly Application Popup information event in the System log:

    Application popup: SystemSettings.exe - System Error : The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
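A graceful-degradation sketch of the error checking that's missing above. This is an invented illustration (the function and the service check are hypothetical, not how Windows actually probes services): check the dependency up front and return a readable error instead of crashing.

```javascript
// Instead of assuming a required service is running and crashing when
// it isn't, probe for it and degrade with a human-readable message.
// The service name and API are illustrative only.
function openSettings(runningServices) {
  const required = "StateRepository";
  if (!runningServices.includes(required)) {
    return { ok: false, message: `Settings unavailable: service "${required}" is disabled` };
  }
  return { ok: true, message: "Settings opened" };
}

console.log(openSettings(["StateRepository", "Themes"]));
console.log(openSettings(["Themes"])); // degraded message instead of a crash
```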
[+] TacticalCoder|4 years ago|reply
> Every morning, I boot, then immediately hard-reboot, my workstation, because it seems to jigger my monitors into waking up properly to do their job.

I know Drew reads threads here (and I've sent him emails in the past), but... I've had similar issues over the years, and typically they can be solved with a few tricks: for example, not using NVidia GPUs (and hence not using proprietary drivers).

Or configuring the system once to boot into text mode, then running "startx" from text mode.

Or booting into graphical mode, but then using a shortcut to switch to console mode, then back to graphical mode.

Similar shenanigans may happen too with the various types of sleep/powersaving modes. But I think they're all solvable or, at least, there are always workarounds (not that Drew's "boot twice" workaround is that bad, seeing how fast Linux boots nowadays).

[+] native_samples|4 years ago|reply
Or just not using Linux. He seems to be generalizing from "my setup has unreliable power management" to "software is unreliable". No. Linux PM is unreliable. Macs never fail to wake from sleep.