Ask HN: What would happen if we prioritised all bugs over all new features?

42 points| tagspace | 3 years ago | reply

At Trevor.io we recently released some fundamental changes to our platform, which, unsurprisingly, came with a handful of bugs. This triggered a debate among the team: which bugs do we fix now? Which do we fix later? And when is later? If we don't fix them now, will we realistically ever fix them?

This led us to an interesting question: what if we just split all bugs into "will fix" and "won't fix", and then prioritise every "will fix" above all new features....always. In other words: we commit to only ever adding new features when we're bug free.

Has anybody tried this? Can it work?

74 comments

[+] throwawaythekey|3 years ago|reply
This is known as a zero bug policy [1]. I've had decent success implementing it with my team. The main advantages are:

- Prioritizing is hard, so avoid wasting brain cycles deciding how important your bugs are

- It encourages all of your team to get things right the first time, because if they don't they know they will be going back to fix it immediately.

- If you want to create a culture of quality then it's an obvious first step.

- It saves time in the long term by addressing problems when you have the most context and by avoiding building hack upon hack upon hack

In extreme cases you can relax the policy, but be aware that if you don't quickly correct course then things will be permanently worse. Also accept that hard external deadlines are not suited to this approach, but using triage some of that can be mitigated.

[1] https://sookocheff.com/post/process/zero-bug-policy/

[+] oblio|3 years ago|reply
https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...

Article from back in 2000, based on info from ~1990.

> 5. Do you fix bugs before writing new code?

> To correct the problem, Microsoft universally adopted something called a “zero defects methodology”. Many of the programmers in the company giggled, since it sounded like management thought they could reduce the bug count by executive fiat. Actually, “zero defects” meant that at any given time, the highest priority is to eliminate bugs before writing any new code.

Please read the entire article, it's worth it ;-)

[+] wink|3 years ago|reply
One slightly different approach I've seen at a more old-school industry customer: a bug with a semi-doable workaround is usually not urgent. This is less of a problem with more continuous deployment, but if your release cycles are measured in weeks or low-digit months, it's often just not feasible to roll back or hotfix if the (even breaking) bug can be worked around somehow.

That made me rethink my usual "prioritize urgent bugfixes over features" stance, but it works better with a smaller number of (known) users who communicate than with a mass audience.

[+] moconnor|3 years ago|reply
We did releases that were only bug fixes and maintenance. There can be a long tail of bugs in a complex system, most of which are relatively harmless and only occur in very rare situations; it’s hard to justify fixing these when paying customers are asking for a new feature.

Also you’d be surprised at what gets classified as a bug when you apply this.

Ultimately though, everybody does this. Critical bugs get fixed first. Anything too far down the backlog de-facto doesn’t get fixed. I like the intellectual honesty that this approach brings, in that it forces you to set a bar for bugs not worth fixing and consequently marking them as won’t fix.

Final note: whether a bug is worth fixing changes over time. Maybe your best engineer can’t find it after two weeks. Maybe your biggest customer just ran into it. Maybe you can’t reproduce it. Maybe the platform causing it got acquired by Google.

Can it work? Yes, but it doesn’t look very different to what you’re probably already doing in practice.

[+] orwin|3 years ago|reply
I think you can have an alternative: bugs are either critical or "non-critical". Non-critical ones end up in the backlog, but are ranked ahead of improvements and "non-critical" features.

How does it really work? Let's say you have 2-week sprints (quite standard). Then you break the agile/scrum "story points" and velocity: they aren't useful for your team (but might be for management). After quick estimates, each member takes what he estimates to be a week, or a week and a day, of work on critical bugs, then critical features. Because let's be honest, you always have more "critical" stuff to do each week, and will never get your backlog small enough to reach the non-critical stuff.

Once you're done with your "critical task", depending on the time you have left, you take a non-critical item: preferably in backlog order if you're a senior, long-time IC (lots of bugfixing); for juniors and seniors who just arrived, preference goes to "I'm sure I know how to do that", then "unknown project but seems easy enough".

Having a gigantic backlog isn't an issue as long as each task is assigned to a product and a major version: that allows you to discard those tasks easily if the product isn't sold anymore or if the version has changed.

[+] ozim|3 years ago|reply
It is also useful to have a handful of "easy to fix" bugs on the backlog to use as on-boarding for new developers.
[+] tagspace|3 years ago|reply
Your point on intellectual honesty really resonates.

I personally like the idea that for every bug you either: a) fix it with highest priority, or b) mark it as "won't fix".

I think this would really force you to make a decision on a bug, rather than adding it to some never ending list of lists.

If a bug is worth fixing, it will come up again

[+] rrwo|3 years ago|reply
Sometimes a feature request is considered a bug by customers: this product would be easier to use if we could do X, and the competing product lets us do that.
[+] MrPatan|3 years ago|reply
- New features help close sales.

- New features also add tech debt, complexity, cost, and bugs, make you slower, and will lose you some sales that way.

- At 0 features you really need to implement features instead of messing about with the bugs in the devops scripts.

- At e.g. Windows scale, maybe stop messing about with new features nobody asked for and fix some bugs, yes.

- In a Laffer-curve-like effect, there must be a point where it peaks and it's better to fix bugs than to implement a new feature.

- It's very difficult to identify where you are on the curve.

- One of the measures is simple: "do X, get money from this guy I have on the phone"

- The other measure is fuzzy, lags, is subjective, can't be traced to a particular feature.

Good luck!

[+] tagspace|3 years ago|reply
I like the curve idea. Makes sense.

When customers aren't signing up because of lacking feature -> build features. When customers are churning because of bugs -> fix bugs. Else -> somewhere in the middle

[+] Yizahi|3 years ago|reply
On a sufficiently complex product it is impossible to fix all known bugs, even those classified as "will fix". Another issue: it will take so much time that your company will be left behind by the competitors. Maybe less so on the web, but with hardware it's a question of multiyear contracts, so getting abandoned by a big customer in favor of a feature-rich competitor may mean that you won't get a second chance with them any time soon. Of course the reverse happens too: if your market-leading product gets more and more buggy over time, they may abandon you, force you to fix most bugs, or force you to make significant structural changes, etc.
[+] ThomasRedstone|3 years ago|reply
As systems get more complex slowing down and getting bugs fixed becomes more and more important. If you don't fix them fairly quickly you end up with code elsewhere (either consumers of your APIs, or within your own application) adapting to your bug, leaving you with new bugs when you get around to fixing it!
[+] tagspace|3 years ago|reply
> On a sufficiently complex product it is impossible to fix all known bugs

You really think so?

Surely it's just a matter of picking a sufficiently high bar for "will fix" and then focusing some time on it.

[+] bradwood|3 years ago|reply
Ask yourself: given the system in its current state what change will make my customers most happy? Is it fixing this bug, or is it shipping this new feature?

Do the thing that will produce the most customer happiness in the shortest amount of time next.

[+] tagspace|3 years ago|reply
This is a nice way to put it. Our platform is already pretty mature, and customers are happy.

Naturally, the wishlist of new features never ceases to grow ... but the stability of the existing platform is what people really appreciate.

[+] erlich|3 years ago|reply
Morale decline from oscillating priorities.

You are bug-free, start working on new feature, new bug reports come in, and you have to pause and work on them.

Fixing bugs is not fun work because there is usually a quick fix in an ugly way, and then a perfect fix via a large refactor and re-architecture. This results in that "soul-destroying" feeling of: if I had enough time I could fix this properly in the right way with clean code and avoid huge amounts of bugs, but alas I am just piling on tech debt.

[+] kqr|3 years ago|reply
My main concern with the idea is that what counts as a "bug" is sometimes clear, but often subjective. A missing feature can easily masquerade as a bug in some organisations, and the policy you suggest would encourage dressing up missing features as bugs, leading to suboptimal information flows and feedback in the organisation.

Effectively, "I think this is important so I will argue you should work on this before that bug" is much better than "I think this is important so I will argue this is a bug."

[+] edelans|3 years ago|reply
I have seen this problem solved with a super simple bug classification that anyone can leverage (even customer success) associated with priority and time to fix expectations.

YMMV, you have to adapt it to your usecase (B2B / B2C? contractual SLA? ...). But it can be something around:

  - P0 : a bug prevents a significant share of the customers (= paying users) from using one of the core functions of the product: at least one dev drops what he is doing right now and investigates, then fixes it himself or sends it to someone responsible for the bug, who should drop what he is doing right now and fix it. Target is to have it fixed in under a few hours.
  - P1 : a bug prevents a few customers from using an important yet non-core function of the product: the next available dev has a look at it. Target is to have it fixed in under a few days.
  - P2 : a bug customers can live with (concerns few customers / there is a workaround / ... ) -> fixed on a best-effort basis; in practice, we fixed them when doing other features near that code. It can take a lot of time to fix them, if we ever fix them, and that's ok. The good thing about tech debt is that it's a debt you don't always have to pay (when you remove or replace a feature, for instance).
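
The three levels above can be sketched as a small triage function. This is only an illustration, not anyone's actual tooling: the `BugReport` fields, the `classify` name, and the 10% cut-off for "a significant part of the customers" are all assumptions to adapt to your own product.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    P0 = "drop everything, fix within hours"
    P1 = "next available dev, fix within days"
    P2 = "best effort"


@dataclass
class BugReport:
    # Hypothetical fields a triager (or customer success) would fill in.
    affected_share: float           # fraction of paying customers affected, 0.0 to 1.0
    blocks_core_feature: bool       # a core function is unusable
    blocks_important_feature: bool  # an important, non-core function is unusable
    has_workaround: bool


# Assumed threshold for "a significant part of the customers"; tune per product.
SIGNIFICANT_SHARE = 0.10


def classify(bug: BugReport) -> Priority:
    """Map a bug report onto the P0/P1/P2 levels described above."""
    if bug.blocks_core_feature and bug.affected_share >= SIGNIFICANT_SHARE:
        return Priority.P0
    blocks_something = bug.blocks_core_feature or bug.blocks_important_feature
    if blocks_something and not bug.has_workaround:
        return Priority.P1
    # Few customers affected, or a workaround exists: customers can live with it.
    return Priority.P2
```

For example, `classify(BugReport(0.4, True, False, False))` yields `Priority.P0`, while the same important-feature bug with a workaround drops to `Priority.P2`.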
[+] tagspace|3 years ago|reply
This is actually exactly what we had in mind. Except that P2 gets put in a "won't fix" bucket.

- P0 means drop what you're doing and fix it now

- P1 means fix after you've finished what you're doing

- P2 means "won't fix" (but keep a note of it in case we ever get to that perfect situation where we have more time than features to build ;))

[+] naet|3 years ago|reply
If you're driving users away with some major bugs you better try to fix them ASAP. But chasing "100%" is usually a bad business strategy IMO. When you have an absolutist policy like that you risk losing a lot of momentum over something small that isn't driving much value, just to satisfy the arbitrary policy.

Is it worth not making progress on any new features, all because of a smaller bug or issue? Can just one person work on the small bug while the rest of the team starts a new feature?

Sometimes the best bug fix is a new feature that deprecates the buggy code, so be sure to consider the estimated lifetime of the bug as you keep progressing your platform, and maybe don't spend too much time fixing things that will soon be phased out anyway, unless they're really major issues that are rapidly hurting the business or reputation and need immediate fixing.

If you really want to halt all new features, I might try putting it on a calendar. Maybe you can afford 1 month, 1 quarter, or half a year on just bug fixing but eventually you have to keep moving forward in some way. Unless your platform is already pretty feature complete (which it doesn't sound like it is) you might do more harm than good when delaying your next core feature releases.

[+] lll-o-lll|3 years ago|reply
Yes you can do it, yes it can work. However as with all things there is nuance. This is a scheduling problem, and just like with a cpu scheduler there are degenerate cases you may need to avoid/handle.

Two problem cases likely to pop up.

1) Lots of fixing when a bigger refactor is required. A poorly written area of code, or a poor design, may be causing high churn and wasted effort. The solution I’ve found to this problem is to track defects by code area and review an area once its defect count exceeds a heuristic threshold.

2) A team choked on defects only for a long period of time. This obviously has many negative side effects. It tends to happen in really important components and requires the most experienced developers. Any new starters run for the hills when a team gets into this state, compounding the issues. The solution to this (though this is just my opinion) is to never allow 100% of time over [fixed interval] to be spent on defects. No more than 50%. The bugs will still get fixed, they just take longer, and new development is still happening.
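
A minimal sketch of both safeguards, assuming defects are tagged with a code area when they are filed. The function names and the review threshold of 5 are made up for illustration; only the 50% cap comes from the comment above.

```python
from collections import Counter

REVIEW_THRESHOLD = 5    # assumed: defects per area before a refactor review
MAX_DEFECT_SHARE = 0.5  # from the comment: no more than 50% of time on defects


def areas_needing_review(defect_areas, threshold=REVIEW_THRESHOLD):
    """Return the code areas whose defect count exceeds the threshold."""
    counts = Counter(defect_areas)
    return sorted(area for area, n in counts.items() if n > threshold)


def defect_hours_budget(capacity_hours, max_share=MAX_DEFECT_SHARE):
    """Hours the team may spend on defects in one interval; the rest stays for new work."""
    return capacity_hours * max_share
```

With six defects filed against a hypothetical `billing` area and two against `auth`, `areas_needing_review` flags only `billing`; a team with 80 hours of capacity in the interval gets a 40-hour defect budget.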

Overall though, I think a “defects first” approach is the right one, just have a plan for these negative cases.

[+] tagspace|3 years ago|reply
Really good point about bug fixing affecting engineer morale. Super important.

One (arguably positive) side-effect I'm wondering might be possible is that: if bugs are always prioritised first .... and engineers are often very creative at solving problems .... will they perhaps come up with creative ways to reduce bugs in the first place?

Or, it might go all wrong -> and we create a dangerous culture of "swallow that exception" :D

[+] martin-adams|3 years ago|reply
Assuming these are bugs which have workarounds and don't cause things like data corruption, it probably falls under the adage from advertising: "50% of all advertising is a waste, you just don't know which 50%".

Another way to look at it is the delayed effect of doing nothing in either area. Bugs creeping in over the months and years may only become a problem when a competitor starts to be noticeably more stable.

Features that are delayed may have a delayed effect of a competitor getting ahead of you in the market and launching months before you'd be ready.

So I would say, "it depends". If you're in a growth market and are trying to capture market share, features might be best before non-critical bugs.

If you're in a stable market serving a huge amount of people, then fixing bugs has a much larger impact on your users.

You also have to consider the team and their morale over time. Too much churning through low value bugs can be demoralising where individuals might need some type of higher level thinking and creativity.

If it were me, I'd look at bugfix only sprints and adjust the frequency based on the above factors.

[+] agilob|3 years ago|reply
I think MacOS did it once (around 2009-2011) where they made a release that contained pretty much only bug-fixes. Made a lot of users happy.
[+] pkrotich|3 years ago|reply
It would really depend on the severity of the bugs... assign severity levels as part of triaging.

Also, what kind of bugs are coming up is important... I think bugs do tell stories; they might help you identify issues with feature assumptions about workflow or use-cases, and that needs to inform your product development as a feedback loop to avoid technical debt and having to rewrite stuff later.

Depending on the size of your team, I would have 2-4 people focusing on just bugs / QA (so you can catch most bugs going forward) while the rest of the team focuses on new features.

[+] Manjuuu|3 years ago|reply
Once you start factoring in reputational damage it becomes easier to choose what/if to fix. If you add new features but the resulting product is something that gives the idea that the platform is unstable or broken you'll end up with people switching to some other provider.

If the product has actual customers it ALWAYS makes sense to prioritize fixes along the hot paths. And it should be easy even for non-technical people to understand: otherwise, you lose customers.

A feature is not done until all major bugs or regressions are fixed.

[+] habibur|3 years ago|reply
Adding new features will introduce new bugs in your already "bug fixed" parts.

Therefore go for release cycles. Lock features, fix all bugs, then release the version. Repeat for every cycle.

You might be forced to release with a few bugs. The conventional procedure is to publicly document the known bugs for each version.

If you are constantly fighting fires caused by bugs, then you don't have a stable product yet. Cut back to a reasonable feature set and fix all bugs when you want to release.

[+] AA6YQ|3 years ago|reply
I develop and support DXLab, a suite of interoperating, free (but not open-source) applications for the worldwide amateur radio community; these applications continuously populate databases with realtime information, and interact with many other applications and physical devices (radios, antenna rotators) via serial ports, DDE, UDP, and TCP links. Since the first public release 22 years ago, my policy has been "all reported defects are corrected within 24 hours". Interaction with the user community is direct - via an online group. I typically make public releases bearing new functionality 2-3 times per month.

This policy's results have been excellent: users are focused on learning to better exploit the applications and suggesting new functionality rather than complaining about long-deferred defect repairs; even minor, easily worked-around defects create a negative user community mindset that can snowball. The absence of defects increases user confidence, and reduces user-perceived complexity.

[+] EdgarVerona|3 years ago|reply
Whether this would work depends on the kind of system and the deadlines imposed by external needs of the company.

Does your system have a lot of (intentional or unintentional) emergent behavior, like a sandbox-heavy video game? You could end up never making a feature again.

Do your customers expect the frequent shipping of new features? Unless you can sell them on the idea of unexpected or infrequent addition of features, you could quickly lose your core audience.

However, if it is a product where no one is expecting tight deadlines on the release of new features, or the product's purpose is straightforward enough that there is no intentional emergent behavior in the system, then I could see someone running it in this way. I don't mean this in a cheeky way: there are definitely products that fit these criteria. It just won't be something that every product can reasonably do while also expecting to retain its user base.

[+] wkoszek|3 years ago|reply
Your role as a company is to maximize revenue. If presence of bugs makes you lose revenue by customer churning or burned-out developers leaving, fix bugs. If not, create new features. And define a bug as "thing that the customer complained about". If customers don't complain, it's either not a bug, or you are solving a wrong problem.
[+] ratorx|3 years ago|reply
That seems short-sighted. Just because a customer hasn’t complained about it doesn’t mean it isn’t a bug.

I’m not disagreeing that customer complaints are important for prioritisation, but the problem isn’t quite as first-order as that.

In an extreme case, imagine you found a bug that corrupts data if a customer name begins with z. Would you not fix this bug just because you don’t have any customers whose name begins with z?

[+] jonathanstrange|3 years ago|reply
That's a bit oversimplified. Surely a bug that could lead to data loss needs to be fixed ASAP regardless of whether a customer complains about it or not.
[+] tluyben2|3 years ago|reply
Many bugs don’t cause customer churn short term, but they do long term. And that is difficult to measure. Customers might not tell you, for all kinds of reasons.
[+] mempko|3 years ago|reply
It's often faster and cheaper to fix bugs right away. It often takes longer to put the bug in a bug tracker and prioritize it than to fix it.

I have a simple rule that has worked to keep bugs at zero. If you find a bug, drop what you are doing and fix it. If it takes more than 4 hours, only then, put it in your backlog and prioritize it like everything else. Don't keep a separate bug list! Bugs should be in your backlog if they are serious enough (more than 4 hours to fix).
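
The rule can be written down as a tiny decision function. This is just a sketch: `next_step` and its return strings are invented for illustration, and only the 4-hour limit comes from the rule above.

```python
FIX_NOW_LIMIT_HOURS = 4.0  # the time box from the rule above


def next_step(hours_spent, is_fixed, limit=FIX_NOW_LIMIT_HOURS):
    """Decide what to do with a bug you dropped everything to fix."""
    if is_fixed:
        return "done"
    if hours_spent < limit:
        # Still inside the time box: keep going, no ticket needed.
        return "keep fixing"
    # Time box exhausted: the bug is serious enough to compete with
    # everything else in the one shared backlog.
    return "move to backlog"
```

The point of the time box is that cheap bugs never generate tracking overhead, while expensive ones get prioritized like any other work.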

99% of bugs take less than 4 hours to fix.

Bugs slow you down. Keep them at zero! You will be able to finish everything faster!

[+] veidr|3 years ago|reply
Microsoft rather famously tried this[1]. And it seemed to really help them.

I've worked places that have tried it, too. It does work; in my experience, it literally always improves software quality. But I think it is like how pretty much all diets work, at first, but not over the long term.

It becomes unsustainable, as the pressure to work on features grows.

Continuing with my diet analogy, I now think of it like bulking and cutting phases. When you are bulking up your product with new muscles (features) you are also adding fat (bugs). At some point the percentage starts growing unhealthy, slowing you down and causing all sorts of ancillary problems.

Might be time for a cut phase then.

My analogy breaks down (because it's not really a good analogy) with a bigger team. Then you can have people assigned to bug cleanup. But this leads to its own set of problems — do you pay bug cleanup engineers a bunch extra? Because they will tend to be reading Who's Hiring top to bottom after a few weeks of that.

I think this does work great, though, when it is for a pre-determined fixed duration of time. (e.g. "3 iterations" or "one quarter").

[1]: I think where I read about this is now behind a paywall, but this seems to be the same initiative: https://sriramk.com/memos/zerodef.pdf

[+] 2rsf|3 years ago|reply
I worked at Microsoft later than that, and my team (most of the time there is no "Microsoft did X"; it is a team, department or product decision) tried a variation on the Zero Bugs policy. We either fixed a bug immediately, closed it as won't fix (and here there's the question of correlating future related bugs with different symptoms), or turned it into a new feature request if the fix required bigger changes but was still needed. It worked for a while, but as others said, it is hard to maintain this policy over time for a complex product.
[+] RugnirViking|3 years ago|reply
I like that idea of a cycle, going between a feature-adding phase and a bug-fixing phase. I imagine having experience in both and knowing the other is coming up shortly will also improve people's forward planning in terms of code structure
[+] tagspace|3 years ago|reply
Yeah - that's a super interesting point about bug fixing leading to engineers wanting to quit.

Have you found a balance that works?