When feature flags do and don’t make sense (2019)

[+] withinrafael|5 years ago|reply

I found this post to be a great complement to my hobby of collecting/documenting all the feature flags used in the Windows operating system for the past three years. [1] It's proven to be a reliable source of what's to come in future builds of Windows, sometimes to the chagrin of teams unfamiliar with the public facing artifacts generated by their experimentation <grin>.

Windows (what ships inbox, that is), as of last week (Sep 14) [2], has roughly 2500 feature flags. Some are permanently jammed into the on position, some off position, and the rest are configurable by its experimentation frameworks and hackers. (Apps are a separate beast, have their own experimentation tooling, their own flags, etc.)

I don't understand why the jammed-on features still exist in the OS. I'd imagine there will (eventually?) be a measurable impact to leaving all this trash behind. I suspect, like the author noted, it's non-zero risk work that no one wants to complete.

[1] https://github.com/riverar/mach2/tree/master/features

[2] https://github.com/riverar/mach2/blob/master/features/20215....

[+] aflag|5 years ago|reply

I loved the testimonials section. What do you do with that? Do you keep turning them on and off to see how they work? How did you originally find out how to extract and set/unset feature flags in Windows?

[+] nickm12|5 years ago|reply

Solid article. I've worked with complex feature flags / gating systems for years at multiple large internet companies and have largely come to the same conclusions.

That feature flags let one version of the software run in a combinatorial number of modes is both their superpower and kryptonite. Use them wisely and clean them up as soon as possible.

One problem I've seen happen over and over again is when people pile feature flags on top of feature flags. With the systems I've used, all the flags can be independently enabled by the gating service, but that doesn't mean that every combination of flags is a valid state for the program to run in. If your system will break if flag A is active while flag B is not, then it's worth the effort to write an abstraction that checks both flags and fails to a valid state.
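A minimal sketch of such an abstraction, for illustration only. The flag names, the mode names, and the rule that flag A ("new_checkout_ui") needs flag B ("new_pricing_api") are all hypothetical; the point is that callers ask for one resolved mode instead of reading the two flags separately.

```python
from typing import Mapping


def checkout_mode(flags: Mapping[str, bool]) -> str:
    """Resolve two interdependent flags into one valid mode.

    Hypothetical rule: "new_checkout_ui" (flag A) only works when
    "new_pricing_api" (flag B) is also active.
    """
    ui = flags.get("new_checkout_ui", False)
    api = flags.get("new_pricing_api", False)
    if ui and api:
        return "new_ui_new_api"
    if api:
        return "old_ui_new_api"
    # Either both flags are off, or A is on without B (an invalid
    # combination). Both cases fall back to the known-good legacy path.
    return "legacy"
```

With this in place the gating service can still set A without B, but the program never runs in that state.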

[+] silvestrov|5 years ago|reply

> combinatorial number

My experience is that the marketing department think of the complexity as 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7 versions, when it really is 2^7 = 128.

[+] yojo|5 years ago|reply

In many cases you are better off using a kill switch than a feature flag. This may seem pedantic, but the way your system fails (on vs off) can protect you from disaster when your flag setting framework has a bug.

On a large codebase it is easy to forget to clean these things up, and a flag that hasn’t been set to off in a year can be masking a major regression. At my last job we had two major outages in as many years from defunct flags defaulting to “off” when the feature flag system failed to return flag states.

Failing to “on” is a simple design choice to protect you from your tech debt. There are more expensive better fixes (e.g. automated enforcement of removing flags from codebase), but none as easy to implement.
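Roughly what I mean, as a sketch rather than any real framework's API. `fetch` stands in for whatever client call your flag service exposes, and `broken_fetch` simulates the outage; both are made up for the example.

```python
from typing import Callable


def is_enabled(fetch: Callable[[str], bool], name: str, default: bool = True) -> bool:
    """Return the flag state, falling back to `default` if the lookup fails."""
    try:
        return fetch(name)
    except Exception:
        # Flag service unreachable or buggy: fail to "on" so features
        # that launched long ago keep running.
        return default


def broken_fetch(name: str) -> bool:
    """Simulates a flag service outage."""
    raise ConnectionError("flag service unavailable")
```

Here `is_enabled(broken_fetch, "some_old_feature")` returns `True`, so a forgotten flag does not silently switch a year-old feature off during an outage.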

[+] gravypod|5 years ago|reply

Another thing about tech debt to watch out for is reusing previous flags left before your time [0].

[0] - https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stoc...

[+] kqr|5 years ago|reply

I'm not sure I agree completely with this. Defaulting to "on" and only rarely going to "off" would, at least in the teams I've worked, result in the "off" path being much less tested and more likely to contain regressions.

If, then, the "off" state is meant as an emergency oops-this-new-code-didnt-work-well-at-all-lets-revert-to-a-safe-state then having it be the less tested state seems like a problem waiting to happen.

In other words, by defaulting to "on", you get two states that are likely to contain bugs: the on state, because it contains new code, and the off state, because it's the less rigorously exercised configuration.

The nice thing about defaulting to off is that in that case, the off state will contain both old, known code, and be rigorously exercised. The on state will be the brittle one, but it would be anyway on account of running the new code.

I guess we can both agree defaults may seem pedantic but matter a lot.

[+] buster|5 years ago|reply

Usually you remove flags after they have been on for some time. I find that in an "every few weeks" release cycle it works well to remove the flags of the last release (which by then have been live for a few weeks).

[+] Raed667|5 years ago|reply

Feature flags are awesome and, beyond the obvious A/B testing and kill switches, they allow us to merge code constantly and toggle it on when it is ready (as the author suggests).

However, don't forget to do some house-cleaning from time to time. Experiments end (successfully or not), features get permanently rolled out or killed, and that code will become debt very fast unless you regularly clean up your flags and all the code related to them.

[+] mcv|5 years ago|reply

I've seen a lot of people mention the advantage of feature flags over feature branches. I've never worked with feature flags on any reasonable scale[0], and I can't help but wonder how they can possibly work while keeping your code well-organised.

A lot of features touch a lot of different parts of the code. Some features require parts of the code to be refactored, or it will turn into a mess of warty exceptions and if-statements all over the place, which to me sounds like it would make your code hard to read, hard to maintain, and hard to test.

So how does this work? Are there good frameworks that abstract this mess away in some easily readable and maintainable manner? Do you postpone refactoring the code until after a successful launch? But then how do you launch the refactored code? No matter how I think about feature flags, I can't help but conclude that it would turn your code into a hard-to-maintain mess.

So how would this work? How do you keep your code clean? I'm sure someone here can explain this or point me towards a good explanation of how to actually implement feature flags the correct way.

[0] In my current project we've got a login system that needs to behave differently in different environments, controlled by environment variables, and it's by far the ugliest part of our code. I have no idea what's going on there, which makes it very hard to address bugs.

[+] wikibob|5 years ago|reply

See: BranchByAbstraction.com

And: TrunkBasedDevelopment.com

Essentially you go up in the code path until you find a single place where you can introduce an abstraction, and make that the single toggle.

It works brilliantly. It is more overall effort. But it keeps the velocity higher as well because you no longer have head of line blocking.
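A small sketch of the idea, with invented names throughout (`PriceCalculator`, the `bulk_discount` flag, the 10% rule). Amounts are integer cents. Everything downstream depends on the abstraction, and only the factory reads the flag.

```python
from typing import Mapping, Protocol, Sequence


class PriceCalculator(Protocol):
    def total(self, amounts: Sequence[int]) -> int: ...


class LegacyCalculator:
    """Existing behaviour: plain sum of the amounts (in cents)."""

    def total(self, amounts: Sequence[int]) -> int:
        return sum(amounts)


class DiscountCalculator:
    """New behaviour under development: 10% off subtotals above 100."""

    def total(self, amounts: Sequence[int]) -> int:
        subtotal = sum(amounts)
        return subtotal * 9 // 10 if subtotal > 100 else subtotal


def make_calculator(flags: Mapping[str, bool]) -> PriceCalculator:
    # The single toggle point: no other code reads "bulk_discount".
    if flags.get("bulk_discount", False):
        return DiscountCalculator()
    return LegacyCalculator()
```

Removing the flag later means deleting `LegacyCalculator` and the one `if` in `make_calculator`.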

[+] Cthulhu_|5 years ago|reply

You don't need to apply the flags to every part of the application where the feature is implemented, mainly in the user-facing code (so the UI and (exposed, documented) API). The rest can just be left alone and developed (in trunk/master). The main thing is to not expose a feature until it's done, which will actually allow you to do all the implementation, refactorings, testing, etc.

[+] kissgyorgy|5 years ago|reply

There are multiple ways you can do this. Yes, a lot of if statements are messy.

What we did, and what worked very well for us, was to develop whole modules until all the features were ready, and only at the end did we "wire in" the module, which meant an "application.register(new_module)" call or something like that. This needs a very well organized and modular code base.
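Something along these lines, as a toy sketch. The `Application` class, its `register`/`handle` methods, and the reports module are invented for illustration, not our actual code.

```python
from typing import Callable, Dict


class Application:
    """Toy application that dispatches requests to registered modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._modules[name] = handler

    def handle(self, name: str, request: str) -> str:
        handler = self._modules.get(name)
        if handler is None:
            return "404 unknown module"
        return handler(request)


def reports_module(request: str) -> str:
    # Developed and tested in trunk long before users can reach it.
    return "report for " + request


def build_app(reports_ready: bool) -> Application:
    app = Application()
    if reports_ready:
        # The single "wiring in" call; until then the module ships dark.
        app.register("reports", reports_module)
    return app
```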

[+] kqr|5 years ago|reply

I'm going to respond point-by-point, and you'll tell me if that helps at all.

> I can't help but wonder how they can possibly work while keeping your code well-organised.

They don't keep your code well-organised. They are an intentional complexity debt you pay for their other benefits.

> A lot of features touch a lot of different parts of the code.

This is a sign of high coupling and/or low cohesion. It's a code smell. The ideal is to first refactor these features and make them loosely coupled and highly cohesive. Then they will have only one or very few connection points with the rest of the code.

Feature flags are really hard to use in bad code (and in my experience, most code is bad until the first feature flag that touches that part of the code.) So step one when feature flagging is to refactor and improve the code to the point where there's a logical place to switch between the behaviours.

> Some features require parts of the code to be refactored

Yes. You don't need feature flags to refactor. Refactoring does not change any external behaviour, so you can safely do that guided by your compiler and test suite.

> Are there good frameworks that abstract this mess away in some easily readable and maintainable manner?

There are frameworks that help with this, but they aren't really necessary. In essence, a feature flag is a single if/else branch somewhere, that plugs in either this feature or that feature (or no feature at all.)

> Do you postpone refactoring the code until after a successful launch?

Opposite! You frequently have to do the refactoring first, because the code was not written to be extendable/subsettable – which is a requirement for feature flagging to be really useful.

(In a way, feature flags are like tests in that both force you to write better code that is more loosely coupled and more cohesive.)

> But then how do you launch the refactored code?

I'm probably sounding like a broken record now, but if it's pure refactoring, just ship it. (Assuming it passes the build and tests.)

> No matter how I think about feature flags, I can't help but conclude that it would turn your code into a hard-to-maintain mess.

Feature flags are meant to be temporary, so we excuse whatever additional maintenance cost they come with. I'm more worried about maintenance a year down the line than I am for the next few weeks or months.

If you're using feature flags, you sort of have to first refactor your codebase to a better state where the feature you want to toggle is extendable/subsettable, and a year down the line when the feature flag is removed – you're still getting the maintenance improvement from the refactoring you had to do.

So in that sense, feature flags lead to even better code in the long run, because you can't "cheat" and just swap some code out for different code. You have to actually make the code properly designed and architected first.

[+] pondidum|5 years ago|reply

> Death by Flags

This is something I have spoken[1] (slides[2]) and written about quite a lot - and my solution has usually been to add a monitoring system to any feature flag system used.

If a flag hasn't changed state in a period of time, or hasn't been queried in a period of time, then an issue is filed against the relevant repositories. The time period is different based on teams and services themselves, and there are also exclusions for flags which should be kept around.
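The core check is small. Here is a simplified sketch with hypothetical names (`FlagRecord`, `stale_flags`); it assumes the flag system can report when each flag last changed state and when it was last queried, and it leaves out the issue-filing part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class FlagRecord:
    name: str
    last_changed: datetime
    last_queried: datetime
    keep: bool = False  # exclusion for flags that should be kept around


def stale_flags(records: Iterable[FlagRecord], now: datetime,
                max_age: timedelta) -> List[str]:
    """Names of flags not changed or not queried within `max_age`."""
    cutoff = now - max_age
    return [
        r.name
        for r in records
        if not r.keep and (r.last_changed < cutoff or r.last_queried < cutoff)
    ]
```

Each name returned would then become an issue against the owning repository, with `max_age` chosen per team or service.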

I'd like to improve the system to do something like an automatic PR to remove a flag, but at this point, it seems more effort than it's worth.

[1]: https://www.youtube.com/watch?v=LZgQBSr36p8

[2]: https://andydote.co.uk/presentations/index.html?feature-togg...

[+] wikibob|5 years ago|reply

A built in feature on LaunchDarkly.com

They are extremely awesome for the price.

[+] antris|5 years ago|reply

> Sure, and we should also not allow our tech debt to accumulate and we should follow every single best-practice religiously. Unfortunately, this never happens in any corporate environment. Even in great teams, tech debt often gets de-prioritized in the face of new requests. Newcomers to the team or those on their way out, aren’t always disciplined enough to clean up their flags after a successful rollout. And sometimes, these tasks simply slip through the cracks and get forgotten.

Yeah, you can make the same argument about any practice. If the team doesn't care, you cannot fix anything by doing or not doing any practice. A good team can suffer from a bad practice, but no practice can overcome a bad team.

Clean up your feature flags and definitely don't use a framework for it unless you are doing continuous A/B testing.

[+] Darkstryder|5 years ago|reply

> the recommendation at places like Google is to rollback first and investigate the problem later

I always have the same question regarding this advice: how are you supposed to handle data corruption?

When a buggy new version creates lasting problems in the persistent data, reverting to the last known good version may not fix the problem, as the correct code now has to deal with incorrect data, which leads to different, possibly incorrect behavior compared to before the buggy deployment.

Reverting the data itself is often not possible as you usually can’t say to your customers « woops, we deleted all of your bank transactions / incoming emails for the last two weeks because of a rollback on our side ».

So in my experience you still have to roll forward in a lot of cases, to fix both the code and data together.

How do you guys handle that in practice?

[+] joshuamorton|5 years ago|reply

A few things: you usually rollback quickly. After minutes, not weeks.

Defense in depth. Backups, architecture that supports replay, etc.

Dogfooding. No one will complain if you're the person who wrote the code that lost your emails.

Make changes backwards compatible when possible. Add a new column. Wait a while. Write data to the column. You can rollback cleanly without issue at each step. Naturally backwards compatible schemas (protos) help with this.

Dual writing. Similar to the above, if you're replacing a with b, dual write a and b for a while. Validate everything. Start relying only on b, but keep writing a. Eventually stop writing a after a while. At every point, you can roll back to the previous state without issue, so if you can validate everything functions at each interim state, you're always able to roll back.
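To make the dual-writing phases concrete, here is a toy in-memory sketch. The `Phase` names and the `DualWriteStore` class are invented for the example; real stores would be databases, and data written before dual writing began would need a backfill into b.

```python
from enum import Enum


class Phase(Enum):
    OLD_ONLY = 1    # write a, read a
    DUAL_WRITE = 2  # write a and b, read a, validate b
    READ_NEW = 3    # write a and b, read b
    NEW_ONLY = 4    # write b only, read b


class DualWriteStore:
    def __init__(self, phase: Phase) -> None:
        self.phase = phase
        self.a = {}  # old store
        self.b = {}  # new store

    def write(self, key, value) -> None:
        if self.phase is not Phase.NEW_ONLY:
            self.a[key] = value
        if self.phase is not Phase.OLD_ONLY:
            self.b[key] = value

    def read(self, key):
        reads_old = self.phase in (Phase.OLD_ONLY, Phase.DUAL_WRITE)
        return (self.a if reads_old else self.b)[key]

    def mismatches(self):
        """Keys whose values differ between a and b (the validation step)."""
        keys = self.a.keys() | self.b.keys()
        return sorted(k for k in keys if self.a.get(k) != self.b.get(k))
```

Moving `phase` one step forward or back is the rollout or the rollback. As long as a is still being written, stepping back loses nothing.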

I'm sure there's more I'm forgetting

[+] zorked|5 years ago|reply

Rollbacks are easy and simple to do, "solve" a large number of problems, and go a long way toward bisecting the problem. They are not expected to solve every problem.

[+] saberdancer|5 years ago|reply

What is the difference between rollback plus manual data fixing and roll forward plus manual data fixing? Well, rollback is usually faster and safer.

In some cases you do not have data corruption and can roll back safely. If not, you can probably fix the data more quickly than you can find the solution to the underlying problem.

[+] swsieber|5 years ago|reply

Sort of off topic:

I had fun writing a feature flag system for Java/Spring for work. We rolled a lot of our own stuff for tighter integration and this was no exception. You could define an interface, slap on a special annotation, and Spring would provide an instance backed by a proxy. This was for a web app, so we made the toggles configurable in the main app by admins. The UI for targeted rollouts was pretty simple but powerful. I never got around to adding mandatory cleanup dates, but it wouldn't be very hard. Adding automatic A/B tests would certainly have been a lot harder.

We did find them very nice for fresh implementations of common workflow screens. It was a big app and invariably some functionality would be missed, so we could just turn off the migrated section until we fixed it. We definitely had to do more QA work, but it shortened the turnaround time and reduced the stress quite a bit because patches weren't as urgent.

[+] social_quotient|5 years ago|reply

I like to call this light switch per light bulb to drive home the point with customers about toggles for everything. If there is not a clear point behind it, it can become a bit obnoxious.

When working on a system that is or might become multi-tenant, we tend to add light switches per light bulb.

[+] smanikim|5 years ago|reply

The article highlights some great points. But I think feature flags bring value in that we don't need to roll back code if it's failing. Also, failure is not so straightforward in most cases where we enable a feature and it fails. Sometimes it degrades performance over time, sometimes you want to enable it slowly to understand its behaviour under production traffic, sometimes you want to enable something around a launch date, etc.

For sure, it brings extra complexity to the code because of leftover flags which are not cleaned up. The complexity comes not only from writing the flag itself; later, new code also gets added for all these conditions to ensure compatibility. Cleanup should be managed like tech debt, but unfortunately it's not.

[+] golergka|5 years ago|reply

I think that the article omits an important distinction: feature flags that disable some code before compilation versus flags that are read by the program at runtime to decide which behaviour to choose.

I strongly prefer the feature flags of the second type, because that way, you have at least the guarantee that program compiles with any combination of feature flags; and therefore, you can be somewhat sure that any such combination produces correct behaviour (as much as you can be sure of that in general with your codebase: obviously, results may vary from C to Haskell).

[+] amai|5 years ago|reply

"Software engineering is primarily an exercise in managing complexity."

That is so true. Unfortunately many engineers do not manage complexity but simply increase it until it becomes unmanageable. Then they do a big rewrite and (if the project survives this; often it fails at that point) the cycle starts again.

[+] tln|5 years ago|reply

Funny, I was just doing some feature flag cleanup. The article makes good points, but I think I'll just clean up once a year or so and not worry too much about adding the flags :)

[+] gridlockd|5 years ago|reply

I am always baffled by how much disorienting crap there is on the Amazon desktop UI. What their A/B tests do not show them is me doing the checkout for family members who just give up.

[+] skratlo|5 years ago|reply

> For example, consider the Facebook Android app, which contains code contributed by hundreds of different teams

You don't say

40 comments