
One Way Smart Developers Make Bad Strategic Decisions

264 points | ScottWRobinson | 4 years ago | earthly.dev | reply

75 comments

[+] jmull|4 years ago|reply
Nice, interesting article.

But I would put it differently: "Seeing Like a State" -- that is, a top-down, global solution -- was not the problem.

The problem was that "Tim" didn't really understand the problem he was trying to solve (well, none of us truly understand very much at all, but he didn't understand it better than many of the teams associated with the individual services).

"Tim"'s proposal probably solved some problems but created various other problems.

The best solution, though, (IMO) isn't that Tim should be smarter and better informed than everyone else combined, nor that every team should continue to create an independent solution. Instead, "Tim" could propose a solution, and the 100 microservice teams would be tasked with responding constructively. Iterations would ensue. You still really, really need "Tim", though, because multiple teams, even sincere and proficient ones, will not arrive at a coherent solution without leadership/direction.

> A global solution, by necessity, has to ignore local conditions.

That's just flat wrong. A global solution can solve global concerns and also allow for local conditions.

[+] hosh|4 years ago|reply
> That's just flat wrong. A global solution can solve global concerns and also allow for local conditions.

Past a certain level of complexity, that's no longer true.

_Seeing Like a State_ is a great introduction to this, but I think Carol Sanford's work goes much more into detail. The main thing with the high-modernist view that James Scott was critiquing is that it comes from what Sanford would call the Machine World View. This is where the entire system can be understood by how all of its parts interact. This view breaks down at a certain level of complexity, and James Scott's book is rife with examples of that breakdown.

Sanford then proposes a worldview she calls the Living Systems World View. Such a system is capable of self-healing and regenerating (such as ecologies, watersheds, communities, polities), and of changing on its own. In such a system, you don't effect changes through direct actions the way you do with machines. You use indirect actions.

Kubernetes is a great example. If you're trying to map how everything works together, it can become very complex. I've met smart people who have trouble grasping just how Horizontal Pod Autoscaling works, let alone understanding its operational characteristics in live environments. Furthermore, it can be disconcerting to be troubleshooting something and then have the HPA reverse changes you are trying to make ... if you are viewing this through the Machine World View. But viewed through the Living Systems World View, it bears many similarities to cultivating a garden. Every living thing is going to grow on its own, and you cannot control for every single variable or condition.

For similar ideas (which I won't go into in detail), there are Christopher Alexander's ideas on Living Architecture. He is a building architect who greatly influenced how people think about Object-Oriented Programming (http://www.patternlanguage.com/archive/ieee.html) and Human-Computer Interaction design (which the startup world uses to great effect in product design).

Another is the Cynefin framework (https://en.wikipedia.org/wiki/Cynefin_framework). Cynefin identifies different domains -- Simple, Complicated, Complex, and Chaotic. Engineers are used to working in the Complicated domain, but when the level of complexity phase-shifts into the Complex domain, the strategies and ways of problem-solving that engineers are used to will no longer work. This includes clinging to the idea that for any given problem, there is a global solution which will satisfy all local conditions.

[+] phkahler|4 years ago|reply
>> > A global solution, by necessity, has to ignore local conditions.

>> That's just flat wrong. A global solution can solve global concerns and also allow for local conditions.

So let's rephrase that. A global solution that ignores local conditions will have problems and will likely fail.

[+] bentcorner|4 years ago|reply
Makes sense. In my work I've seen this when trying to get developers on my team using certain patterns, styles, types, conventions, or tools (or the inverse - deprecating them).

Suggestions are usually well grounded (e.g., "let's migrate to this `std` class instead of this old home-rolled wrapper"), but sometimes there's some nuance to how something is currently done, and deep discussion of the proposal can work through these bits.

[+] whakim|4 years ago|reply
> That's just flat wrong. A global solution can solve global concerns and also allow for local conditions.

In theory, yes. But in practice, no (and this is the author's point, I think). The more "local conditions" you have to account for, the exponentially more complex your "global solution" becomes. (This is the "state" metaphor.) In practice, you can't build that impossibly complex system (and it might not be desirable, anyways!) - so you're likely to try to change local practices in service of a more streamlined global solution. The more you do that, the farther away you move from respecting local conditions.

[+] sokoloff|4 years ago|reply
That depends a lot on the cardinality of the set of Tims.

If there’s one Tim per team, you’ll have 100 Tims proposing different global improvements and 100 teams needing to respond intelligently to those suggestions.

[+] ryukoposting|4 years ago|reply
Through enough iteration, all problems can be solved. But, how many iterations will be required to reach a solution that works for everyone? At that point, is there a solid business case for the project?
[+] GiorgioG|4 years ago|reply
> A global solution can solve global concerns and also allow for local conditions.

Not if standardization is the priority.

[+] darkerside|4 years ago|reply
I think the key difference is whether the local teams have the choice to opt out or not, and my belief is that they should. If they can, they can solve their own problem if the global solution doesn't work. If the global solution wants to keep them as consumers, they must adapt. If they can't leave, the global team will almost certainly stop responding to their needs over time. Like communism, a global solution is terrific in theory, but human behavior causes it to break down in practice.

Caveat, for small enough problems, good enough solutions, and charismatic enough leaders, global solutions can work. But they all break eventually.

[+] NateEag|4 years ago|reply
For anyone interested in social systems that help avoid this top-down, centralized failure mode, I cannot recommend RFC 7282 enough:

https://datatracker.ietf.org/doc/html/rfc7282

A whole lot of wisdom is captured in that document, including a deep understanding of the differences between unanimity, majority rule, and consensus.

If you're involved in standardization efforts in any way, whether it's deciding where your team will put braces in source code or running software architecture for a Fortune 100, it will well repay your reading time.

[+] svilen_dobrev|4 years ago|reply
Interesting. For a long time I've found that negative logic is more powerful/overarching than positive logic - #ifndef NOT_THIS is more powerful than #if THIS .. and this article applies that even to agreeing vs not-disagreeing.
[+] sudhirj|4 years ago|reply
This seems to be the hallmark of a “Middle” developer. Not so junior that they couldn’t build a working solution that they assume everyone should use, but not senior enough to think twice about whether they should be building it.

The “we should make a common framework for this” line is the dominant thought at this level. Never even a library. A framework. Everyone must do it this way.

The more senior people share concepts and maybe libraries, and allow the team to use them if they see fit.

[+] shuntress|4 years ago|reply
It's the large-scale version of taking "DRY" too literally.

Junior devs just repeat themselves because they don't know better.

Middle devs rush into an incomplete abstraction by overzealously not-repeating-themselves.

Senior devs just repeat themselves because they know they don't understand what it would take to abstract out the solution.

Like everything... "It Depends". Don't Repeat Yourself Too Much.

[+] GiorgioG|4 years ago|reply
I've worked at bigger companies and there are plenty of folks much higher than 'middle dev' forcing these types of things down the organization's throat.
[+] 1123581321|4 years ago|reply
This kind of consequential decision can happen at high levels. Obviously it happens less often when a truly brilliant developer ends up in a small organization (but that has its own risks).

In the example, it was determined that they could not afford to let each service solve its individual bottlenecks ad hoc. So a corresponding strategic error was also made/forced at the senior business level.

It's easy to speculate in hindsight, but in this case I could imagine a globally enforced throughput mandate supported by a widely visible and frequently reviewed dashboard, new tools/libraries as needed, and an optional central queue service required to 'compete for business' of the individual service teams.

I can see potential problems with that too, though. In a sense, failure has already happened when growth management is deemed to be too important to be left to capable individuals on decentralized teams.

Enjoyed the article.

[+] Fiahil|4 years ago|reply
I agree.

People use the most practical things at their disposal. If Tim had opted to publish a repository of easy and _simple_ recipes for managing Kafka and Postgres integrations, while retaining the ability to use the original libraries, then I see no reason why it would not have gained traction.

[+] yellowstuff|4 years ago|reply
This article does a good job describing one failure mode that's not understood well, but the opposite failure mode is much more common in my experience- having lots of ways to do the same thing can be very inefficient and brittle, even at small companies. The right answer is not "never unify systems" or "always unify systems", but develop judgement about when things should be unified.
[+] didibus|4 years ago|reply
Agreed. I too have seen the lack of unification more often than not, because business projects are always local. This client wants feature Y - why build it for all clients right now if only one client wants it? I only want to pay for getting the feature out to the client as cheaply and quickly as possible. And now you've got a single-use feature. Then the next client comes over, and you can't reuse the feature, so you build it again in a slightly different way, by different people, maybe even in a different team. Rinse and repeat. I see that all the time. And that's just one example of how people get their velocity down to a crawl over time. The only solution then is to hire more and more engineers until you're a huge engineering department maintaining a single product.

Of course, this is such a rampant problem in the software industry that a whole market for reusable standard generic solutions was created. That's why we got the cloud, and the array of SaaS, PaaS, IaaS, etc. And don't forget the entire open source is about standards, being able to reuse existing components and frameworks.

What I think the article doesn't mention is that unifying and creating a standard solution is a harder task than creating custom solutions one after the other for each use case/local context. In practice I've seen people try and fail, but often it's not the person with the most experience trying, or the business isn't truly willing to put in the effort to succeed; both of these can sabotage things. And again, because it is hard, you have to be willing to fail the first time, but use those learnings to try again, and again, until you crack it. And doing that is often worth it long term, because when you crack it the efficiency and scale will go through the roof. If your business is smart, you might even realize what you have is more valuable than your current business, and pivot to being a SaaS vendor haha. Or you can keep it secret as a competitive advantage.

[+] travisgriggs|4 years ago|reply
Lots of resonant points here. It’s worth making it to the end.

I work at a company where there are a number of different little less-than-one-man projects, and there's a lot of variety, so a couple of non-tech types, frustrated with resource allocation (having the right kind of skills at the right place at the right time in the right amount), want to standardize and simplify.

What I’ve observed though is that when you tell your house painters they can only work with black paint, they can only give your customers black walls, and when your customer wants wood panel, or textured fuschia, then you can’t earn revenue from that market demand.

[+] kerblang|4 years ago|reply
In general, "unity" is something software developers routinely pursue just for the sake of unity itself, failing to understand that unity comes with significant tradeoffs. It is much harder to build a unified solution than a localized, one-off solution. Divide-and-conquer is often a much better engineering strategy: DAC might create more work than unity, but the work is more likely to succeed instead of falling apart because we failed to anticipate all the use cases within the unified framework, especially when we lack experience in the domain.

Also refer to Jeff Atwood's Rule of Threes (which he borrowed from someone else) here.

[+] eternityforest|4 years ago|reply
I've noticed that ALL beginners seem to have a reinvented global solution phase.

Everyone who does electronics might say "Oh I'm going to use this one connector for everything". And it's either ok, if it's a standard connector, or a giant pile of crap that means they can't use a lot of existing stuff because they insisted on this insane DIY grand scheme.

Usually such things have an element of "I want to do Y, so I'll build a modular kit X and use that to Y". And then X becomes the real project and Y is never finished.

The insidious part is how the new product is often a tiny bit better than what's out there. But it doesn't matter. The mediocre standard solution is still way less trouble than the beautiful perfect custom thing. I'd rather have Just Works tech than tech that's Just Right. Anything that seems perfect and beautiful and simple, I don't trust, because it was probably made for one specific task, not to be a general standard you don't have to think about.

I think most of the failures with global solutions happen because someone did them on a small scale, or because they have to do with natural systems.

Fully top-down planning of manmade things by giant industry consortiums is most of why tech is great. Otherwise we would have no USB-C, and 12 different CPU architectures.

Sometimes design-by-committee protocols suck, but usually because they didn't have enough control, and instead of a protocol, they deliver a description language for companies to make their own protocols, with every feature optional so that compliance does not necessarily mean compatibility.

When you do it internally it can suck because it's more effort than it's worth to replace all your existing stuff.

[+] kingdomcome50|4 years ago|reply
Counter example: Tom standardized a bunch of services... and it worked! Everything is easier and more efficient now.

I agree with the thrust of this post: Changing something that is not understood is a dubious undertaking. But the author fails to make a compelling connection between the above and software development. A poor solution may be a result of not understanding enough of the system as a whole, or it may not. We simply can't tell.

Standardization (i.e. simplification) is generally a good thing in software development. How would Tim's system look if they had opted for his approach from the start? How does the 3rd iteration of the system compare to the 1st? Maybe Tim's solution is a stepping-stone to something better. Impossible to tell.

[+] EnKopVand|4 years ago|reply
> Counter example: Tom standardized a bunch of services... and it worked! Everything is easier and more efficient now.

I’m sorry, but that isn’t really a counterpoint unless you have some cases to back it up.

In my completely anecdotal experience standardisation never really works. I say this as someone who’s worked on enterprise architecture at the national level in Denmark and has co-written standardisations and principles on how to define things from common building blocks.

The idea was that something like a journal of your health can be defined as a model that can be used by everyone who ever needs to define a journal for health data. And for some cases it works well, it lets thousands of companies define what a “person” is as an example and which parts are the person and which parts are the employee and so on, and it lets them exchange data between systems.

Until it doesn’t. Because all of a sudden an employee is two different things depending on what time of day it is, because a nurse has different responsibilities while patients are awake, in some hospitals, and not in others. But because the “standardisation” doesn’t account for this, 50 years of enterprise architecture in the Danish public sector has yet to really pay off.

Some of our best and most successful public sector projects are the ones that didn’t do fanatical standardisation but built things with single responsibilities, so that they could easily be chained together to fit a myriad of unique needs.

Now, I’m not against standardisation in any way, but sometimes it just doesn’t make sense and sometimes it does. The issue is that the standardisation approach tends to begin before anyone knows which situation you are actually in.

[+] jamesfinlayson|4 years ago|reply
> How would Tim's system look if they had opted for his approach from the start? How does the 3rd iteration of the system compare to the 1st iteration? Maybe Tim's solution is stepping-stone to something better. Impossible to tell.

Reminds me of something a senior developer once told me about rewriting systems: the first iteration is ad hoc and messy; the second iteration is well thought out but completely over-engineered; and the third iteration gets it right, because the developers have seen both extremes and know where the correct middle ground is.

[+] alephxyz|4 years ago|reply
A well-publicized example is EA mandating that all its studios use the Frostbite engine.
[+] __turbobrew__|4 years ago|reply
I used to work on the Frostbite team at EA and it was quite the train wreck. Many game teams spent considerable time moving to Frostbite only to fail and go back to their old game engine. The game teams which did manage to move to Frostbite were unable to keep up with engine updates and got stuck on old, arcane engine versions. The Frostbite team itself was split between multiple geolocations, and the teams in different locations didn’t get along well and ended up developing silos. And finally, there were about a million layers of management in the Frostbite team — I heard from old-timers that the team used to be much more engineering-focused.
[+] didibus|4 years ago|reply
A counterexample would be all the games made using the Unity and Unreal engines, no? Or how well the RE Engine is working out for Capcom. Or how the Decima engine works out for Sony and Kojima, etc.
[+] fnbr|4 years ago|reply
I have a bunch of friends who work for an EA studio (or who used to work for an EA studio) and they all _hate_ Frostbite. With a passion.
[+] orf|4 years ago|reply
People are similar to water: they will often follow the path of least resistance.

The trick is to find a solution, document its “shape”, make it easy to integrate and market the hell out of it internally. Then you let the market decide.

Building a big shared common library can be a mistake, but that’s not because it’s intrinsically the wrong choice; its impact is partly a function of how many resources you can dedicate to effectively designing and maintaining it. At a certain scale, the economics of this suddenly flips.

The problem in the post seems to scream infrastructure rather than code. Identify the different types of queues services need, pick some off-the-shelf and preferably managed solutions, then make it take 5 minutes to get started.

[+] galaxyLogic|4 years ago|reply
As a whole the strategy of "Let's see what's common in all these systems" is a good start to understanding the systems. There is a limit to the complexity any single person can understand. Unification is simplification. It helps understanding. But I agree it is no good trying to make the landscape fit a simple map when reality is much more complex. There's no Silver Bullet in trying to combat complexity.

But rather than trying to unify everything, think about micro-services. Each service can be its own isolated solution. Of course it needs to be optimized in terms of how well it works when all the other services are running as well. But I think isolation is the key to independently optimizing everything.

I like this sentence from the article: "Lots of it doesn’t matter, but some of it matters a lot".

[+] didibus|4 years ago|reply
Micro Services need a lot of unified standards to talk to each other.

Maybe everyone understands "unification" differently, but for me making something simpler means making things that are small, independent, and where if they change internally they break nothing externally.

But for this to work, you need common patterns, properties, interfaces, otherwise you can't combine your modules into a greater more complex system.

If you do it right, the system as a whole might be complex, but each piece is simple to understand and reason about and making changes to them is not a risk to breaking the whole system.

But it also means reuse: those small, simple, independent parts can be reused for more than one thing. That's why the system grows complex. So each micro-service is very much a unified standard on its own, just of a small enough scope to successfully build and maintain for multiple clients to leverage.

But now, if you go up to the next level, you have a problem with the complex set of micro-services you now have, and that complex arrangement gets hard to reason about. That is where people added the idea of a Supervisor: systems that watch over the subsystems. It serves as a way to unify a set of micro-services into a more reliable and understandable whole.

[+] a1445c8b|4 years ago|reply
With a microservices architecture, it's still possible to locally optimize each microservice but then end up with a global architecture that is far from ideal. It seems to me that Tim's starting point was exactly that.
[+] Splizard|4 years ago|reply
Except when Tim decides to standardize how each micro-service is built, using a custom framework...
[+] didibus|4 years ago|reply
The article seems a bit defeatist to me. When I create an interface and have a few implementations of it, I've created a standardization; most people would agree it's a big improvement in code reuse and maintainability to leverage interfaces over just having a bunch of one-time-use concrete classes.

If you think of the United States, you might argue having a central government was better than each state being its own country and maybe that's the edge the US has over Europe.

Deciding to build roads everywhere top down definitely helped with overall car transportation.

Having HDMI as a common format for streaming video and audio sources is a big improvement over each TV using its own format.

There's plenty of counterexamples to what the article talks about. So what gives? And in the example mentioned, how do you know if that's a great use of standardization or not?

I've been in many companies and I think they often fail to invest enough in frameworks and standards across the tech side. AWS was born out of Amazon's own effort to do top down standardization for example. And now it's their biggest cash cow and the entire industry is standardizing over its services.

Way too often I see people stuck on contract like work. Each locally scoped small problem needs its own locally scoped big project to be handled. That just simply doesn't scale.

The beauty of most engineering is exactly finding these top-down mechanisms that do scale, even if it involves changing the business processes themselves. Think of a dishwasher, for example: a top-down design that can wash most things, but not all. Eventually things that are not dishwasher-safe became less and less popular and almost extinct, because people want to scale their efficiency.

Can top-down standards fail? Yeah, sometimes. But when they succeed they take things to the next level of scale.

Don't shoehorn every problem into the same solution, but also, don't solve every problem independently of the others and reinvent the same wheel, or your engineering team of 10 will quickly become thousands while your business product will have barely grown.

[+] twic|4 years ago|reply
I haven't read Seeing Like A State. I have to say, I am extremely skeptical about the author's fable about forestry. Commercial foresters today still mostly use monocultures with evenly-spaced planting patterns, and I simply don't believe that they would be doing that if there was a straightforwardly better way to grow trees, even if it was less "legible". This has a powerful scent of Gladwell-esque insight porn - the sort of story we love because it's counter-intuitive and makes us feel cleverer than people who haven't heard it.

I don't suppose we have any foresters on this board who can comment?

[+] ryukafalz|4 years ago|reply
Some good points. At the same time, sometimes standardization is good and necessary. Imagine if everyone had to reimplement TCP/IP analogues to communicate over a network; we'd never get anything done!
[+] NateEag|4 years ago|reply
And those types of universal standards arrive by a process not unlike evolution - everyone who wants to has a crack at the problem and the solutions compete for a painful decade or two. Eventually a winner emerges from the top few frontrunners, and by 2040 only networking historians remember ALOHAnet or token ring networks.

However, you can't top-down design a universal standard.

With sufficient skill, patience, and wisdom, you may be able to design a standard that's good enough to be widely applicable.

Humans have free will, though (in practice, at least, no matter what you think about the philosophical question).

A standard only becomes universal when everyone chooses to adopt it.

[+] saila|4 years ago|reply
The first part of the article describes the common situation where we see similarities across projects and effort being duplicated, apparently unnecessarily.

To improve understanding and efficiency, we come up with proposals that involve some kind of standardization, such as a shared library abstracting a service.

My team has recently embarked on such an effort, and I was really hoping for a new take on how to avoid the various pitfalls involved in attempting this kind of standardization.

Obviously, central planning works quite well for certain things and, just as obviously, it hasn't worked well for other things. In most situations, it seems that a combination of planning + flexibility works best.

Further, what works or doesn't work for nation-states isn't particularly applicable to software architecture. The analogy feels quite tortured to me.

As to the example given in the article regarding trees, one could just as easily choose an example from modern agriculture where "central planning" seems to work quite well.

In the end, I feel like the article just boils down to: "Make sure you understand the problem domain and the cost of large scale change before you spend a lot of time and effort making said changes."

[+] memorythought|4 years ago|reply
My reading of Seeing Like A State was not "central planning bad". The author explicitly acknowledges that there are many benefits to what they refer to as "high modernist" projects, which we all enjoy on a daily basis. My reading was more that large-scale projects designed to entirely change the way something is done, in order to make it legible to a central authority, necessarily throw away an enormous amount of local information, and this leads to a tradeoff. You end up being able to build much bigger systems, but those systems are not as flexible.

In the context of microservices that seems like a pertinent point because the purpose of a microservice architecture in the first place is often to allow an organisation to be more flexible and accommodate local knowledge.

[+] wvenable|4 years ago|reply
I disagree with the conclusion although not necessarily with the situation described.

A programmer only has so many hours in the day; if you want to be a more efficient programmer you either have to learn to type/think faster or you need to build frameworks, write libraries, and codify common practices.

There are situations where that doesn't work but if your job is to pump out code for an organization there's a good chance that most of your applications will have the same authentication, the same look and feel, etc. Putting effort into that core will pay dividends later. But you can't be a slave to your own code; if it doesn't fit in the box then don't force it.

[+] sokoloff|4 years ago|reply
> to be a more efficient programmer you either have to learn to type/think faster or you need to build frameworks, write libraries, and codify common practices.

That’s focused on the writing/creating side of the equation. In my side projects, I became a lot more efficient when I decided to put effort into using frameworks, adopting libraries, and copying common practices.

[+] mathgladiator|4 years ago|reply
While there is much to worry about with over-centralization, there are benefits to at least trying to see if you can consolidate n-m efforts, where n is the total number of efforts using a queue and m are the snowflakes. It all depends on organization size and whether some costs can be amortized.

Everyone having specialized everything is exceptionally expensive at scale. The key risk here is a Tower of Babel. What Tim should have done is kick-start the effort with one or two teams, then grow it organically by reducing operational burden for new teams.

The mistake here is trying to globally optimize rather than seeking different vertical consolidation.

By the way, this is why Amazon has so many services.

[+] hyperpallium2|4 years ago|reply
I expect "Seeing Like a State" goes into it, but top-down global abstraction and standardization often does work. e.g. mass production, mass media, and States themselves are unbelievably, fantastically successful.

The point is more that it doesn't always work, and that standardization is a poor substitute for actual understanding.

[+] dre85|4 years ago|reply
Very interesting article! I was left wondering, though, in what ways the queue abstraction solution failed.
[+] dilatedmind|4 years ago|reply
For this specific example, I think the shared library is not the correct approach. Queues work for simple FIFO behavior, but in this case they also need fairness (and perhaps other logic, like rate limiting per client, different priorities for certain clients, etc.)

For example, "Customer-A is clogging up the work queue and starving other customers out". The solution to this could look something like Linux's Completely Fair Scheduler, where every client is allocated some amount of messages per interval of time. This means messages need to be processed in a different order than they are enqueued, and queues are not good at reordering messages.

I would suggest implementing the queue abstraction as a REST or gRPC service, backed by a database like Postgres, which holds message payloads and everything related to message state (in progress, retry-after times, etc). Now we can implement all the necessary scheduling logic within our queue service.

[+] kerblang|4 years ago|reply
They hinted at the fact that Kafka is not actually a queue, and it especially has problems with fairness if you try to use it as a queue for unequally sized "jobs" and/or unequally sized consumers. Kafka is for the exceptional case; actual message queues are for the default/general case.