sandal's comments
sandal | 9 years ago | on: Alan Kay has agreed to do an AMA today
sandal | 10 years ago | on: Beginning to climb out of the software death spiral
sandal | 10 years ago | on: The sad graph of software death
Most of the proposed fixes are ones I agree with in spirit, though I implemented them a little differently in practice.
sandal | 10 years ago | on: The sad graph of software death
The benefit I had in this particular project is that I was an outside consultant with full access to everyone and everything in the organization AND the trust of some of the leadership there as well as a couple of the developers.
That's a pretty big benefit and not typical. Still, I think it's important to talk about what did work (I waited years before publishing this essay, and repeated it a couple times elsewhere), because even if you don't have the leverage, knowing how to make a strong case is half the battle.
Will post the followup within the next day or two!
sandal | 10 years ago | on: The sad graph of software death
This is great advice and is ultimately what we focused on once I got past this initial triage/prioritization problem in the org I was helping.
> To make the analogy work the backlog needs to get rated on how much each open issue costs the company. Most organizations use some stratified measure of "severity". Few actually try to put a price on a problem, but I think it's worth the effort to try.
This is hard to do when the backlog is increasing by hundreds of open issues every couple months.
But the trick is ultimately to find a way to very quickly trim the backlog and see what crops back up, and then put each resurfacing issue through an economic decision making framework (even if it's a back-of-the-napkin calculation), as you suggest.
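To make "economic decision making framework" concrete, here's the kind of back-of-the-napkin math I mean. This is a minimal sketch; the method name and all the numbers are my own illustration, not figures from the project:

```ruby
# Back-of-the-napkin cost model for a resurfacing issue.
# All numbers below are made up for illustration.
def weekly_cost(occurrences_per_week:, minutes_per_occurrence:, hourly_rate:)
  hours = occurrences_per_week * minutes_per_occurrence / 60.0
  hours * hourly_rate
end

# An issue that triggers 20 support requests a week, each eating
# 15 minutes of a $100/hr developer's time:
cost = weekly_cost(occurrences_per_week: 20,
                   minutes_per_occurrence: 15,
                   hourly_rate: 100)
# ~$500/week, every week -- which makes "fix it properly" vs.
# "keep handling it manually" an easy comparison to reason about.
```

Even a rough number like this beats a stratified "severity" label, because it can be compared directly against the cost of fixing.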
sandal | 10 years ago | on: The sad graph of software death
The goal of the essay though is not to suggest that tracking issue count is a useful metric.
Instead, what I'm suggesting is that if you see lots of stuff going wrong in a project (things that are clearly costing time and money, angering customers, etc.) and then you look at whatever issue tracker is in place and you see a pattern like this, it's a sign of a broken triage and prioritization process.
This happens so often that I've seen it in many different projects, and I think you're spot on when you say it's because an issue tracker wants to be so many things at once, and it rarely closely tracks value.
But in crisis situations, people on the front side of the business feel like they're "doing something" to help customers and users by filing tickets and then continuing to ask for status updates on them, fighting for their inclusion in prioritization meetings, etc.
Not only does this fail to help, it actively hurts. The tracker ends up as a mediator that obscures or limits communication about the real issues, and without a change, that can be disastrous in a troubled organization.
So... the point of the graph is more of a quick way to confirm that triage/prioritization is being done wrong, so that you can make a decision on how to fix that.
And that's what I'll cover in the followup essay. :-)
sandal | 10 years ago | on: The sad graph of software death
If you're seeing a massive amount of problems in your organization AND you have what appears to be a badly broken prioritization/triage/issue tracking process, you need to fix your triage process before you'll be able to solve the real underlying problems.
There's no situation in which opening 500 issues in four months and only closing a tiny fraction of that amount is healthy, regardless of severity or whether they're feature requests or bug reports, or whatever.
The graph makes that point, and I'll hold up the fact that this is on the front page of HN, /r/programming, and lobste.rs as evidence that the point was sufficiently informative and well understood by the vast majority of readers.
I would have updated the image on the first report of the issue if I were able to edit the post, but this is an archive of an email newsletter entry. It is worth fixing, and I will fix it in time for the followup essay.
I am just generally bothered by how incredibly, unbelievably pedantic it is to fixate on this one point and act as if the whole essay isn't valuable because it took an extra few seconds to read the graph.
But hey, this is the internet. We can ignore the larger points and focus on fine-grained details, and that's normal, right?
sandal | 10 years ago | on: The sad graph of software death
But it's important to note that this is an accumulation graph: it starts at zero, but that doesn't mean the backlog was zero (my guess is it was already massive, because it was massive by the time I arrived). It counts issues opened and closed during the time period across the entire system, so some issues closed during this period were opened long before the period started.
This means that the graph will always increase, and that the space between the two lines represents the growing net increase of unresolved issues in the tracker.
I wish I still had the raw data on this, because it's a little hard to tell from the scale that the differences from week to week can be pretty big (like 10 issues opened one week, 50+ another).
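For anyone who wants to reproduce this kind of graph from their own tracker, the two lines are just running totals of open/close events. A minimal sketch (the event format and dates are my own invention, not the project's actual data):

```ruby
require "date"

# Each event is [date, :opened] or [date, :closed]. Both running
# totals only ever increase, so the graph always climbs; the gap
# between the two lines is the net growth of unresolved issues.
def cumulative_counts(events)
  opened = closed = 0
  events.sort_by(&:first).map do |date, kind|
    kind == :opened ? opened += 1 : closed += 1
    [date, opened, closed]
  end
end

events = [
  [Date.new(2015, 1, 5), :opened],
  [Date.new(2015, 1, 6), :opened],
  [Date.new(2015, 1, 7), :closed], # may close an issue opened before the window
  [Date.new(2015, 1, 8), :opened]
]
cumulative_counts(events).last # => [Date.new(2015, 1, 8), 3, 1]
```

Note that a :closed event inside the window can belong to an issue opened before it, which is exactly why the closed line not starting at the true backlog size doesn't matter for reading the trend.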
When I arrived, there were a few specific issues in play, along with all the usual code quality / project management issues that plague any troubled project:
(1) A high number of support requests that required manual setup work from developers, which increased greatly due to overall growth in the customer base.
(2) Some high defect density areas in the codebase that generated emergencies, with fixes that would end up breaking other things while dealing with the acute issue, along with infrastructure/architectural scaling problems.
(3) As you guessed, reduction and restructuring of team capacity, without a corresponding change in the workload.
So these things... they happen more often than we'd wish in the software industry, and they produce graphs like this.
(that said, keep in mind this was pretty much a napkin sketch. any issue tracker is going to have a TON of noise, unless it's very well pruned -- the sole purpose was to point out the wide gulf and relate it to the problems already obviously observable onsite, and then use that to motivate real work to change things.)
sandal | 10 years ago | on: The sad graph of software death
I plan to fix this when I use this graph elsewhere, but I really don't understand this comment. Is it an automatic knee-jerk reaction because you looked at the graph axes and didn't read the article?
Or did you read it, and have a genuinely hard time making sense of what was going on?
If the latter... sorry about that. That said, if the missing labels were such a severe problem, I'm surprised at just how many people seem to have understood the idea anyway.
sandal | 10 years ago | on: The sad graph of software death
You can't assume this unless you freeze development. I wrote the essay, and the opposite was true: rushed work was creating runaway defect density while the product was experiencing external growth, which in turn brought old tech debt to light. For example, things with a few percent defect rate that were easy to deal with manually before suddenly saw a 10x-100x increase in frequency.
So... if you allow for a cooling phase, yes... you end up with a "more issues, but less severe" pattern. If you're in a high-growth situation with a development team that's at near 100% capacity utilization... you get infinite death graph doom. :-/
sandal | 10 years ago | on: The sad graph of software death
However, if you know for sure that there are tough problems going on (related to growth, limited dev capacity, code quality issues, etc), then having well-functioning planning tools can be useful. A tracker that looks like this, and the process that leads to this trend is a problem that is worth fixing in that situation.
So that's what my follow-up essay will cover: how to put a very simple process in place that gets you back to the point of reliable project prioritization and measurement, without a large time investment.
Because even smart, capable teams can end up in this situation due to various forms of external pressure, knowing how to dig out is pretty important, ideally long before hitting the tipping point this particular project had reached when I started helping.
sandal | 10 years ago | on: The sad graph of software death
I've repeated this process elsewhere too, both for other companies and on my own projects... though the example shown was among the most severe.
In the essay, I'll try to be a bit more precise though, because in practice you might have a hard time getting buy-in on "delete the issues" and an easier time with what I actually did on this project:
"create a priority queue that only the product owner and CEO can touch, treat that as the new official backlog until the crisis is resolved, put a bunch of rules on what gets in there and how much it can hold, then track progress actively"
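The mechanics of that queue matter less than the constraint it enforces. As a rough sketch of the rules (class and item names are mine, invented for illustration, not from the actual project):

```ruby
# A fixed-capacity "crisis backlog": adding a new item requires
# either free space or explicitly bumping an existing item out,
# which forces a real prioritization decision every time.
class CrisisBacklog
  attr_reader :items

  def initialize(capacity:)
    @capacity = capacity
    @items = []
  end

  def add(item)
    if @items.size >= @capacity
      raise "Backlog full (#{@capacity} items): remove something first"
    end
    @items << item
  end

  def swap(old_item, new_item)
    @items.delete(old_item) or raise "#{old_item} is not in the backlog"
    @items << new_item
  end
end

backlog = CrisisBacklog.new(capacity: 2)
backlog.add("billing outage")
backlog.add("signup emails failing")
# backlog.add("tweak button color")  # raises: forces an explicit tradeoff
backlog.swap("signup emails failing", "data export corruption")
```

The point is that the cap, plus restricting write access to one or two people, turns "everything is urgent" into a sequence of visible either/or decisions.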
More details will be shared in a couple days, can't wait to hear responses. :-)
If you want the followup essay in your inbox, sign up here: https://tinyletter.com/programming-beyond-practices
sandal | 10 years ago | on: The sad graph of software death
I linked the Reddit thread because there are some good thoughts there, but I'd also love to hear what HN has to say!
https://www.reddit.com/r/programming/comments/3z1pfp/the_sad...
sandal | 10 years ago | on: Building a feedback loop (Writing a programming book, episode 5)
This advice isn't (or at least shouldn't be) anything new to those of you building your own businesses. However, the idea of "getting out of the building" and talking to others may be less obvious when it comes to producing technical writing such as books (or even blog posts).
There have been times when I've worked on a long, complicated article for a hundred hours or more and released it to the sound of crickets.
In my more recent works, I've been actively getting involved with people in my target audience, and might have dozens of conversations for a single ten page article before it even lands on the public internet.
Needless to say, the difference between those two extremes is huge! Happy to discuss more if anyone has thoughts or ideas to share.
So there's this inherent tradeoff between "easy to process" and "expressive" -- and I imagine deciding which side you want to lean toward depends on the context.
Check this out for a practical example: https://www.practicingruby.com/articles/information-anatomy
(not a Ruby article, but instead about essential structure of messages, loosely inspired by ideas in Gödel, Escher, Bach)