My personal experience with story points is that the number never really mattered, but the process of the team discussing how to rate the complexity of a task was very useful. As a tool for estimating how long something will take, I have never been able to turn story points into a reliable indicator, for many reasons (e.g. the team changing, the domain changing, variability in operational load outside of development work).
Overall I tend to avoid using story points, but on the few teams I worked on that really wanted to use them, I always framed them around building shared understanding rather than as a metric that is actually useful for estimating work.
I used story points for years with my teams and they worked "as advertised", which is to say they helped the teams understand the effort, complexity, and risk involved in each story.
When there was disagreement, it helped them dig deeper to understand why, and usually revealed somebody's incorrect assumptions.
It helped make sure teams didn't overcommit to the number of stories they stuffed into a sprint, and avoided either burning out or, worse, normalizing not finishing the sprint, which hurts morale and motivation. (For some reason my teams often thought they could do more than the points implied!)
Most importantly, when large projects were proposed or were in progress, we were able to give realistic estimates to the various stakeholders about when to expect the various milestones to arrive, which bought us engineers a ton of credibility, trust, and respect with the rest of the company.
And yes, management wanted to see the story points and measure the team against them. I told them to F-off. Nicely. Kinda.
It helped that I was either a CTO or a senior enough exec in those cases with 3-8 agile teams. I essentially was the middle management and could put a stop to any destructive practices like evaluating teams against their velocity.
> As a tool for estimating how long something will take, I have never been able to turn story points into a reliable indicator, for many reasons (e.g. the team changing, the domain changing, variability in operational load outside of development work).
If your measure of the utility of story points is how well they help you estimate time, then you're right, they're useless to you. If you're on a scrum team, they're a useful back-of-the-envelope way to estimate which bits of validated functionality you're going to be able to get into the codebase this sprint. No one outside the scrum team should care about story points, and they certainly shouldn't be used to generate reports. Velocity is for the team's benefit, as a tool to help manage workload and schedule.
The points are needed because it's convenient to do reports based on them. If we used estimation language such as "easy, medium, hard, dummy-thicc", they'd still need to assign numbers to those labels so they could do math on reports and graphs to watch your performance.
The biggest sin, of course, is then trying to predict velocity, but the consequences of that usually just make the people doing the reporting look silly for no reason. I think even the slow developers rarely get fired, and you also get nothing for clearing more story points than other developers.
No bonuses for higher velocity is the real reason no one takes it seriously.
This is my experience as well. Story points in and of themselves are worthless, but as an excuse to discuss the project as a team, they can have some value.
Those items after your "e.g." are part of the reason story points don't work for you.
You really do need to control the amount of time your devs get distracted by other things. If they're solely focusing on that one project, you'll be fine.
It's why I personally appreciate XP more than plain old Scrum.
That's a very healthy approach to it and deserves applause. Kudos.
The goal of mobbing around task breakdowns is to drive the communication and build that shared understanding, just with the output of writing most of it down in a way that makes it easy to track progress and approximate the size at the same time.
They're also useful in retro for the simple fact that something was written down. If your story point estimate was way off for a particular task, you can pull up the original conversation, identify what assumptions were violated or what new info was surfaced that wasn't present at the original estimation and whether there's learnings/process changes to be made.
Of course, retro is usually the first thing to go in a deluded attempt to increase velocity and story points hang on as this vestigial tail, contributing to more cargo cult software engineering.
edit: One scenario that might play out:
The team all agreed that tasks A, B & C were worth 3, 5 & 1 points respectively, and Steve and Mei thought task D was worth 5 points, but Carol thought it should be 13 because it involved integrating an external API, and her experience was that other API integrations in the past had exposed hidden complexity.
Task D ultimately was not completed because, during the integration, it was discovered that the API did not support a key feature that was needed, and an in-house alternative had to be built instead.
It was decided that going forward, the team would instead assign a 1 point task to building a toy app for any new API integrations that would then feed into the process of deciding the story points for features requiring that integration.
Yeah, the discussion can be really illuminating. If the points weren't recorded or mentioned afterwards I think it would be a net positive in some cases.
Software does not exist in a vacuum. You are probably right in thinking that story points should not matter to you, as a developer; but they do matter to external stakeholders.
Forecasting and management of expectations is necessary because software often has external real-world dependencies: available funding for a first release, marketing materials, hardware that can't be released without software, trade shows, developer conferences, yearly retail release parties, OEM partners, cyclic stable releases for enterprise customers who won't push software into production without extensive pre-testing, graphics adapters that can't be released without drivers, rockets that won't launch, cars that won't drive, etc.
All of these things require some degree of forecasting and appropriately evolving management of expectations. Here's where we stand. Here's what we can commit to. Here's what we might deliver, but are willing to defer to a future release. Here are the (low-value) items that were under consideration that we will definitely defer to a future release; here are the features you will need to drop to get feature B into this release.
The purpose of story points is to help provide forecasting and management of expectations (with appropriately limited commitments) to stakeholders, on the understanding that forecasts are approximate.
Calibrated burndown of story points is pretty much the only basis on which forecasting and management of expectations can be done in an agile process. The key is to make sure that stakeholders understand the difference between forecasting and commitment, and to make sure your development team is appropriately protected by a healthy development process.
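As a sketch of what that kind of forecasting could look like in practice (the function, numbers, and the mean-plus-or-minus-one-standard-deviation heuristic are my own illustration, not from any particular tool or the article):

```python
import math
import statistics

def forecast_sprints(remaining_points, past_velocities):
    """Return (optimistic, expected, pessimistic) sprint counts, using the
    mean +/- one standard deviation of historical velocity as the range."""
    mean = statistics.mean(past_velocities)
    stdev = statistics.stdev(past_velocities)
    optimistic = math.ceil(remaining_points / (mean + stdev))
    expected = math.ceil(remaining_points / mean)
    pessimistic = math.ceil(remaining_points / max(mean - stdev, 1))
    return optimistic, expected, pessimistic

# 120 points left; velocities from the last five sprints.
print(forecast_sprints(120, [28, 35, 30, 25, 32]))  # -> (4, 4, 5)
```

The key point is that the output is presented to stakeholders as a range and a forecast, never as a commitment.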
As for the author's claim that you get better forecasts by just counting stories instead of summing story points... color me skeptical. I do get that it prevents some obvious abuses of process, while enabling others that are just as bad. If somebody is using story points as a developer performance metric (which they shouldn't), nothing prevents them from using completed stories as a developer performance metric (which they shouldn't either). The corresponding abuse of process to combat that metric would be to hyper-decompose stories.
Story points aren't time (as OP states). They're relative complexity and uncertainty (hence the Fibonacci sequence building uncertainty into the larger numbers). And stories should be able to be sized with big numbers. I've never been on a team comfortable with more than a 7, except in my first agile experience, where we all took agile/scrum training together for a few days. There, we'd frequently give things 21 or 30 or 50 points, as appropriate. That's the only place I've ever seen a burndown chart that looked like it should. Everywhere else, it's flat until the last day and then drops to zero as all those "it's a 7, I promise" stories get carried over to the next sprint for the 3rd time.
I haven't done story points estimating in years, but at the time, an 8 was rarely acceptable, a 13 surely was not. Estimates that high were basically saying "this story is too big or too poorly defined to estimate accurately" and we'd try to break it down into several stories of 5 points or less.
The vast majority of our stories were 2, 3, or 5 points.
The x-axis of a burndown chart is time, right? So if you create a chart that measures points/time then you encourage the idea that a certain number of points can/should be completed in a day, ergo that points are a proxy for units of time. Otherwise what's the point in the chart?
Yeah, it seems fairly common for people/teams to follow the idea that any story of 8 or more points should be broken down into tasks of 5 or less. This simply doesn't make sense to me. If the simplest task is 1 point, is your most complex task really allowed to be only five times as complex? Story points usually follow an exponential increase for a reason; enforcing staying in the mostly linear portion is just pretending the complexity and uncertainty have been decreased.
The use of the Fibonacci sequence is so pseudo-intellectual. It's completely arbitrary, but use of the Fibonacci sequence makes it sound smarter or justified somehow.
This article argues against story points but then concludes that the solution is breaking down work packages into atoms called tasks and measuring the queue length. Those are two different dimensions. No matter how hard you try to break things down, a task queue will still contain items of different sizes.
There is nothing saying you can't refine work packages together as a team while still using story points. That's actually how it's done almost everywhere. When items end up with a very high estimate, there will be a push to refine them into something smaller, which you should do, but only as long as it still makes sense.
In fact, the worst place I ever worked was one where we were given strict orders to break down every story until they were all 1 story point (while still using story points...). It doesn't take a genius to figure out what happened next. All packages started having pointless micro-tasks with thousands of cross-dependencies between them: "open the editor", "write a function", "write a unit test", "commit the code", "review the code". How am I supposed to write a unit test before the function signature has even been made? How am I supposed to iterate when finding bugs? More complex tasks still overran their estimates by a factor of 10, in fact even worse than before; some things just can't be split, yet they still needed a 1-point estimate.
Using the queue length and the impact on variability is still an interesting concept; I just don't think you should couple it with breaking down everything into single-sized items.
I wish the article spent longer discussing the queuing view and less time pointing out the flaws with story points, but I guess I'm biased because the flaws are already "obvious" to me.
I think there's an inherent tradeoff between the overhead and misery of breaking a task down into granular subtasks and the variance of task completion time. In practice, using a queue-style form of tracking would mean that you trust your team to break down work and do time-bounded investigation into unknowns. Then you look at your task completion rate. Measuring this as a random variable gives you not just the average task completion time, as we reduce to using Little's Law, but also the variance of task completion time. If task completion time has too much variance even though the input queue of tasks isn't actually growing much (i.e. the arrival rate is staying stable and low), then it's probably worth having the team break down tasks in a more granular fashion (or maybe it's a single person who keeps making giant tickets, or something). On the other hand, if the team keeps complaining about straitjacket ticket discipline, it's probably worth letting folks be looser with task creation. There's a human element here, but there always is, since it's humans doing the work, and that's fine.
I've always argued that output per person on a team should be modeled as a random variable, but I really like this queuing approach, and it's something I may bring up with my team. In practice I'd probably just track task completion times on a weekly basis. This would be much simpler than story points, and instead of the bickering that comes with trying to break a ticket up, it would give engineers more autonomy over task creation.
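A rough sketch of tracking this under Little's Law (L = λW: the average number of tasks in the system equals the arrival rate times the average time in system). The task records and day units below are invented for illustration:

```python
import statistics

def queue_stats(tasks):
    """tasks: list of (arrival_day, completion_day) pairs for finished work."""
    sojourn = [done - arrived for arrived, done in tasks]
    span = max(d for _, d in tasks) - min(a for a, _ in tasks)
    arrival_rate = len(tasks) / span          # lambda: tasks arriving per day
    avg_time = statistics.mean(sojourn)       # W: average days a task sits in the queue
    variance = statistics.pvariance(sojourn)  # high variance -> consider finer breakdown
    avg_in_system = arrival_rate * avg_time   # L, by Little's Law
    return arrival_rate, avg_time, variance, avg_in_system
```

If `variance` is high while `arrival_rate` stays stable and low, that is exactly the signal described above to break tasks down further.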
The bit where it broke down for me was where they initially make the (great) point that agile falls down when small story points for simple tasks (which are accurate) are added to large story points for complex tasks (which are much less accurate), leading to a net inaccurate estimate. So the takeaway is to break down complex tasks.
Then they introduce queues, which are made up of small tasks.
I admit I stopped reading at this point. Is the useful thing the queue, or the fact everything is now broken down into small tasks?
Story points were only ever an approximate unit of time. That we fit them into a fixed unit of time such as a sprint is a clear indicator of this. To call them complexity or anything else is misleading: transcribing a dictionary by hand into a CSV file is super simple (1 point) and takes a long time (how many points is 3 FTE-months in your calibration?).
Estimating effort-time vs. completion time are quite different things, serving different stakeholders. A story that takes 1 point of effort by anyone's pointing system could still take a week due to any number of factors (a crucial collaborator gets sick, a laptop gets crunched by a car, a ransomware attack, whatever). The only really estimable aspect is how long the developer will spend on the work, not when it will be done. Airspeed, not ground speed.
That said, it's not clear how queue analysis helps when you haven't spent any time saying how long you expect each task in the queue to take, or what the dependencies are between tasks within and across teams. Given engaged team members, I've gotten very good results predicting the pace of progress for sprints, and it never mattered because everything needed to be shipped. About 4-6 weeks before each X was to be completed, we could say with confidence that X would be ready in 4-6 weeks. Not terribly useful.
I just don't get what the new terminology is for. Like how is this helping anybody?
Manhours = story points
Task = story
Subtask = sprint
Upcoming tasks = backlog
Task turnaround time = ??
Project goal(s) = epic(s)
I'm probably not even doing this right, but what are we doing here anyway?!
It gets real fun when the project involves software/firmware + mechanical engineering (think machines, robotics, etc), gotta love their faces when you teach them the special magic advanced project words for special software people.
Also, the whole thing about "velocity" is just weird. It's as if some marketing goon heard the word "velocity" and decided to overload the term in a way that does not mean velocity in the mathematical sense. Like in calculus: the velocity curve is the first derivative of the position graph, and the acceleration curve is the second derivative. And for that matter, who cares about velocity? We have always wanted acceleration, which sort of yields velocity. The problem with velocity is that it's a dumb idea, because if I drop a rock from a tall building, it builds velocity as it free-falls, and so it goes: you have different kinds of velocity, but you only really have one kind of acceleration.
Another issue, as the blog points out: story points are a simple scalar number, and you cannot easily say how that sum factors in risk, complexity, effort, etc. So why even have it? Clearly a story needs to be a kind of vector with these properties (risk, complexity, effort...) set to positive or negative values; then do the typical vector math to plot the direction and magnitude of the story in the n-dimensional space. Then one can actually calculate stupid ideas like velocity or acceleration to speak the language of project/program managers.
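Taking the (tongue-in-cheek) vector idea literally, a toy sketch could look like this; the three axes and the choice of a Euclidean norm are entirely made up for illustration:

```python
import math

def story_magnitude(risk, complexity, effort):
    """Euclidean norm of a (risk, complexity, effort) story vector:
    one possible scalar 'size' that at least records its components."""
    return math.sqrt(risk**2 + complexity**2 + effort**2)

print(story_magnitude(3, 4, 0))  # -> 5.0
```

The direction of the vector, not just its magnitude, is what a scalar story point throws away.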
Ultimately the managers need a way to quantize the units of work done by development staff to plan better, and that's understandable, yet really hard. I think putting the cart before the horse is usually a dumb idea, and things should go back to measuring performance after the work is done, instead of estimating performance before work begins. Nobody wants to commit to a performance contract for each and every task ad nauseam, and that's what happens during sprints.
But you have sprints! And if a laptop gets crunched, either another dev takes over, so something else doesn't get done, or the story that dev owned is not going to be delivered. That's just life. I also understand how disconnected management can be, but a lot of the time you cannot say "exactly" when something will be done, because that is just not possible.
I don't understand people like this author. I guess I do. It's all marketing flamebait to get attention.
He knows exactly what story points are; he goes through them exhaustively, yet strangely derides them the whole time.
Then he concludes by purporting to invent the very practice you were ALWAYS supposed to follow to make story points work: you have to find a set of repeatable work to compare new stories to for reference. That's the whole game. That's his "tasks" idea. That has always been part of every implementation and lesson on story points I've been exposed to.
It's a completely nonsensical article.
Story points have always been about queues and implementing Little's Law. Always.
Yes, it sucks to be on teams that just argue about points and don't work to refer to standard architectures for building blocks. That doesn't mean story points are broken, it's pointing out something else in your organization is broken.
A lot of real criticism in this comment, but props to the author for writing at least. It's more than I do as a part time internet complainer.
I can see that perspective I suppose, but it's certainly not marketing flame bait.
Whatever story points were always supposed to be, they aren't. Numerous people's real-world experiences go sideways because of the way they are designed. You're set up for failure and confusion.
I never claimed to invent anything. I'm highlighting Donald Reinertsen's work that more people should be following.
The purpose of the article is to remind people of why everything is broken so that they can identify it and fix it, including examples.
That's like saying a system that generally yields bad results isn't to blame, it's the people.
For a vast majority of managers in "software companies" things like story points are about asserting control over what is created, getting commitments from various folks, and then increasing stress to have you "sprint" constantly "behind schedule" so they can inject additional requirements or pivot to the new thing they want to do.
Story points are a useful exercise when trying to discuss complexity of a task but IME it completely falls apart when used as a metric to determine anything useful like velocity.
I had a terrible experience with them once. I was a relatively new, enthusiastic engineer on a struggling team of guys who'd been at the company a long time and were pretty burnt out. Inevitably I started getting all the "hard" stories with a lot of points, til it got to some stupid point where I was outputting about 80-90% of our team's combined story points. Management caught wind of it, didn't like it, so what they decided to do was adjust my points "downward" to be more in line with the rest of my team's output. It really irritated me, because it'd result in absurd situations where my teammates would get a "3 point" ticket that was like, just updating some field in a config file to an already-known value and checking it in somewhere, and I'd get this whole-ass project wrapped in a single ticket and they'd give me the same amount of points for it. And of course this was tied in to performance reviews, adding to how annoying it was.
Another super irritating thing that would happen is I'd be asked to estimate complexity on some super vaguely defined ticket describing some problem that would take a lot of detective work to even figure out how to solve, so how am I supposed to give an accurate estimate of complexity? If I knew that much, I'd already have probably fixed whatever the issue was.
When my company first started using Scrum, it was with Rally. It had story points, but also estimated time and actual time. We used them all, and I found the estimated and actual time to be the most useful. We then went to Jira, which kept the points and dropped the time.
Since we were also tracking the actual time things took, our estimates got better over time. The team would help keep people honest. For example, writing documentation always took at least 3x longer than anyone expected, so we'd always make people add more time to those stories.
Once we had reasonably accurate time estimates, there was also a feature for available hours. If someone was on vacation, we’d subtract those hours while planning the sprint. It was then easy to see who was over committed, balance the work, and to make sure we were being realistic.
We worked like this for about 2 years. It was probably the best 2 years I had in the job. We got a lot done, we were being strategic about our work rather than reactive, and while we would push to get things wrapped for the end of the sprint and the demo, it never involved all nighters or heroics. We pushed because people on the team wanted to do more and go faster, not because of pressure from the outside. I noticed people got more down when they tried to do less, so I’d often push back on people trying to overload the sprint. If they finished everything, we could always add more. Outside the team, our VP told us we were 3+ months ahead of everyone and if I wanted to go hang out in a cafe in Europe for a few months, go.
This is all a distant memory now. So many lessons learned, but they all fall on deaf ears in the current organization.
The value of story points is that it acknowledges that not all stories are equally time-consuming. Importantly, story points also provide a process for identifying stories that should be further decomposed.
In my experience, story points allow forecasting that's as good as any forecasting method I've ever used. And I've used pretty much every schedule forecasting method over my long career.
The author touches on many of the reasons why story points don't work. And pretty much every reason he gives is something that you are not supposed to do.
The key to getting them to work is trust, and a commitment to never use story points as metrics to measure the performance of developers. Any attempt to do so will result in gaming of the system. The tradeoff that stories provide is lack of precision in exchange for not having to spend 50% of your development cycle up front doing detailed analysis required to provide detailed estimates (which never worked anyway).
Things you must also never do:
- Compare calibrated burndown factors between teams.
- Ask why the calibration factor isn't N. The calibration factor is what it is.
- Have stories with more than N story points (where N is 3 or 5). Decompose them.
- Introduce the least bit of stress.
- Use burndown rates to generate commitments instead of forecasts.
- Use forecast results to justify asking developers to work overtime.
The last point, I think, is particularly interesting. The manager who was my first scrum master made the following commitment to us: you will never work overtime again. And voluntarily working overtime will be considered a bad thing, not a good thing, since it impairs the predictability of the team's productivity. "I know you don't believe me," he said. But he was right. We never worked overtime again.
Is there anyone else out there still doing waterfall and estimating in hours and everything is just going fine, or am I just that lucky?
I work on a small professional services team customizing a couple of our products for our customers. A few times a month we get a request for an estimate to add a new feature or workflow. We do a high level customer requirements doc, discuss it as a team, the seniors from each area (design, development, qa) each provide an estimate in a range of hours. This all gets wrapped up into a final price to the customer. If they approve it, we dive into detailed design and have them sign off on the result. Then we go into development for weeks to months, qa, and then release. Our processes really haven’t changed in over 20 years. We’re constantly ranked as one of the most productive teams, and get high scores on our post-project surveys from our customers.
Just estimate time. Whenever I worked with story points at various companies, even though developers, project managers, or scrum masters would often state that story points are a measure of complexity, the velocity for a given sprint was measured in story points as well. So, in the end, a story point is equal to an amount of time in a sprint.
This is also stated in the article:
> Story points do not represent Time, yet the Velocity metric they are usually combined with defacto converts them to time, sabotaging everyone from the start by doing the thing that you can't do with a precise number and a range...adding them together.
Better yet, just don't bother with Scrum and all its pointless, time-consuming ceremonies and just get shit done. This is my preferred mode of working, and I've been lucky to be able to work like this for the last couple of years.
I wish management types would get it through their heads that you just cannot reliably estimate most software development projects. (I said "most" -- there are of course exceptions.) You can't evaluate employee performance by looking at a burn-down chart. You can't show pretty graphs at the end of every sprint and expect that to predict the future of the project.
What you can do is set a reasonable deadline with your team, have them work toward it, and allow them to adjust your expectations on what exactly you will be getting by that deadline. Yes, establishing that deadline in the first place requires some sort of estimation, but story points, t-shirt sizes, etc. are useless for that. Everyone on the team sitting down, breaking things down into as-small-as-possible tasks, and coming up with time ranges for each task is the way to do that. Then you add up all the minimums and maximums and you have a time range for the whole project. But that range is still only a guess, and can't be taken as gospel. And it may be wild, like "somewhere between 6 weeks and 6 months", and you have to accept that.
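The range arithmetic described above is simple enough to sketch (the task names and numbers here are invented):

```python
def project_range(task_ranges):
    """task_ranges: {task: (min_days, max_days)}. Returns the overall (min, max)
    by summing per-task minimums and maximums."""
    low = sum(lo for lo, _ in task_ranges.values())
    high = sum(hi for _, hi in task_ranges.values())
    return low, high

tasks = {"auth": (3, 10), "billing": (5, 20), "reporting": (2, 8)}
print(project_range(tasks))  # -> (10, 38)
```

And as said above, (10, 38) days is still only a guess, not gospel.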
That's it. That's the best you can do. As the project carries on, the only thing you can reasonably report on is the list of features or functionality that's been implemented so far, and the new range of the estimate based on what's remaining to do. You can also look at the completed work, and map out where in the per-task estimate range the team ended up hitting, but that still can't predict the future.
You especially can't evaluate performance based on this stuff. That requires being an involved (but not micro-manager-y) manager who knows the team and can identify when their people are shining bright, and when they are struggling (or just slacking off). It's called people management for a reason; you have to involve the humans in that process and can't evaluate them based on some made-up numbers and dodgy, hand-wavy math.
Story points are pointless on their own, but estimation meetings are invaluable. In those meetings, story points serve as shortcuts for expressing gut feelings. However, after the meetings, they become completely useless, and even harmful, because, as the article mentions, people start treating them like precise numbers and do arithmetic with them.
As someone looking at algorithms for managing buffer bloat, this resonates.
In IP routers, the goal is to keep the congested link busy; i.e., idle time from a momentary hiccup is wasted time. You need a small buffer to do this, but piling on more data adds latency without actually doing any good.
Algorithms like CoDel recognized that a lot of previous attempts at this were noisy as heck. Minimum latency through the queue is the signal that makes sense; everything else is misleading or gives inaccurate predictions. Why should it be any different for managing tasks for human workers?
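As a toy sketch of the CoDel intuition applied to a task backlog (the units and thresholds here are arbitrary; real CoDel works on packet sojourn times with a 5 ms target over a 100 ms interval):

```python
def is_bloated(recent_sojourn_times, target):
    """CoDel-style congestion signal: the queue counts as bloated only if
    even the *minimum* recent time-in-queue exceeds the target, so
    momentary spikes and noise are ignored."""
    return min(recent_sojourn_times) > target

# Days each recently finished task spent queued.
print(is_bloated([4.0, 6.5, 3.2], target=2.0))  # -> True: even the fastest item was stuck
print(is_bloated([4.0, 0.5, 9.0], target=2.0))  # -> False: one task flowed right through
```

Tracking the minimum rather than the mean is exactly what made CoDel robust to bursty traffic, and the analogy to bursty human work is the point of the comment above.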
I've personally had great success with story points.
Success with story points comes when everybody realizes they are useless for anything outside of a dev cycle, and when you realize that the effort put into making them somewhat accurate is the valuable part.
The article lost me at a certain point, somewhere around "solving the conundrum".
It lost me because we have two estimations: an overall size guess for an epic, and an actual implementation estimation of that epic. The overall size guess is just 2-3 seniors looking at an issue and wondering if it takes days, weeks, months, or years to implement.
The actual implementation discussion, however, is what the article is talking about. We get most or all of the team into a meeting, talk through what needs to be done, and structure all of that into individual, concrete tasks everyone can imagine implementing. And then we estimate those tasks.
And this estimation in turn is communication to management. We've realized that about 21 points is what one of us can do in a usual monthly iteration, outside of massive outages and such (we're an operational team). So if an epic turns out to require some three 21's and three 13's... that can easily take 6-12 months unless we put exceptional focus on it. With high focus, as a team of 4-5, it will still take 3-6 months to do.
On the other hand, something that falls into a bunch of 5's and 9's and such tends to get muddled and struggled through much more reliably, regardless of whatever crap happens in the team. It needs smaller chunks of overall attention to get done.
And note that this communication is not deadlines. This is more of a bottom-up estimation of how much more or less uninterrupted engineering time it takes to do something. A 21 in our place by now means that other teams have to explicitly make room for the assigned person to have enough headspace to do that. Throw two interruptions at them and that task won't happen.
I've never understood the point of the points (pun intended). You can try to convince yourself they're measuring "complexity" or whatever, but what they're measuring is time, so why not cut out the middleman?
I'm not saying to estimate tasks accurately to the hour. Just rough ballparks: "I think I can do these 3 by Friday." "Give me a week; it's going to take a pile of tests to make sure we get it right." "Hmm, I probably need the whole sprint for that; we don't have a clear view of the impact. It sounds like a small change, but it upends our current architecture."
These are the real discussions, the numbers are a silly abstraction on top of this and are unnecessary.
It should also be 100% expected that the estimates will be wrong. New bugs will show up, regressions will be introduced, requirements will change last minute. It should also be expected that some tasks that were supposed to get done won't be and other tasks that weren't in the planning could be snuck in. You are "Agile" are you not?
If you're not than just go back to waterfall and stop dragging the Agile Manifesto through the mud, thanks.
The entire point of the exercise is the discussion, it gives you some idea of what is likely to go well and be finished and what is a risk factor. If you're measuring "velocity" god help you.
[+] [-] KothuRoti|1 year ago|reply
Overall I tend to avoid using story points, but on the few teams I worked on who really wanted to use it, I always framed it around building shared understanding rather than a metric that is actually useful for estimating work.
[+] [-] vladgiverts|1 year ago|reply
When there was disagreement, it helped them dig deeper to understand why and usually reveal somebody's incorrect assumptions.
It helped make sure teams didn't overcommit to the amount of stories they stuffed into a sprint and avoid either burning out, or worse, normalizing not finishing the sprint causing negative impacts to morale/motivation. (For some reason my teams often thought they could do more than the points implied!)
Most importantly, when large projects were proposed or were in progress, we were able to give realistic estimates to the various stakeholders about when to expect the various milestones to arrive, which bought us engineers a ton of credibility, trust, and respect with the rest of the company.
And yes, management wanted to see the story points and measure the team against them. I told them to F-off. Nicely. Kinda.
It helped that I was either a CTO or a senior enough exec in those cases with 3-8 agile teams. I essentially was the middle management and could put a stop to any destructive practices like evaluating teams against their velocity.
[+] [-] vannevar|1 year ago|reply
If your measure of the utility of story points is how well they help you estimate time, then you're right, they're useless to you. If you're on a scrum team, they're a useful back-of-the-envelope way to estimate which bits of validated functionality you're going to be able to get into the codebase this sprint. No one outside the scrum team should care about story points, and they certainly shouldn't be used to generate reports. Velocity is for the team's benefit, as a tool to help manage workload and schedule.
[+] [-] hnthrow289570|1 year ago|reply
The biggest sin of course is then trying to predict velocity, but the consequences of that usually just make people doing the reporting look silly for no reason. I think even the slow developers rarely get fired, and you also get nothing for clearing more story points than other developers.
No bonuses for higher velocity is the real reason no one takes it seriously.
[+] [-] bsder|1 year ago|reply
It's intuitively obvious, but I never realized it was that bad.
I'm going to have to run that down and see if it's actually backed by real data. Too many business books are complete flimflam.
[+] [-] JohnFen|1 year ago|reply
[+] [-] freitzkriesler2|1 year ago|reply
You really do need to control the amount of time your devs get distracted by other things. If they're solely focusing on that one project, you'll be fine.
It's why I personally appreciate XP more than plain old Scrum.
[+] [-] recroad|1 year ago|reply
[+] [-] brightball|1 year ago|reply
The goal of mobbing around task breakdowns is to drive the communication and building that shared understanding. Just with an output of writing most of it down in a way that makes it easy to track progress and approximate the size at the same time.
[+] [-] shalmanese|1 year ago|reply
Of course, retro is usually the first thing to go in a deluded attempt to increase velocity and story points hang on as this vestigial tail, contributing to more cargo cult software engineering.
edit: One scenario that might play out:
The team all agreed that tasks A, B & C were worth 3, 5 & 1 points respectively but Steve and Mei thought task D was worth 5 points but Carol thought it should be 13 because it involved integration of an external API where her experience was that other API integrations in the past had exposed hidden complexity.
Task D ultimately was not completed because during the process of integration, it was discovered that the API did not support a key feature that was needed and instead, an in-house alternative needed to be built.
It was decided that going forward, the team would instead assign a 1 point task to building a toy app for any new API integrations that would then feed into the process of deciding the story points for features requiring that integration.
[+] [-] immibis|1 year ago|reply
[+] [-] shrimp_emoji|1 year ago|reply
You haven't discovered the secret formula: make your estimate and then mindlessly triple it.
[+] [-] burnished|1 year ago|reply
[+] [-] rerdavies|1 year ago|reply
Forecasting and management of expectations is necessary because software often has external real-world dependencies: available funding for a first release, marketing materials, hardware that can't be released without software, trade shows, developer conferences, yearly retail release parties, OEM partners, cyclic stable releases for enterprise customers who won't push software into production without extensive pre-testing, graphics adapters that can't be released without drivers, rockets that won't launch, cars that won't drive, etc.
All of these things require some degree of forecasting and appropriately-evolving management of expectations. Here's where we stand. Here's what we can commit to. Here's what we might deliver, but are willing to defer to a future release. Here are the (low-value) items that were under consideration that we will definitely defer to a future release. Here are the features you will need to drop to get feature B in this release.
The purpose of story points is to help provide forecasting and management of expectations (with appropriately limited commitments) to stakeholders, on the understanding that forecasts are approximate.
Calibrated burndown of story points is pretty much the only basis on which forecasting and management of expectations can be done in an agile process. The key is to make sure that stakeholders understand the difference between forecasting and commitment, and to make sure your development team is appropriately protected by a healthy development process.
As for the author's claim that you get better forecasts by just counting stories instead of summing story points... color me skeptical. I do get that it prevents some obvious abuses of process, while enabling others that are just as bad. If somebody is using story points as a developer performance metric (which they shouldn't), there's nothing that prevents them from using completed stories as a developer performance metric (which they shouldn't). The corresponding abuse of process to combat that metric would be to hyper-decompose stories.
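As a rough illustration of the kind of range forecast (not commitment) that calibrated burndown supports — all numbers and names here are invented, not from the article:

```python
def forecast_sprints(remaining_points, recent_velocities):
    """Range forecast: sprints left at the best and worst recent pace.

    remaining_points: story points left in the backlog
    recent_velocities: points completed in each of the last few sprints
    """
    best = max(recent_velocities)   # fastest recent sprint
    worst = min(recent_velocities)  # slowest recent sprint
    # Report a spread, never a single date.
    return remaining_points / best, remaining_points / worst

optimistic, pessimistic = forecast_sprints(80, [25, 20, 16])
print(f"roughly {optimistic:.1f} to {pessimistic:.1f} sprints remaining")
```

The spread between the two numbers is itself useful information for stakeholders: a wide gap signals an unstable team rhythm before anyone argues about dates.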
[+] [-] andrewstuart2|1 year ago|reply
[+] [-] SoftTalker|1 year ago|reply
The vast majority of our stories were 2, 3, or 5 points.
[+] [-] ozim|1 year ago|reply
After the sprint you can kind of infer the time, but it should not be a guideline for the next estimations unless these are tasks like "fix typos".
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] danparsonson|1 year ago|reply
> ...burndown chart...
The x-axis of a burndown chart is time, right? So if you create a chart that measures points/time then you encourage the idea that a certain number of points can/should be completed in a day, ergo that points are a proxy for units of time. Otherwise what's the point in the chart?
[+] [-] xboxnolifes|1 year ago|reply
[+] [-] klysm|1 year ago|reply
[+] [-] Too|1 year ago|reply
There is nothing saying you can't refine work packages together in your team, while still using story points. That's actually how it's done almost everywhere. When items end up with a very high estimate there will be a push to refine it to something smaller. Something you should do but only as long as it still makes sense.
In fact the worst place I ever worked at was where we were given strict orders to break down every story until they all became 1 story point (still using story points...). Doesn't take a genius to figure out what happened next. All packages started having pointless micro-tasks with thousands of cross dependencies between them: "open the editor", "write a function", "write a unit test", "commit the code", "review the code". How am I supposed to write a unit test before the function signature has even been made? How am I supposed to iterate when finding bugs? More complex tasks still overran their estimates by a factor of 10, in fact even worse than before; some things just can't be split, yet they still needed a 1 point estimate.
Using the queue length and the impact on variability is still an interesting concept, i just don't think you should connect it with breaking down everything into single-sized items.
[+] [-] Karrot_Kream|1 year ago|reply
I think there's an inherent tradeoff between the overhead and misery of breaking down a task into granular subtasks and the variance of task completion time. In practice, using a queue-style form of tracking would mean that you trust your team to break down work and do time-bounded investigation into unknowns. Then you look at your task completion rate. Measuring this as a random variable (RV) gives you not just the average task completion time, as we reduce to using Little's Law, but also the variance of task completion time. If we find task completion time has too much variance despite the input queue of tasks not actually growing very much (i.e. the arrival rate is staying stable and low), then it's probably worth having the team break down tasks in a more granular fashion (or maybe it's a single person who keeps making giant tickets or something). On the other hand, if the team keeps complaining about straitjacket ticket discipline, it's probably worth letting folks be more loose with task creation. There's a human element here, but there always is, since it's humans who are doing the work, and that's fine.
I've always argued that output per person on a team should be modeled as RVs, but I really like this queuing approach and it's something I may bring up on my team. Again in practice I'd probably just track task completion times on a weekly basis. This would be much simpler than story points and instead of the bickering that comes with trying to break a ticket up, it would give engineers more autonomy over task creation.
I really like the idea.
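A minimal sketch of what tracking completion times as a random variable and applying Little's Law (L = λW) might look like — the completion times, arrival rate, and field names are all invented for illustration:

```python
import statistics

def queue_stats(completion_days, arrivals_per_week, workdays_per_week=5):
    """Summarize task flow with Little's Law: L = lambda * W.

    completion_days: observed completion times of recent tasks (days)
    arrivals_per_week: average tasks entering the queue per week
    """
    avg_w = statistics.mean(completion_days)      # mean time in system, W (days)
    var_w = statistics.variance(completion_days)  # high variance => inconsistent task sizing
    lam = arrivals_per_week / workdays_per_week   # arrival rate, lambda (tasks/day)
    return {
        "avg_days": avg_w,
        "variance": var_w,
        "expected_wip": lam * avg_w,  # Little's Law: average work in progress
    }

# Hypothetical completion times for six recently finished tickets:
stats = queue_stats([2, 3, 2, 8, 1, 4], arrivals_per_week=5)
```

Here the variance (dominated by the 8-day outlier) is what would trigger the "break tasks down more" conversation, while a growing `expected_wip` against a stable arrival rate would signal the queue itself backing up.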
[+] [-] hi_hi|1 year ago|reply
Then they introduce queues, which are made up of small tasks.
I admit I stopped reading at this point. Is the useful thing the queue, or the fact everything is now broken down into small tasks?
[+] [-] quantified|1 year ago|reply
Estimating effort-time vs completion time are quite different things, serving different stakeholders. A story that takes 1 point of effort by anyone's pointing system could still take a week due to any number of factors (crucial collaborator gets sick, laptop crunched by a car, ransomware attack, whatever). The only really estimable aspect is how long the developer will spend on the work, not when it will be done. Air speed, not ground speed.
That said, it's not clear how queue analysis helps when you haven't spent any time saying how long you might expect each task in the queue to be, or what the dependencies are between tasks within and across teams. Given engaged team members, I've gotten very good results on predicting the pace of progress for sprints, and it all never mattered because everything needed to be shipped. About 4-6 weeks before each X was to be completed we could say with confidence that X would be ready in 4-6 weeks. Not terribly useful.
[+] [-] bboygravity|1 year ago|reply
Man-hours = story points
Task = story
Subtask = sprint
Upcoming tasks = backlog
Task turnaround time = ??
Project goal(s) = epic(s)
I'm probably not even doing this right, but what are we doing here anyway?!
It gets real fun when the project involves software/firmware + mechanical engineering (think machines, robotics, etc), gotta love their faces when you teach them the special magic advanced project words for special software people.
[+] [-] parasense|1 year ago|reply
Another issue, as the blog points out: story points are a simple scalar number, and you cannot easily say how that sum is factored into risk, complexity, effort, etc... So why even have it? Clearly a story needs to be a kind of vector with these properties (risk, complexity, effort...) set to positive or negative values, and then do the typical vector math to plot the direction and magnitude of the story in the n-dimensional space... Then, one can actually calculate stupid ideas like velocity or acceleration to speak in the language of project/program managers.
Ultimately the managers need a way to quantize the units of work done by development staff to better plan, and that's understandable, yet really hard. I think putting the cart before the horse is usually a dumb idea, and things should go back to measuring performance after the work is done, instead of estimating performance before work begins. Nobody wants to commit to a performance contract for each and every task ad nauseam, and that's what happens during sprints.
[+] [-] dilyevsky|1 year ago|reply
[+] [-] ozim|1 year ago|reply
[+] [-] arcbyte|1 year ago|reply
He knows exactly what story points are; he goes through them exhaustively, while strangely deriding them the whole time.
Then he concludes by purporting to invent the very practice you ALWAYS were supposed to have been doing to make story points work. You have to find a set of repeatable work to compare new stories to for reference. That's the whole game. That's his tasks "idea". That has always been part of every implementation and lesson on story points I've been exposed to.
It's a completely nonsensical article.
Story points have always been about queues and implementing Little's Law. Always.
Yes, it sucks to be on teams that just argue about points and don't work to refer to standard architectures for building blocks. That doesn't mean story points are broken, it's pointing out something else in your organization is broken.
A lot of real criticism in this comment, but props to the author for writing at least. It's more than I do as a part time internet complainer.
[+] [-] brightball|1 year ago|reply
Whatever story points were always supposed to be, they aren't. Numerous people's real world experiences go sideways because of the way that they are designed. You're set up for failure and confusion.
I never claimed to invent anything. I'm highlighting Donald Reinertsen's work that more people should be following.
The purpose of the article is to remind people of why everything is broken so that they can identify it and fix it, including examples.
[+] [-] hobs|1 year ago|reply
For a vast majority of managers in "software companies" things like story points are about asserting control over what is created, getting commitments from various folks, and then increasing stress to have you "sprint" constantly "behind schedule" so they can inject additional requirements or pivot to the new thing they want to do.
[+] [-] JohnMakin|1 year ago|reply
I had a terrible experience with them once. I was a relatively new, enthusiastic engineer on a struggling team of guys who'd been at the company a long time and were pretty burnt out. Inevitably I started getting all the "hard" stories with a lot of points, til it got to some stupid point where I was outputting about 80-90% of our team's combined story points. Management caught wind of it, didn't like it, so what they decided to do was adjust my points "downward" to be more in line with the rest of my team's output. It really irritated me, because it'd result in absurd situations where my teammates would get a "3 point" ticket that was like, just updating some field in a config file to an already-known value and checking it in somewhere, and I'd get this whole-ass project wrapped in a single ticket and they'd give me the same amount of points for it. And of course this was tied in to performance reviews, adding to how annoying it was.
Another super irritating thing that would happen is I'd be asked to estimate complexity on some super vaguely defined ticket describing some problem that would take a lot of detective work to even figure out how to solve, so how am I supposed to give an accurate estimate of complexity? If I knew that much, I'd already have probably fixed whatever the issue was.
[+] [-] al_borland|1 year ago|reply
Since we were tracking the actual time things took as well, our estimates got better over time. The team would help to keep others honest. For example, writing documentation always took at least 3x longer than anyone expected, so we'd always make people add more time to those stories.
Once we had reasonably accurate time estimates, there was also a feature for available hours. If someone was on vacation, we’d subtract those hours while planning the sprint. It was then easy to see who was over committed, balance the work, and to make sure we were being realistic.
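The capacity arithmetic described above is simple enough to sketch; the names, hours per day, and sprint length below are invented placeholders:

```python
def sprint_capacity(hours_per_day, sprint_days, time_off_hours):
    """Available hours per person this sprint, minus planned time off.

    time_off_hours: mapping of person -> planned absence in hours
    """
    return {name: hours_per_day * sprint_days - off
            for name, off in time_off_hours.items()}

# Two-week sprint, ~6 focused hours/day; Ben is out for two days (16 h):
capacity = sprint_capacity(6, 10, {"ana": 0, "ben": 16})
print(capacity)
```

Comparing each person's committed estimate hours against this number is what made over-commitment visible before the sprint started, rather than in week two.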
We worked like this for about 2 years. It was probably the best 2 years I had in the job. We got a lot done, we were being strategic about our work rather than reactive, and while we would push to get things wrapped for the end of the sprint and the demo, it never involved all nighters or heroics. We pushed because people on the team wanted to do more and go faster, not because of pressure from the outside. I noticed people got more down when they tried to do less, so I’d often push back on people trying to overload the sprint. If they finished everything, we could always add more. Outside the team, our VP told us we were 3+ months ahead of everyone and if I wanted to go hang out in a cafe in Europe for a few months, go.
This is all a distant memory now. So many lessons learned, but they all fall on deaf ears in the current organization.
[+] [-] rerdavies|1 year ago|reply
In my experience, story points allow forecasting that's as good as any forecasting method I've ever used. And I've used pretty much every schedule forecasting method over my long career.
The author touches on many of the reasons why story points don't work. And pretty much every reason he gives is something that you are not supposed to do.
The key to getting them to work is trust, and a commitment to never use story points as metrics to measure the performance of developers. Any attempt to do so will result in gaming of the system. The tradeoff that stories provide is lack of precision in exchange for not having to spend 50% of your development cycle up front doing detailed analysis required to provide detailed estimates (which never worked anyway).
Things you must also never do:
- compare calibrated burndown factors between teams.
- Ask why the calibration factor isn't N. The calibration factor is what it is.
- Have stories with more than N story points (where N is 3 or 5). Decompose them.
- Introduce the least bit of stress.
- Use burndown rates to generate commitments, instead of forecasts.
- Use forecast results to justify asking developers to work overtime.
The last point, I think, is particularly interesting. The manager who was my first scrum master made the following commitment to us: you will never work overtime again. And voluntarily working overtime will be consider a bad thing, not a good thing, since it impairs predictability of the team's productivity. "I know you don't believe me", he said. But he was right. We never worked overtime again.
[+] [-] heywire|1 year ago|reply
I work on a small professional services team customizing a couple of our products for our customers. A few times a month we get a request for an estimate to add a new feature or workflow. We do a high level customer requirements doc, discuss it as a team, the seniors from each area (design, development, qa) each provide an estimate in a range of hours. This all gets wrapped up into a final price to the customer. If they approve it, we dive into detailed design and have them sign off on the result. Then we go into development for weeks to months, qa, and then release. Our processes really haven’t changed in over 20 years. We’re constantly ranked as one of the most productive teams, and get high scores on our post-project surveys from our customers.
[+] [-] sumedh|1 year ago|reply
[+] [-] wsc981|1 year ago|reply
This is also stated in the article:
> Story points do not represent Time, yet the Velocity metric they are usually combined with defacto converts them to time, sabotaging everyone from the start by doing the thing that you can't do with a precise number and a range...adding them together.
Better yet, just don't bother with Scrum and all its pointless and time-consuming ceremonies and just get shit done. This is my preferred mode of working and I've been lucky to be able to work like this for the last couple of years.
[+] [-] kelnos|1 year ago|reply
I wish management types would get it through their heads that you just cannot reliably estimate most software development projects. (I said "most" -- there are of course exceptions.) You can't evaluate employee performance by looking at a burn-down chart. You can't show pretty graphs at the end of every sprint and expect that to predict the future of the project.
What you can do is set a reasonable deadline with your team, have them work toward it, and allow them to adjust your expectations on what exactly you will be getting by that deadline. Yes, establishing that deadline in the first place requires some sort of estimation, but story points, t-shirt sizes, etc. are useless for that. Everyone on the team sitting down, breaking things down into as-small-as-possible tasks, and coming up with time ranges for each task is the way to do that. Then you add up all the minimums and maximums and you have a time range for the whole project. But that range is still only a guess, and can't be taken as gospel. And it may be wild, like "somewhere between 6 weeks and 6 months", and you have to accept that.
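The range arithmetic is trivial, but making it concrete shows why the output is a spread rather than a date (the task names and week counts below are invented):

```python
# Per-task (min_weeks, max_weeks) ranges agreed by the whole team:
tasks = {
    "auth flow": (1, 3),
    "data migration": (2, 8),
    "reporting UI": (1, 4),
}

# The project range is just the sum of the per-task extremes.
low = sum(lo for lo, _ in tasks.values())
high = sum(hi for _, hi in tasks.values())
estimate = f"somewhere between {low} and {high} weeks"
print(estimate)
```

Note the asymmetry: the maximums grow much faster than the minimums, which is exactly the "6 weeks to 6 months" spread that has to be accepted rather than averaged away.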
That's it. That's the best you can do. As the project carries on, the only thing you can reasonably report on is the list of features or functionality that's been implemented so far, and the new range of the estimate based on what's remaining to do. You can also look at the completed work, and map out where in the per-task estimate range the team ended up hitting, but that still can't predict the future.
You especially can't evaluate performance based on this stuff. That requires being an involved (but not micro-manager-y) manager who knows the team and can identify when their people are shining bright, and when they are struggling (or just slacking off). It's called people management for a reason; you have to involve the humans in that process and can't evaluate them based on some made-up numbers and dodgy, hand-wavy math.
[+] [-] egeozcan|1 year ago|reply
[+] [-] ooterness|1 year ago|reply
In IP routers, the goal is to keep the congested link busy. i.e., Idle time from a momentary hiccup is wasted time. You need a small buffer to do this, but piling on more data adds latency without actually doing any good.
Algorithms like CoDel realized that a lot of previous attempts to manage this were noisy as heck. Minimum latency through the queue is the signal that makes sense. Everything else is misleading or gives inaccurate predictions. Why should it be any different for managing tasks for human workers?
[1] https://en.wikipedia.org/wiki/CoDel
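As a loose analogy only (this is not the actual CoDel algorithm, which also tracks a target and an interval), a sliding-window minimum over task queue delays might look like this, with made-up numbers:

```python
from collections import deque

def min_delay_signal(delays, window=3):
    """Sliding-window minimum of observed queue delays.

    CoDel-style reasoning: react only when even the *minimum* recent
    delay is high (persistent overload), not when a single outlier
    spikes (momentary burst).
    """
    recent = deque(maxlen=window)  # deque drops the oldest entry automatically
    out = []
    for d in delays:
        recent.append(d)
        out.append(min(recent))
    return out

# Days each ticket waited before anyone could start on it:
signal = min_delay_signal([1, 7, 9, 2, 8, 8, 9, 10])
print(signal)
```

The early spikes (7, 9) never move the signal because a fast ticket still gets through; only when every recent ticket waits long does the minimum climb, which is the point at which the backlog, not the ticket, is the problem.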
[+] [-] jf22|1 year ago|reply
Success with story points comes when everybody realizes they are useless for anything outside of a dev cycle, and when you realize that the effort put into making them somewhat accurate is the valuable part.
[+] [-] tetha|1 year ago|reply
It lost me, because we have two estimations - an overall size guess of an epic and an actual implementation estimation of an epic. Like the overall size guess is just 2-3 seniors looking at an issue and wondering if this takes days, weeks, months or years to implement.
The actual implementation discussion is however what the article is talking about. We get most or all of the team into a meeting and we talk through what needs to be done, and structure all of that into individual concrete tasks everyone can have an idea of implementing them. And then we estimate those tasks.
And this estimation in turn is communication to management. Like, we've realized that about 21 is what one of us can do in a usual monthly iteration outside of massive outages and such (we're an operational team). So if an epic turns out to require some 3 21's and 3 13's... that can easily take 6-12 months unless we put exceptional focus on it. With high focus... as a team of 4-5, that will still take 3-6 months to do.
On the other hand, something that falls into a bunch of 5's and 9's and such tends to be muddled and struggled through regardless of whatever crap happens in the team much more reliably. It needs smaller chunks of overall attention to get done.
And note that this communication is not deadlines. This is more of a bottom-up estimation of how much more or less uninterrupted engineering time it takes to do something. A 21 in our place by now means that other teams have to explicitly make room for the assigned person to have enough headspace to do that. Throw two interruptions at them and that task won't happen.
It's more bin-packing than adding, tbh.
[+] [-] sirwhinesalot|1 year ago|reply
I've never understood the point of the points (pun intended). You can try to convince yourself they're measuring "complexity" or whatever, but what they're measuring is time, so why not cut out the middleman?
I'm not saying to estimate tasks accurately to the hour. Just rough ballparks: "I think I can do these 3 on Friday". "Give me a week, it's gonna take a pile of tests to make sure we get it right". "Hmm, I probably need the whole sprint for that; we don't have a clear view of the impact. It sounds like a small change but it upends our current architecture".
These are the real discussions, the numbers are a silly abstraction on top of this and are unnecessary.
It should also be 100% expected that the estimates will be wrong. New bugs will show up, regressions will be introduced, requirements will change last minute. It should also be expected that some tasks that were supposed to get done won't be and other tasks that weren't in the planning could be snuck in. You are "Agile" are you not?
If you're not, then just go back to waterfall and stop dragging the Agile Manifesto through the mud, thanks.
The entire point of the exercise is the discussion, it gives you some idea of what is likely to go well and be finished and what is a risk factor. If you're measuring "velocity" god help you.
[+] [-] fallinditch|1 year ago|reply
I think some of the problems and issues we're discussing here derive from confused project management methodologies that call themselves 'agile' yet require detailed estimation, measurement and reporting of task timing.
Story point allocation can be useful to give a quick and easy 'good enough' estimation of time/effort required for a significant chunk of work (epic).
I find that this approximate approach is almost always more accurate than trying to estimate every little task.
If the project manager and engineers try to break down a project into small granular tasks with time estimates then it's almost inevitable that the effort will be underestimated, because it's virtually impossible to anticipate every sub-task, blocker, unforeseen delay, etc (and then there's the extra time it takes to manage all these micro-tasks in your PM system!).
In such situations the old project manager trick of doubling all estimates tends to provide a more accurate timeframe.
This is why story points can be more accurate: because you are estimating the effort it takes to do something relative to your previous experience of similar workloads.
So, if you avoid estimating granular tasks and keep your estimates as the approximate amount of effort relative to something you've done before, then you will end up with a more realistic timeframe. Story points can help with this mindset. Also your team will not have to waste time faffing around in Jira too much, or whatever system you use.
[+] [-] ath3nd|1 year ago|reply
To me story points, planning poker, scrum, and a bunch of other "agile" artifacts are funny concepts.
Making 4-5 highly paid professionals sit around in a circle with a bunch of cards trying to estimate how long/complex/??? the task is, writing it down, and doing that every two weeks, to dubious results, is exactly what one can expect out of today's businesses.
My guess is management doesn't trust that engineers aren't just fiddling around, so they hire a bunch of management-like people to watch over the drones, and show them who's in control. And how else you can show you are in control if you don't introduce magical rituals and tracking useless metrics, and make grown people participate? That way, all these meeting rooms can be occupied and the owners can feel like something important's happening and that they are getting their money's worth.
[+] [-] jmward01|1 year ago|reply
The fundamental issue isn't addressed but it sure is hinted at. Scheduling is an NP problem. As queue size, or backlog, or (insert thing here that tracks work to be done) grows, it takes n^p calculations to schedule it optimally. This is hit on when it mentions small teams hitting their estimates. They can do this because their task list is small enough to go through all the permutations and actually come up with an accurate estimate. The only way to keep an n^p problem under control is to divide and conquer it. The leaves of that process must not go beyond a fixed size and the task divisions can't be recombined prematurely. Everything else is just yet another management idea that will fall apart when the task has too many pieces. Once agile or any other management methodology acknowledges the fundamental mathematics at the core of things I may actually take them more seriously.
[+] [-] Terr_|1 year ago|reply
I've always thought of story points as ideally* working like a kind of international currency exchange, where it is normal and expected that Team Teal Dollars will not have any permanent or consistent relationship to Team Maroon Dollars nor to Actual Time. (The saying "time is money" notwithstanding.)
The points will inflate or deflate over the course of months, or even abruptly change with team composition or shifting to new technologies or even vague morale issues. All that is normal, it captures important facets of work, and trying to stop it from happening only creates other problems.
What matters is that somebody is looking at near-past behavior and using it to make a near-future estimate.
* If someone tries to mandate a fixed arbitrary correspondence between points and real world time, that would be a non-ideal scenario for various reasons.