Scalable Agile Estimation and Normalization of Story Points: Review of published scalable agile estimation methods (Part 3 of 5)
In Part 1 of this multi-part blog series, I introduced the topic of the blog series and provided an overview. Scalable agile estimation methods are required to provide reliable estimates of workload (work effort) and also reliable velocity metrics (both estimated velocity and observed velocity) at the team, program and portfolio levels of large-scale agile projects. Without reliable estimates of workload and reliable velocity metrics at all levels, effective and meaningful determination of costs, return on investment and project prioritization cannot be made. For scalable agile estimation methods to work properly, story points across the space dimension (teams, epic hierarchies, feature group hierarchies, goals, project hierarchies, programs, portfolios, etc.) as well as across the time dimension (sprints, release cycles) need to have the same meaning and the same unit of measure. In other words, story points need to be normalized so they represent the same amount of work across the space and time dimensions.
In Part 2 I reviewed the key requirements that must be satisfied for traditional velocity-based planning to work properly. I presented three key challenges associated with traditional velocity-based planning, and how they are exacerbated as agile projects begin to scale up. The three key challenges are:
- A single story point is unlikely to represent the same amount of work across teams and sprints.
- Bottom-up story point data is frequently not available for estimating work during program and portfolio planning.
- Yesterday’s weather model requirements may not hold for one or many of the multiple teams involved in large-scale agile projects.
In this Part 3 I present two published scalable agile estimation methods along with my critique. The first method is covered by Mike Cohn in his Agile Estimating and Planning book. The second method is a story point normalization method promoted by SAFe that I refer to as the “1 Point = 1 Developer Day Normalization Method” (1NM for short).
Method 1: Agile estimation techniques explained in the Agile Estimating and Planning book
Mike Cohn partially addresses Challenge 1 (a single story point is unlikely to represent the same amount of work across space and time) by stating that “It is important that each of the teams estimates consistently. What your team calls three story points or ideal days had better be consistent with what my team calls the same. To achieve this, start all teams together in a joint planning poker session for an hour or so. Have them estimate ten to twenty stories. Then make sure each team has a copy of these stories and their estimates and that they use them as baselines for estimating the stories they are given to estimate” (Agile Estimating and Planning, Chapter 6, Page 58).
There are practical issues with this technique when you apply it to large projects. It is very difficult to pull together a large number of geographically distributed teams for a joint planning poker session, and harder still when multiple teams support multiple portfolios spanning multiple programs: the participants may number in the hundreds and work in different time zones. More importantly, different teams may not understand each other’s application domains well enough to participate meaningfully in a joint planning poker session. Any estimation scheme that is based on centralized control and participation will have difficulty scaling up.
As portfolio planning is outside the scope of the book, it does not address Challenge 2: Bottom-up story point data is not available for estimating work during program and portfolio planning.
The book recommends that we should let an agile team work for a few sprints, and then start using its average velocity to forecast, assuming that the velocity has stabilized. It simply assumes that yesterday’s weather model is applicable. The book does not present any solution for estimating velocity when yesterday’s weather model does not apply.
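The forecasting approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the book: the function names, the three-sprint averaging window and the sample velocities are all hypothetical.

```python
import math

def average_velocity(observed_velocities, window=3):
    """Mean of the most recent `window` sprint velocities (window size is an assumption)."""
    recent = observed_velocities[-window:]
    return sum(recent) / len(recent)

def sprints_needed(remaining_points, observed_velocities, window=3):
    """Forecast the number of sprints needed, assuming velocity has stabilized."""
    return math.ceil(remaining_points / average_velocity(observed_velocities, window))

# Hypothetical observed velocities for one team, oldest first.
velocities = [14, 18, 22, 20]
print(average_velocity(velocities))     # 20.0
print(sprints_needed(130, velocities))  # 7
```

The forecast is only as good as the assumption behind it: if the team, its technology or its domain changes between sprints, the recent average says little about the next sprint.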
Larman and Vodde, in their Practices for Scaling Lean & Agile Development book (Chapter 5, pp. 182-183), describe the following method for estimating multi-team projects.
All members (or team representatives) of all teams join in a common estimation workshop and identify items for which they have common understanding. Then they estimate these, using planning poker with story points. This canonical set of items is a baseline of shared understanding and is used as the baseline for future estimation workshops. This cross-team estimation workshop occurs not only before the first iteration, but repeatedly in subsequent ones to re-synchronize.
The practical issues associated with the Larman and Vodde method when applied to large projects will be similar to those associated with Mike Cohn’s method.
Method 2: SAFe’s approach to story point normalization
The Scaled Agile Framework® (SAFe™), developed by Dean Leffingwell, is an interactive knowledge base for implementing agile practices at enterprise scale. SAFe’s approach for scalable agile estimation is described in the normalized story points blog summarized below.
In standard Scrum, each team’s story point estimation – and the resultant velocity – is a local and independent concern. In SAFe, however, story point velocity must be normalized to a point, so that estimates for features or epics that require the support of many teams are based on rational economics. SAFe requires all teams to use 2-week sprints and assumes about 20% of the time goes to planning, demoing, company functions, training and other overhead. This leaves 8 workdays for each member in a 2-week sprint (further adjusted for any personal vacation, company holidays, part-time work, etc.). The algorithm used by SAFe for normalizing teams to a common starting story point and velocity baseline is as follows:
- For every developer and tester on the team, give the team eight points (adjust for part-timers).
- Subtract one point for every team member vacation day and holiday.
- Find a small story that would take about a half-day to code and a half-day to test and validate (as team effort). Call it a 1.
- Estimate each story relative to that one.
Example: Assuming a 6 person team composed of 3 developers, 2 testers, and one PO, with no vacations, then the estimated initial velocity = 5 members * 1 point/day * 8 work days = 40 points/sprint (5 people working for 8 ideal days in a two-week sprint). In this way, one story point is equivalent to one ideal developer day (IDD), and all teams estimate size of work in a common fashion, so management can thereby fairly quickly estimate the cost for a story point for teams in a specific region. Then they have a meaningful way to figure out the cost estimate for an upcoming feature or epic.
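The starting-velocity arithmetic above can be written as a small Python sketch. The function and parameter names are hypothetical, and expressing part-timers as fractional full-time equivalents is my assumption about how the "adjust for part-timers" step would be applied.

```python
def initial_velocity(developers_and_testers, days_off=0, ideal_days_per_sprint=8):
    """Starting velocity in normalized points for one two-week sprint.

    developers_and_testers: headcount in full-time equivalents
        (a half-time tester counts as 0.5; this convention is an assumption).
    days_off: total vacation days and holidays across the team in the sprint.
    """
    return developers_and_testers * ideal_days_per_sprint - days_off

# 3 developers + 2 testers, no vacations; the PO is not counted.
print(initial_velocity(5))              # 40
# Same team with 3 person-days of vacation in the sprint.
print(initial_velocity(5, days_off=3))  # 37
```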
There is no need to recalibrate team estimating or velocities after that point. While teams will tend to increase their velocity over time – and that is a good thing – in fact the number tends to be fairly stable over time, and a team’s velocity is far more affected by changing team size and technical context than by productivity increases. And if necessary, programs and planners can adjust the cost per story point a bit. In our experience, this is a minor concern, compared to the wildly differing velocities teams of comparable size may have in the un-normalized case. That simply doesn’t work at enterprise scale, because you can’t base your decision on economics that way.
Although traditional story points are unitless numbers, normalized story points in SAFe represent work in ideal time units. In SAFe, 1 normalized story point is equivalent to 1 IDD. This is the reason I call SAFe’s method for story point normalization the “1 IDD Normalization Method”, or 1NM for short. It is interesting to note that SAFe gets away from the dogma that proclaims that story points must be unitless numbers and must have no connection with ideal time units. SAFe ties 1 standard story point to 1 IDD for all teams. It uses a hybrid scheme of relating relative sizes (story points) to ideal time units (IDD). Dean Leffingwell explains the merits of this hybrid scheme in his book Agile Software Requirements (Chapter 8, Page 152) as follows:
- The team can still use planning poker, as well as modified Fibonacci series, and gain most of those tangible benefits.
- The estimate is still a consensus and doesn’t say who is doing it. It’s not so political.
- They can start immediately. They have their first velocity estimate (8 * team members) on day one.
- The relative methods still avoid any tendency to overinvest in estimating.
- The translation to cost is obvious. Average the daily cost across all practitioners, including burden. The cost for one point is equal to that number, multiplied by 1.25 (because we also have to pay for the days that are not included in the IDD).
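The cost translation in the last bullet can be sketched as follows. The 1.25 multiplier is consistent with paying for 10 workdays in a two-week sprint while counting only 8 of them as ideal days (10 / 8 = 1.25). The function name and the daily cost figures are hypothetical, chosen only to show the arithmetic.

```python
def cost_per_point(burdened_daily_costs, sprint_workdays=10, ideal_days=8):
    """Cost of one normalized story point (1 IDD).

    burdened_daily_costs: daily cost of each practitioner, including burden.
    The ratio sprint_workdays / ideal_days is the 1.25 multiplier.
    """
    average_daily_cost = sum(burdened_daily_costs) / len(burdened_daily_costs)
    return average_daily_cost * sprint_workdays / ideal_days

# Hypothetical burdened daily costs for a five-person team.
daily_costs = [800, 800, 600, 600, 700]
point_cost = cost_per_point(daily_costs)
print(point_cost)       # 875.0
print(40 * point_cost)  # 35000.0, the cost of a 40-point feature
```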
Mike Carey, SPC and agile coach, has posted his views on SAFe normalized story points in his blog normalized story points.
SAFe’s response to Challenge 1: A single story point is unlikely to represent the same amount of work across teams and sprints. As each team sets a standard for the 1-IDD story, and the estimation scale is limited to a finite range (a Fibonacci scale of 1, 2, 3, 5 and 8 IDDs), the team members can size stories quickly relative to the 1-IDD baseline story. Also, since all teams are using the same unit of measure (IDD), the roll-ups are in the same unit. So, if you have a feature broken down into 5 stories, each one being allocated to a different team, the roll-up of those discrete team estimates is a meaningful projection of the total size of the feature. The same kind of roll-up will be meaningful across epics, feature groups, project hierarchies, programs, portfolios, etc.
As all story sizes across all teams, features, epics, programs are now expressed in the same unit (IDD), roll-up and all story point math will make sense. As estimates are now apple-to-apple comparisons (and in the same unit of measure), so too are the observed velocities. The roll-up is meaningful and more reliable than it would have been if each team were using its own unit of measure for story points and velocity.
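To make the roll-up concrete, here is a minimal Python sketch. It assumes every estimate is already expressed in normalized points (IDD); the feature names, team names and point values are hypothetical.

```python
from collections import defaultdict

# (feature, team, normalized points); hypothetical data.
stories = [
    ("Checkout", "Team A", 3),
    ("Checkout", "Team B", 5),
    ("Checkout", "Team C", 2),
    ("Checkout", "Team D", 8),
    ("Checkout", "Team E", 1),
    ("Search", "Team A", 5),
    ("Search", "Team C", 3),
]

def roll_up(stories, by):
    """Sum normalized points per feature or per team."""
    if by not in ("feature", "team"):
        raise ValueError("by must be 'feature' or 'team'")
    totals = defaultdict(int)
    for feature, team, points in stories:
        totals[feature if by == "feature" else team] += points
    return dict(totals)

print(roll_up(stories, "feature"))  # {'Checkout': 19, 'Search': 8}
print(roll_up(stories, "team"))
print(sum(points for _, _, points in stories))  # 27, the program-level total
```

The addition is only meaningful because every team uses the same unit. With un-normalized points, the same code would run but the totals would mix incompatible scales.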
What would happen if an enterprise has not adopted SAFe, or some projects in an enterprise are following SAFe and some are not, or some projects are following only a subset of the SAFe framework and practices? These scenarios (“partial SAFe”) are a lot more likely than an entire large enterprise following SAFe 100% for all its projects, programs and portfolios.
What would happen if some teams find it difficult to identify a 1-IDD (1 normalized story point) baseline story? They may find stories of 1.3 IDD, 1.7 IDD, 2.1 IDD or 0.9 IDD, but may not find an exact 1-IDD story. This could certainly happen. In all these scenarios, you will need a general solution for normalizing story points (more general than 1NM).
In SAFe, each team’s capacity is calculated based on its developers and testers (adjusted for part-timers, vacation days and holidays). Although the need to adjust each team’s capacity for part-timers, vacations and holidays is conceptually simple, SAFe prescribes no specific capacity calculation method. In my view, the lack of a standard, specific way to calculate capacity in SAFe is likely to lead to variations and inconsistencies in how different teams calculate and adjust capacity for their sprints.
SAFe’s response to Challenge 2: Bottom-up story point data is not available for estimating work during program and portfolio planning. In SAFe, the feature estimation effort at the program level may go through three stages: preliminary, refined, and final.
- Preliminary (Gross, Relative Estimating) – In this stage, product management may simply need a rough, gross estimate before it goes to the teams for discussion. The feature doesn’t require an estimate in story points. Instead, a simple relative estimating model can be used in which the “bigness” of one feature is compared to another. This relative “bigness” scale could be 1, 2, 3, 5, 8, 13, and 20. Depending on the complexity of design, implementation, and testing, the feature may be estimated only by product management, or the estimate may include representatives from system architecture, development, and the teams.
- Refined (Estimates in Story Points) – In this stage, estimates are done in story points. Specifying this level of refinement can most easily be done when there is historical data and new features can be compared to similar features whose story points are known based on historical information. The teams are often brought in at this point for discussion, though this form of grooming should be done on a cadence to minimize disruptions to the team.
- Final (Bottom-up, Team Based Estimating) – When the fidelity of an estimate needs to be improved, such as at a PSI/Release boundary or Release Planning Meeting, the teams do the estimate using a bottom-up approach by breaking the feature down into stories and estimating the stories in story points.
Note that there is no relationship between Preliminary (stage 1) “bigness” estimates and Refined (stage 2) estimates in story points. It remains to be seen how “bigness” numbers can be used to make effective decisions.
SAFe’s response to Challenge 3: Yesterday’s weather model requirements may not apply for a team. SAFe assumes that each team will satisfy yesterday’s weather model requirements within a few sprints, i.e., it is assumed that Challenge 3 will no longer exist after a few sprints. SAFe does not explicitly address the challenge that yesterday’s weather model requirements become harder to satisfy as you scale up.
In Part 4 of the blog series, I will present a scalable agile estimation method, called Calibrated Normalization Method (CNM). I have developed, taught and applied CNM by working with clients in agile training and coaching engagements since 2010. Part 4 emphasizes CNM bottom-up estimation (from teams to programs up to portfolios). I will also explain how CNM can be used by large enterprises that have a large number of projects that are mostly independent of each other.
In Part 5 I will explain how CNM performs top-down estimation (from portfolios to programs down to teams). CNM estimates the scope of work at the portfolio and program levels without the lower-level story point details, which are not yet available at that stage, in order to answer two important and legitimate business questions asked by management. In Part 5 I will also compare and relate 1NM with CNM, and explain how CNM fully addresses all three challenges explained in Part 2.
Acknowledgements: I have greatly benefited from discussions and review comments on this blog series from my colleagues at VersionOne, especially Andy Powell, Lee Cunningham and Dave Gunther.
Your feedback: I would love to hear from the readers of this blog either here or by e-mail (Satish.Thatte@VersionOne.com) or hit me on twitter (@smthatte).
Part 1: Introduction and Overview of the Blog Series – published on 14 October 2013.
Part 2: Estimation Challenges Galore! – published on 4 November 2013.
Part 4: Calibrated Normalization Method for Bottom-Up Estimation – published on 18 November 2013.
Part 5: Calibrated Normalization Method for Top-Down Estimation – published on 2 December 2013.