Forecasting, confidence and estimation in lean software delivery

I’m back with another article on lean software delivery. This time, I’ll discuss forecasting, confidence and estimation at a high level. I’ll write a follow-up article that goes deeper into how to estimate and which tools to use. Make sure you check out my DORA metrics article and MTTR discussion to get an idea of what I’ve been covering on software delivery.

What are percentiles in software delivery?

A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found.

In the context of cycle and lead times, percentiles help us understand the spread of our data more accurately and can help us make more informed predictions about our delivery times.

Let’s take an example.

Suppose you’ve measured the cycle time of 100 completed tasks, and you’ve found the following:

  • The 50th percentile (the median) cycle time is 5 days. This means that 50% of your tasks have a cycle time of 5 days or less.
  • The 85th percentile cycle time is 8 days. This means that 85% of your tasks have a cycle time of 8 days or less.
  • The 95th percentile cycle time is 13 days. This means that 95% of your tasks have a cycle time of 13 days or less.

By knowing these percentiles, you can make more informed predictions about future tasks. For example, you can say with reasonable certainty (85% confidence) that a new task will be completed within 8 days. If you want to be even more certain (95% confidence), you would predict a cycle time of 13 days.
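To make this concrete, here is a minimal Python sketch of how you could compute percentiles like these from your own data. The cycle times are made-up sample values (20 tasks rather than 100, chosen so the results match the 5, 8 and 13 days above), and `percentile` is a hypothetical helper that uses the nearest-rank method, which is only one of several common ways to calculate a percentile.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest observed value such that
    at least pct% of the observations are less than or equal to it."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Made-up cycle times (in days) for 20 completed tasks.
cycle_times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 13, 21]

for pct in (50, 85, 95):
    print(f"P{pct}: {percentile(cycle_times, pct)} days")
```

Other percentile methods interpolate between neighbouring values, so a spreadsheet or analytics tool may give slightly different numbers for the same data.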

The same logic can be applied to lead times as well.

Percentiles can give you a more nuanced understanding of your cycle and lead times, especially if there is variability in your data. If there is a lot of variability, averages can be misleading. For example, if most of your tasks take 2 days but a few take up to 30 days, the average might be 7 days, while the 85th and 95th percentiles might be 15 and 20 days. That gap tells you there is a long tail of tasks that take much longer than the average.
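A small Python sketch shows how far the average and the higher percentiles can drift apart. The data set is invented for illustration: mostly 2-day tasks plus a long tail. `percentile` is again a hypothetical nearest-rank helper, not a function from any particular tool.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Invented data: thirteen 2-day tasks plus a long tail of slower ones.
cycle_times = [2] * 13 + [5, 10, 14, 15, 20, 20, 30]

average = statistics.mean(cycle_times)
median = percentile(cycle_times, 50)
p85 = percentile(cycle_times, 85)
p95 = percentile(cycle_times, 95)

print(f"average: {average} days")  # pulled upward by the long tail
print(f"median:  {median} days")   # what a typical task looks like
print(f"P85:     {p85} days")
print(f"P95:     {p95} days")
```

With this data the average lands at 7 days even though the typical task takes 2, and promising "about 7 days" would be wrong for roughly one task in three.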

By focusing on reducing the cycle and lead times at the higher percentiles, you can improve your overall delivery times and become more predictable. This can lead to higher customer satisfaction and better business outcomes.

The problem is that, so far, only individual tasks are predictable. How would you roll this up and forecast a larger body of work? Before we get into that, I’ll talk about outliers.

What are outliers? Why do they matter?

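An outlier is an observation that sits far away from the rest of your data, such as a task that took 30 days when most take 2 to 5. In the context of cycle and lead times in Agile software delivery, outliers can provide valuable insights: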

  1. Identifying bottlenecks or issues: Outliers can indicate potential issues in your delivery process. For example, if certain tasks are taking much longer to complete, it could mean that these tasks are more complex, there’s a knowledge gap within the team, or there’s a bottleneck in your process.
  2. Understanding variability: Outliers add variability to your data. High variability can make it harder to predict future cycle or lead times. It’s useful to understand what’s causing this variability so that you can make your process more predictable.
  3. Improving estimation: By studying outliers and the reasons behind them, you can improve your future estimations. If certain types of tasks often become outliers, this might mean that you need to adjust your estimation process for these types of tasks.
  4. Refining your process: Outliers can indicate areas where your process could be improved. Perhaps you need to break down large tasks into smaller ones, or perhaps you need to involve certain specialists earlier in the process.

When interpreting outliers, it’s crucial to not just discard them as “one-offs”. Each outlier has a story to tell, and understanding this can lead to significant improvements. However, it’s also important to not overreact to outliers. They don’t necessarily indicate a problem — sometimes, they’re just a natural result of variability in your process.
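If you want a starting point for spotting outliers, here is a minimal Python sketch. It uses the interquartile range (IQR) rule, which is a common statistical convention that I’m assuming here, not the only definition of an outlier. The function name `long_running`, the multiplier `k=1.5` and the sample data are all illustrative.

```python
import statistics

def long_running(cycle_times, k=1.5):
    """Return the cycle times above the upper IQR fence (Q3 + k * IQR).

    Anything returned is a candidate for a conversation, not proof of a problem.
    """
    if len(cycle_times) < 2:
        raise ValueError("need at least two observations")
    q1, _, q3 = statistics.quantiles(cycle_times, n=4)  # quartile cut points
    upper_fence = q3 + k * (q3 - q1)
    return [days for days in cycle_times if days > upper_fence]

# Made-up cycle times (in days) for 20 completed tasks.
cycle_times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 13, 21]

for days in long_running(cycle_times):
    print(f"Worth a closer look: a task that took {days} days")
```

The rule only flags tasks to investigate. Whether a flagged task points to a bottleneck or is just natural variability is still a judgement call for the team.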

Now, going back to forecasting. How do we roll up and forecast out a larger epic or series of tasks? First, let’s define what probabilistic forecasting is.

What is probabilistic forecasting?

In the context of software delivery, probabilistic forecasting uses historical data like cycle times, lead times, or velocity to predict things like how much work can be completed in the future, or how long a specific task will take to complete.

Benefits of Probabilistic Forecasting

  1. Dealing with Uncertainty: Probabilistic forecasting embraces the uncertainty inherent in software development. Instead of providing a single, potentially misleading estimate, it provides a range of potential outcomes.
  2. Risk Management: Probabilistic forecasts can help you understand the risks associated with different decisions. For example, if there’s a 20% chance that a feature won’t be ready for a big marketing launch, you might decide to postpone the launch or start working on a contingency plan.
  3. More Accurate Over Time: As you collect more data, your forecasts will become more accurate. Probabilistic forecasting can also help you understand how accurate your past forecasts were, which can help you improve future forecasts.

Potential Drawbacks

  1. Complexity: Probabilistic forecasting can be complex and requires a good understanding of statistics. This might make it hard for some teams to adopt.
  2. Requires Historical Data: Probabilistic forecasting relies heavily on historical data. If your team or project is new, or if your way of working has drastically changed, you might not have relevant data to use.
  3. Misinterpretation: It’s possible to misinterpret probabilistic forecasts. For example, if you say there’s an 85% chance that a task will be done in 5 days, some people might hear that it will be done in 5 days, while others might focus on the 15% chance that it won’t be.
  4. Changes in Process: If your team’s process or context changes significantly (for example, if your team grows, if you start a different type of project, or if external conditions change), your historical data might no longer be a good predictor of the future.

OK, I want to do a probabilistic forecast! How can I start?

Here’s a basic outline of how you might approach this for 1 epic:

  1. Breakdown: Decompose the epic into smaller tasks or user stories. The more granular these are, the more accurate your estimation will be. For example, my epic has 10 stories and 20 tasks, so there are now 30 tickets.
  2. Understand your historical throughput per week: For example, over the last 5 weeks my team has completed 5, 3, 2, 1 and 2 tickets per week. Alternatively, if you’re a scrumban team, you may want to estimate each task. You might need to use story points or t-shirt sizes at first, and then start collecting data on how long tasks actually take to complete so that later estimates are grounded in real numbers.
  3. Understand and use percentiles: This will eventually provide a range, e.g. “I feel 90% confident we can deliver this software feature by June 30th. However, the other 10% might result in a July 30th release date.”
  4. Adjust for Dependencies and Risks: If there are dependencies between tasks or other risks that could slow down the work, take these into account in your forecast.
  5. Aggregate Estimates: Since you’re dealing with probability distributions, you can’t just add up the median estimates. You’ll need to use a technique like Monte Carlo simulation, which basically involves running thousands of simulations based on your estimates and seeing what the distribution of total time looks like. Be sure to add a bit of buffer for those 30 tickets. There’s always work that gets uncovered down the road!
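The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a finished forecasting tool: it uses the throughput history (5, 3, 2, 1, 2) and the 30 tickets from the example, adds an assumed buffer of 5 tickets for discovered work, and treats every future week as a random replay of one of the past weeks. The function names, the buffer size, the number of runs and the fixed seed are all my own illustrative choices.

```python
import math
import random

def simulate_weeks(backlog, weekly_history, rng):
    """One simulated future: each week, replay a randomly chosen past week."""
    remaining, weeks = backlog, 0
    while remaining > 0:
        remaining -= rng.choice(weekly_history)
        weeks += 1
    return weeks

def monte_carlo(backlog, weekly_history, runs=10_000, seed=42):
    """Return the sorted number of weeks needed in each of `runs` simulations."""
    if backlog <= 0:
        raise ValueError("backlog must be positive")
    if min(weekly_history) < 0 or max(weekly_history) == 0:
        raise ValueError("history must be non-negative with at least one productive week")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sorted(simulate_weeks(backlog, weekly_history, rng) for _ in range(runs))

def percentile(ordered, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

history = [5, 3, 2, 1, 2]  # tickets completed per week over the last 5 weeks
tickets = 30               # 10 stories + 20 tasks
buffer = 5                 # assumed allowance for work uncovered along the way

results = monte_carlo(tickets + buffer, history)
for pct in (50, 85, 95):
    print(f"{pct}% of simulations finished within {percentile(results, pct)} weeks")
```

Reading the output as "we are 85% confident we will finish within N weeks" only holds if the coming weeks look like the sampled ones. Five weeks of history is a thin sample, so rerun the simulation as new weeks of throughput come in.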

In the next article, I’ll walk through an in-depth guide to forecasting and some tools for doing the above. Remember, probabilistic forecasting is not about predicting the future with certainty. It’s about understanding the range of possible outcomes and their probabilities, so you can make informed decisions.

Finally, it’s important to update your forecast as you gather more data. Probabilistic forecasting is most effective when it’s used as part of an iterative process of estimation, work, learning, and re-estimation.

It’s also crucial to remember that these are just tools and techniques. They don’t replace the need for skilled people, good communication, and effective practices — instead, they can enhance these things and make your team more effective.

In conclusion, probabilistic forecasting can be a powerful tool for managing uncertainty in software delivery, but it’s not without its challenges. It should be used as part of a broader toolkit for project management and decision-making, and its results should always be interpreted with care.

If you need help forecasting and gathering all of the above, feel free to contact me. I’ve done this professionally in addition to my esports roles for a decade!
