Stop guessing, use data! (Or how we estimate at Lunar Logic)

In our line of business estimating software projects is our bread and butter. Sometimes it’s the first thing our potential clients ask for. Sometimes we have already finished a product discovery workshop before we talk about it. Sometimes it is a recurring task in projects we run.

Either way, the goal is always the same. A client wants to know how much time building a project or a feature set is going to take and how costly it will be. It is something that heavily affects all planning activities, especially for new products, or even defines the feasibility of a project.

In short, I don’t want to discuss whether we need to estimate. I want to explain how we do it and why. Let me, however, start with how we don’t do estimation.

Expert Guess

The most common pattern that we see when it comes to estimation is the expert guess. We ask the people who would be doing the work how long a task will take. This pattern is used when we ask people about hours or days, but it is also at work when we use story points or T-shirt sizes.

After all, saying a task will take 8 hours is just as uncertain an assessment as saying that it is a 3-story-point task or that it is an S size. The only difference is the scale we are using.

The key word here is uncertainty. We make our expert guesses under conditions of huge uncertainty. When I offer that argument in discussions with our clients, the typical visceral reaction is “let’s add some details to the scope of work so we understand the tasks better.”

Interestingly, making more information available to estimators doesn’t improve the quality of estimates, even if it improves the confidence of estimators. In other words, a belief that adding more details to the scope makes an estimate better is a myth. The only outcome is that we feel more certain about the estimate even if it is of equal or worse quality.

The same observation holds when the strategy is to split the scope into finer-grained tasks. It is, in a way, adding more information. After all, to scope out finer-grained tasks we need to make more assumptions, if only to define the boundaries between the smaller chunks. Most likely we wouldn’t stop there but would also try to keep the level of detail we had in the original tasks, which means even more new assumptions.

Another point that I often hear in this context is that experience in estimating helps significantly in providing better assessments. The planning fallacy described by Roger Buehler shows that this assumption is not true either. It also points out that having a lot of expertise in the domain doesn’t help nearly as much as we would expect.

Daniel Kahneman, in his profound book Thinking, Fast and Slow, argues that awareness of the flaws in our thinking process doesn’t inoculate us against falling into the same traps over and over again. It means that even if we are aware of our cognitive biases, we are still vulnerable to them when making a decision. By the same token, simple awareness that the expert guess as an estimation technique has failed us many times before, and knowledge of why that was so, doesn’t help us improve our estimation skills.

That’s why we avoid expert guesses as a way to estimate work. We use the technique on rare occasions when we don’t have any relevant historical data to compare against. Even then we tend to do it at a very coarse-grained level, e.g. asking ourselves how long we think the whole project will take, as opposed to assessing individual features.

Ultimately, if expert guess-based estimation doesn’t provide valuable information, there’s no point in spending time doing it. And we are talking about activities that can take as much as a few days of work for a team each time we do them. That time could be used to actually build something instead.

Story Points

While I think of expert guesses as a general pattern, one of its implementations, story point estimation, deserves a special comment. There are two reasons for that. One is that the technique is widespread. Another is that there seems to be a big misconception about how much value story points provide.

The initial observation behind introducing story points as an estimation scale is that people are fairly good when it comes to comparing the size of tasks even if they fail to figure out how much time each of the tasks would take exactly. Thanks to that, we could use an artificial scale to say that one thing is bigger than the other, etc. Later on, we can figure out how many points we can accomplish in a cadence (or a time box, sprint, iteration, etc., which are specific implementations of cadences).

The thing is that it is not the size of tasks but flow efficiency that is the crucial parameter defining the pace of work.

For each task that is being worked on we can distinguish between work time and wait time. Work time is when someone actively works on a task. Wait time is when a task waits for someone to pick it up. For example, a typical task would wait between coding and code review, code review and testing, and so on and so forth. However, that is not all. Even if a task is assigned to someone it doesn’t mean that it is being worked on. Think of a situation when a developer has 4 tasks assigned. Do they work on all of them at the same time? No. Most likely one task is active and the other three are waiting.

[Figure: development team flow efficiency]

The important part about flow efficiency is that, in the vast majority of cases, wait times outweigh work time heavily. Flow efficiency of 20% is considered normal. This means that a task waits 4 times as much as it’s being worked on. Flow efficiency as low as 5% is not considered rare. It translates to wait time being 20 times longer than work time.

With low flow efficiency, doubling the size of a task contributes only a marginal change to the total time that the task spends in the workflow (lead time). With 15% flow efficiency, doubling the size of the task makes the lead time only 15% longer than initially. Tripling the size of the task results in a lead time that is only 30% longer. Let me rephrase: we just tripled the size of the task and the lead time grew by less than a third.

[Figure: estimation with low flow efficiency]

Note that I go by the assumption that increasing the size of a task wouldn’t result in increased wait time. Rarely, if ever, would such an assumption hold true.
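To see the arithmetic, here is a minimal sketch in Python; it bakes in the same simplifying assumption that wait time stays fixed, and the flow-efficiency values come from the examples above.

```python
def lead_time_multiplier(flow_efficiency, size_multiplier):
    """How much longer a task's lead time gets when its work time grows
    by `size_multiplier`, assuming wait time stays unchanged.

    lead time = work + wait, and flow efficiency = work / lead time, so
    new lead time = wait + size_multiplier * work
                  = lead time * (1 + (size_multiplier - 1) * flow_efficiency)
    """
    return 1 + (size_multiplier - 1) * flow_efficiency

print(lead_time_multiplier(0.15, 2))  # 1.15 -> doubling the task adds only 15%
print(lead_time_multiplier(0.15, 3))  # 1.30 -> tripling adds only 30%
print(lead_time_multiplier(0.05, 2))  # 1.05 -> at 5% flow efficiency, barely anything
```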

This observation leads to the conclusion that investing time in any sizing activity, be it story point estimation or T-shirt sizing, is not time well invested. It has, as a matter of fact, been confirmed by research run by Larry Maccherone, who gathered data from ten thousand agile teams. One of the findings that Larry reported was that velocity (story points completed in a time box) is not any better a measure than throughput (the number of stories completed in a time frame).

In other words, we don’t need to worry about the size of tasks, stories or features. It is enough to know the total number of them and that’s all we need to understand how much work there is to be done.

The same experience is frequently reported by practitioners, and here’s one example.

If there is value in any sizing exercise, be it planning poker or anything else, it is in two cases. We either realize that a task is simply too big when compared to others or we have no clue what a task is all about.

We see this a lot when we join our clients in more formalized approaches to sizing. If there is any signal we get from the exercise, it is when the biggest size on the scale gets used (“too big”) or when a team can’t tell, even roughly, what the size would be (“no clue”). That’s, by the way, what inspired these estimation cards.

Historical Data

If we avoid expert guesses as an estimation strategy what other options do we have? There is a post on how approaches to estimation evolved in the agile world and I don’t want to repeat it here in full.

We can take a brief look, however, at the options we have. The approaches that are available basically fall into two camps. One is based on expert guesses, and I focused on that in the sections above. The other one is based on historical data.

Why do we believe the latter is superior? As we already established, we, as humans, are not well-suited to estimate. Even when we are aware that things went wrong in the past we tend to assume optimistic scenarios for the future. We forget about all the screw-ups we fought, all the rework we did, and all the issues we encountered. We also tend to think in ideal hours, despite the fact that we don’t spend 8 hours a day at our desks. We attend meetings, have coffee breaks, play foosball matches, and chat with colleagues. Historical data remembers it all, since all these things affect lead times and throughput.

Lead time for a finished task also includes the additional day when we fought a test server malfunction, the bank holiday that happened at that time, and the unexpected integration issue we found while working on the task. We would be lucky if our memory retained even one of these facts.

By the way, I had the opportunity to measure what we call active work time in a bunch of different teams in different organizations. We defined active work time as time actively spent on doing work that moves tasks from a visual board towards completion when compared to the whole time of availability of team members. For example, we wouldn’t count general meetings as active work time but a discussion about a feature would fall into this category. To stick with the context of this article, we wouldn’t count estimation as active work time.

Almost universally I was getting active work time per team in a range of 30%-40%. This shows how far from the ideal 8-hour workday we really are, despite our perceptions. And it’s not that these teams were mediocre. On the contrary, many of them were considered top-performing teams in their organizations.

Again, by looking at historical lead times for tasks, the fact that we’re not actively working 8 hours a day is already taken care of. The best part is that we don’t even need to know what our active work time is.


The simplest way of exploiting historical data is looking at throughput. In a similar manner to how we track velocity, we can gather throughput data for consecutive time boxes. Once we have a few data points we can provide a fairly confident forecast of what can happen within the next time box.

Let’s say that in 5 consecutive iterations we delivered 8, 5, 11, 6 and 14 stories respectively. On one hand, we know that the range of possible throughput values is at least as wide as 5 to 14. On the other hand, we can also say that there’s an 83% probability that in the next sprint we will finish at least 5 stories (in this presentation you can find the full argument why). We are now talking about a fairly high probability.

[Figure: estimation probability – an 83% chance that the next sample falls into this range]

And we had only five data points. The more we have, the better our predictions get. Let’s assume that in the next two time boxes we finished 2 and 8 stories respectively. A pretty bad result, isn’t it? However, if we’re happy with a confidence level of around 80%, we would again say that in the next iteration we will most likely finish at least 5 stories (this time with 75% probability). That holds despite the fact that we’ve had a couple of pretty unproductive iterations.

[Figure: new estimation probability – a 75% chance that the next sample falls into the new range]
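The 83% and 75% figures are consistent with a simple order-statistic rule: with n independent historical samples, the chance that the next sample is at least as large as the j-th smallest one is (n+1-j)/(n+1). The linked presentation may frame the argument differently; this is just a minimal sketch of that rule.

```python
def chance_of_at_least(samples, threshold):
    """Probability that the next observation is at least `threshold`,
    where `threshold` is one of the n historical samples:
    P = (n + 1 - j) / (n + 1), with j the rank of `threshold` (1 = smallest)."""
    n = len(samples)
    j = sorted(samples).index(threshold) + 1
    return (n + 1 - j) / (n + 1)

print(chance_of_at_least([8, 5, 11, 6, 14], 5))        # 5/6, roughly 83%
print(chance_of_at_least([8, 5, 11, 6, 14, 2, 8], 5))  # 6/8, exactly 75%
```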

Note that in this example I completely ignore the size of the tasks. One part of the argument why is provided above. Another part is that we can fairly safely assume that tasks of different sizes will be distributed across different time boxes, so we are invisibly taking size into consideration anyway.

The best part is that we don’t even need to know the exact impact of the size of a task on its lead time and, as a result, throughput. Yet again, it is taken care of.

Delivery Rate

Another neat way of using historical data is delivery rate, which is based on the idea of takt time. In manufacturing, takt time describes how frequently manufacturing of an item is started (or finished). Using it we can figure out the throughput of a production line.

In software development, workflow is not as predictable and stable as in manufacturing. Thus, when I talk about delivery rate, I talk about average numbers. Simply put, in a stable context, i.e. a stable team setup over a longer time frame, we divide the elapsed time (number of days) by the number of delivered features. The answer tells us how frequently, on average, we deliver a new feature.

We can track different time boxes, e.g. iterations, different projects, etc., to gather more data points for analysis. Ideally, we would have a distribution of possible delivery rates in different team lineups.

Now, to assess how much time a project would take, all we need is a couple of assumptions: how many features we would eventually build and what team would work on the project. Then we can look at the distribution of delivery rates for projects built by similar teams, pick data points for the optimistic and pessimistic boundaries, and multiply them by the number of features.

Here’s a real example from Lunar Logic. For a specific team setup, we had a delivery rate between 1.1 and 1.65 days per feature. It means that a project which we think will have 40-50 features would take between 44 (1.1 x 40) and 83 (1.65 x 50) days.
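A quick sketch of that computation; the delivery rates and feature counts are the ones from the example above.

```python
def duration_range(rate_fast, rate_slow, features_min, features_max):
    """Optimistic and pessimistic project duration in days, given a range of
    delivery rates (days per feature) and a range of feature counts."""
    return rate_fast * features_min, rate_slow * features_max

print(duration_range(1.1, 1.65, 40, 50))  # (44.0, 82.5) -> roughly 44-83 days
```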

Probabilistic Simulation

The last approach described above is, technically speaking, oversimplified, incorrect even, from a mathematical perspective. The reason is that we can’t rely on averages when the data doesn’t follow a normal distribution. However, in our experience the outcomes it produces, even if not mathematically correct, are of high enough quality. After all, with estimation we don’t aim to be perfect; we just want to be significantly better than what expert guesses provide.

By the same token, if we use a simplified version of the throughput-based approach and just go with an average throughput to assess the project, the computation wouldn’t be mathematically correct either. Still, it would most likely be better than expert guesses.

We can improve both methods and make them mathematically correct at the same time. The technique we would use for that is the Monte Carlo simulation. Put simply, it means that we randomly choose one data point from the pool of available samples and assume it will happen again in a project we are trying to assess.

Then we run thousands and thousands of such simulations and we get a distribution of possible outcomes. Let me explain it using the throughput example from before.

Historically, we had a throughput of 8, 5, 11, 6 and 14. We still have 30 stories to finish. We randomly pick one of the data samples. Let’s say it was 11. Then we do it again. We keep picking until the sum of the picked throughputs reaches 30 (as this is how much work is left to be done). The next picks are 5, 5 and 14. At this point we stop a single run of the simulation, concluding that the remaining work requires almost 4 more iterations.

[Figure: software project forecast burn-up chart]

It is easy to understand when we look at the outcome of the run in a burn-up chart. It neatly shows that it is, indeed, a simulation of what can really happen in the future.

Now we run such a simulation, say, ten thousand times. We get a distribution of results ranging from a little more than 2 iterations (the most optimistic boundary) up to 6 iterations (the most pessimistic boundary). By the way, both extremes are highly unlikely. Looking at the whole distribution we can find an estimate for any confidence level we want.

[Figure: software project forecast delivery range]
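Here is a minimal Python sketch of that kind of simulation. The throughput samples and the 30 remaining stories come from the example above; the 10,000 runs and the way the last, partial iteration is counted are my own choices for illustration, not necessarily how any particular tool does it.

```python
import random

def iterations_to_finish(throughputs, stories_left):
    """One simulation run: keep sampling a historical throughput until the
    remaining stories are done; the last iteration is counted fractionally."""
    done, iterations = 0, 0
    while True:
        sample = random.choice(throughputs)
        if done + sample >= stories_left:
            return iterations + (stories_left - done) / sample
        done += sample
        iterations += 1

throughputs = [8, 5, 11, 6, 14]   # historical samples from the example
runs = sorted(iterations_to_finish(throughputs, 30) for _ in range(10_000))
print("most optimistic :", round(runs[0], 2))               # a bit over 2 iterations
print("median          :", round(runs[len(runs) // 2], 2))
print("most pessimistic:", round(runs[-1], 2))              # up to 6 iterations
```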

We can adopt the same approach to improve the delivery rate technique. This time we would use a different randomly picked historical delivery rate for each story we assess.

Oh, and I know that “Monte Carlo method” sounds scary, but the whole computation can be done in an Excel sheet with super basic technical skills. There’s no black magic here whatsoever.

Statistical Forecasting

Since we have already reached the point where we know how to employ Monte Carlo simulation, we can improve the technique further. Instead of using oversimplified measures, such as throughput or delivery rate, we can run a more comprehensive simulation. This time we are going to need lead times (how much time elapsed from when we started a task until we finished it) and Work in Progress (how many ongoing tasks we had) for each day.

The simulation is somewhat more complex this time as we look at two dimensions: how many tasks are worked on each day and how many days each of those tasks takes to complete. The mechanism, though, is exactly the same. We randomly choose values out of historical data samples and run the simulation thousands and thousands of times.

In the end, we land with a distribution of possible futures, and for any confidence level we want we can get a date by which the work should be completed.

[Figure: estimating distribution]
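To make the mechanism a bit more tangible, here is one heavily simplified way such a simulation could be set up. This is my own sketch based on the description above, not the actual model from Troy Magennis’ work or from Projectr, and the historical samples are hypothetical.

```python
import random

def days_to_finish(lead_times, daily_wip, backlog_size):
    """One simulation run: each day we pull new tasks up to a WIP level sampled
    from history, give each started task a lead time sampled from history, and
    count the days until the whole backlog is finished."""
    day, finished, not_started = 0, 0, backlog_size
    in_progress = []  # remaining days for each started task
    while finished < backlog_size:
        day += 1
        wip_today = random.choice(daily_wip)
        while not_started > 0 and len(in_progress) < wip_today:
            in_progress.append(random.choice(lead_times))
            not_started -= 1
        in_progress = [d - 1 for d in in_progress]       # a day of work passes
        finished += sum(1 for d in in_progress if d <= 0)
        in_progress = [d for d in in_progress if d > 0]
    return day

lead_times = [3, 5, 2, 8, 4, 6, 3, 7]   # hypothetical: days from start to finish
daily_wip = [2, 3, 3, 4, 2, 3]          # hypothetical: ongoing tasks per day
runs = sorted(days_to_finish(lead_times, daily_wip, 30) for _ in range(10_000))
print("50% confidence:", runs[len(runs) // 2], "days")
print("90% confidence:", runs[9 * len(runs) // 10], "days")
```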

The description I’ve provided here is a super simple version of what you can find in Troy Magennis’ original work. For that kind of simulation one may need the support of software tools.

As a matter of fact, we have an early version of a tool developed at Lunar Logic that helps us to deal with statistical forecasting. Projectr (as this is the name of the app) can be fed with anonymized historical data points and the number of features and it produces a range of forecasts for different confidence levels.

To make things as simple as possible we only need start and finish dates for each task we feed Projectr with. This is in perfect alignment with my argument above that size of tasks in the vast majority of cases is negligible.

Anyway, anyone can try it out and we are happy to guide you through your experiments with Projectr since the quality of data you feed the app with is crucial.

Estimation at Lunar Logic

I have already provided you with plenty of options for how estimation may be approached. However, I started with the premise of sharing how we do it at Lunar Logic. There have been hints here and there in the article, but the following is a comprehensive summary.

There are two general cases when we get asked about estimates. First, when we are in the middle of a project and need to figure out how much time another batch of work, or the remaining work, is going to take. Second, when we need to assess a completely new endeavor so that a client can get some insight into budgetary and timing constraints.

The first case is a no-brainer for us. In this case, we have relevant historical data points in the context that interests us (same project, same team, same type of tasks, etc.). We simply use statistical forecasting and feed the simulation with the data from the same project. In fact, in this scenario we also typically have pretty good insight into how firm our assumptions about the remaining scope of work are. In other words, we can fairly confidently tell how many features, stories or tasks there are to be done.

The outcome is a set of dates along with confidence levels. We would normally use a range of confidence levels from 50% (half the time we should be good) to 90% (9 times out of 10 we should be good). The dates that match the confidence levels of 50% and 90% serve as our time estimate. That’s all.

[Figure: estimating range]

The second case is trickier. In this case, we first need to make an assumption about the number of features that constitutes the scope of a project. Sometimes we get that specified by a client. Nevertheless, our preferred way of doing this is to go through what we call a discovery workshop. One of the outcomes of such a workshop is a list of features at a granularity that is common for our projects. This is the initial scope of work and the subject of estimation.

Once we have that, we need to make an assessment of the team setup. After all, a team of 5 developers supported by a full-time designer and a full-time tester would have a different pace than a team of 2 developers, a part-time designer and a part-time tester. Note: it doesn’t have to be the exact team setup that will end up working on the project, but ideally it is as close to that as possible.

When we have made explicit assumptions about the team setup and the number of features, we look for past projects that had a similar team setup and roughly the same granularity of features. We use the data points from these projects to feed the statistical forecasting machinery.

Note: I do mention multiple projects as we would run the simulation against different sets of data. This would yield a broader range of estimated dates. The most optimistic end would refer to 50% confidence level in the fastest project we used in the simulation. The most pessimistic end would refer to 90% confidence level in the slowest project we used in the simulation.
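A small sketch of how those two ends could be combined from several past projects’ forecasts; the per-project numbers below are hypothetical.

```python
# Hypothetical forecasts: for each past project used as a data source,
# the simulated duration (in days) at the 50% and 90% confidence levels.
forecasts = {
    "project A": {"p50": 44, "p90": 58},
    "project B": {"p50": 51, "p90": 66},
    "project C": {"p50": 47, "p90": 73},
}

optimistic = min(f["p50"] for f in forecasts.values())   # fastest project at 50%
pessimistic = max(f["p90"] for f in forecasts.values())  # slowest project at 90%
print(f"estimated range: {optimistic}-{pessimistic} days")  # 44-73 days
```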

In this case, we still face a lot of uncertainty, as the most fragile part of the process is the set of assumptions about the eventual scope, i.e. how many features we would end up building.

[Figure: software project forecasting – two distributions]

In both cases, we use statistical forecasting as the main method of estimation. Why would I care to describe all other approaches then? Well, we do have them in our toolbox and use them too, although not that frequently.

We sometimes use a simple assessment based on delivery rate (without the Monte Carlo simulation) as a sanity check of whether the outcomes of our statistical forecast aren’t off the chart. On occasion we even fall back to an expert guess, especially in projects that are experimental.

One example would be a project in a completely new technology. In this kind of situation the amount of technical research and discovery would be significant enough to make forecasting unreliable. However, even on such occasions we avoid sizing or making individual estimates for each task. We try to very roughly assess the size of the whole project.

We use a simple scale for that: can it be accomplished in hours, days, weeks, months, quarters or years? We don’t aim to answer “how many weeks” but rather figure out what order of magnitude we are talking about. After all, in a situation like that we face a huge amount of uncertainty so making a precise estimate would only mean that we are fooling ourselves.

This is it. If you went through the whole article you know exactly what you can expect from us when you ask us for an estimate. You also know why there is no simple answer to a question about estimation.

  • Marco

    Nice article… even though, following the Kanban community, I already knew about those things. :) How do you cope, at Lunar Logic, with different work item types? For example, statistical/historical data for new features could be bigger than, say, for bug fixes. Do you estimate only new features? Do you have different estimations for different work item types?

    • Paweł Brodziński

      We ignore bug fixes altogether and focus on the number of features only. Let me give you an example. If we are in a project where we have lots and lots of bug fixes and developers get pulled from working on new features, two things can happen. Lead times may go through the roof, which means that developers interrupt work on new stuff to get bugs fixed. If not that, then work in progress (counted as new features in progress) nosedives, which means that developers fix bugs between working on two different features.

      No matter which is the case, the statistical forecast in such a situation would produce a more pessimistic range.

      If we, on the other hand, can improve the quality, it will be seen either in improved lead times or in a bigger number of active features in progress. And again, once we feed the data into the statistical forecasting machinery it will produce a more optimistic forecast.

      In typical scenarios we rarely need to explicitly take care of different types of tasks (a.k.a. different classes of service), so usually we just look at features.

      • Marco

        Not sure I’ve understood, Paweł. If bugs appear, you have to take care of them, thus reducing time for features (LT increases/WIP decreases), or you have to count them, obtaining the same effect.
        Obviously, better care of existing/ongoing features will reduce those numbers, but in any case, even 1 bug will (very slightly) change your LT/WIP and will be (again, very slightly) reflected in your data. This will produce a (very slightly) worse forecast for the next round… Am I correct?
        So, somehow, appearing bugs (don’t tell me you don’t have any, you liar!) ;) will increase your forecasted LT and reduce WIP. Probably by small amounts, I grant it, but you should see those changes.

        Instead, by distinguishing the work item type (let’s say bug and feature), you should be able to “purely” forecast each of them and also be able to plan the next things you’ll work on, considering both the forecasts and the things to take care of.

        Or, at least, I think so. :)

        • Paweł Brodziński

          You mostly got right the idea of how an unplanned task, e.g. a bug, affects future forecasts. What is important to stress here, though, is that to prepare any forecast we use historical data from our past projects. Why is that important?

          We do have bugs to fix. Few of them, compared to what I saw over my career, but of course we don’t write ideal code. However, just as we will have bugs to fix in the new projects that we forecast right now, we had them in the projects we did in the past, and thus statistical forecasting is fed with data that was affected by such situations.

          This means that we have actually taken into account that some bugs will pop up and that we will be distracted to fix them. The mechanism you describe (the next forecast getting slightly more pessimistic) would kick in if the volume of bugs were significantly bigger than what we normally have.

          We take care of that by maintaining a reasonable set of engineering practices and sustaining craftsmanship as one of the key values we believe in. It means that I can hardly think of a project where the volume of bugs was really off the chart.

          By the way, if you run a forecast a few times using exactly the same historical data it may produce slightly different results. It is an inherent attribute of statistical simulation. These changes, though, are negligible, the same way the impact of one bug fix is negligible for the estimate of the whole project.

          Let’s go back to distinguishing versus not distinguishing task types. We normally go with the former and, from a forecasting perspective, only look at the number of features. This means that we take bug fixes and chores into account through the impact they had on lead times and WIP. Because of that we compare only historical data for features and need only information about features in a new project.

          We could employ a different approach, though. We could use all the tasks from past projects to simulate a new one. It would, however, require us to assume some volume of tasks that are non-features. Looking at historical data we could find that, out of the scope of a project, some 40% of tasks were bug fixes, 50% new features, and 10% chores. We could assume that the distribution in a new project would be the same (and having made the assessment about the number of features we can come up with the number of bugs). Or, better, we could run a Monte Carlo simulation to find out the likely distribution in the new project based on past data. In either case we’d end up with an assessment of how many tasks in total there would be in the project: a single number that is the sum of features, bug fixes, and chores.

          To simulate such a project we’d need historical data that includes lead times and WIP for all the tasks, including bug fixes and chores, and we’d be good to go.

          What I’m trying to say is that either method (the features-only strategy, or the one that includes all types of tasks undistinguished) is fine. The important thing is to use the same strategy when assessing the scope of a new project (or a new batch of work) as we use when we pick historical data. That’s all.

  • Igors

    But how do you decide what is a feature and what isn’t? For example, you are making an Android application and you deliver a first set of features. The clients are happy and you keep working on the second set. During that time a new Android version is released and suddenly a critical feature that is already delivered is unusable, because the on-screen keyboard is hiding important content that needs to be visible while typing, and after a quick investigation you find out the only fix is to implement a custom keyboard. Is that then a feature or a bug fix? Do you count it in your data or not?

    • Paweł Brodziński

      Interestingly enough, we don’t care much how we label such edge cases. One may call it a feature, another may call it a bug fix. Whatever the label is, we need to write some code to get the app working. That is what counts at that moment.

      Now, the best part is that from estimation perspective it doesn’t matter much either. Let’s assume we call it a bug. We still work on it and it affects our lead times and throughput in a way I explained in the answer to Marco’s comment. In future estimates the fact that such unplanned work sometimes happens will be taken into account. After all we will use historical data.

      It is highly likely that similar situations happened in some of the projects we used to seed statistical forecasting, so they are even included in our initial estimates.

      Now, let’s assume that we call it a feature. This means that we just increased the scope of the project. Now we have one feature more. We can run a forecast for the adjusted scope and we will get updated numbers. The moment when we take such situations into account before they happen is when we talk about uncertainty about the number of features. Even if we get an exact list of features from a client, say 32, we would still run a forecast against a range of 30-40. It reflects our experience that scope always changes, and much more frequently that means growing, not shrinking.

      One way or the other the issue is taken care of, so it doesn’t matter how we label a task. Whichever way we pick, it makes sense to do it consistently. But then again, if occasionally we label a task differently, everything is OK.

      • Marco

        Oh, ok… this clarifies a bit more what I understood above! So let’s say that “everything is a feature” (or we may call it “task”, “something to do”, “foobar”, … whatever!): you just measure the time it takes to complete such a feature/task/whatever regardless of its type and you’ll have your statistical data changing accordingly.

        What about the “size” of such tasks? I suppose that, usually, bugs are probably smaller (in time/effort/“story points”/…) than actual new features, so when resolving a bug you’ll probably be faster and your forecast will get better, while for features your forecasts will increase. Wouldn’t it help to distinguish them, so that when you have a long list of “bugs” to solve you’ll have better forecasts, probably showing a shorter time?

        Could you clarify with simple examples what you consider features/tasks? I’m still not clear, generally speaking, on how to cope with (relative) sizes: when a bug/task is smaller than a feature/task, my forecast should show that I’ll do the former faster than the latter. If I have a customer waiting for such a bug, I can’t use the whole lot of statistical data (including features or whatever else) when, possibly, bug-only forecasts/data are “better”.

        • Paweł Brodziński

          A big part of the argument in the first part of the article is that size doesn’t matter nearly as much as we think it does. In fact, Larry Maccherone’s research showed that throughput (the sheer number of tasks) is as good a measure of progress as a size-based approach (story points).

          It has been broadly reported in the Lean Kanban community that size is much less predictive than flow efficiency when it comes to assessing how much a team can accomplish.

          From that perspective, when we discuss all sorts of edge cases, like something that can be interpreted either as a bug or a feature, or features that are super small, etc., there is no universal answer to follow. The best advice I can give is to keep a consistent approach, whatever that approach is. Consistency means that we have interpreted things the same way in the past as we do now, and thus the historical data we feed statistical forecasting with reflects our current way of building new projects. Which is exactly the kind of outcome we want to achieve.

          And just for the sake of transparency: for us, features are typically things that add new functionality. We wouldn’t really fight, though, if someone called something that fixes a broken app a feature. Especially if it is a very occasional thing.

  • Katya Terekhova

    Hi Pawel, thank you a lot for the article!
    I have two questions, sorry if they are annoying.

    1. In the beginning of your post you say: “A client wants to know how much time building a project or a feature set is going to take and how costly it will be”. From my understanding your post covers mostly the part ‘how much time building a project is going to take’, but not ‘how costly it will be’. How do you deal with that without ‘expert guesses’ at Lunar Logic? This part has always puzzled me. I completely understand that expert guesses are not trustworthy and far from reality, but how is it possible to abandon them if you need to provide a client not just with a timeline for a project/feature set, but also with a quote?

    2. Again, I may have gotten everything wrong, but it looks to me that if a data sample has high variance (for example if extreme values are common: many iterations with low throughput and many with high), then forecasts based on it wouldn’t be really helpful (‘we can implement 10 features in 5-50 weeks’ doesn’t sound like a good forecast). Obviously, the system should be more or less stable, which can be achieved by moving to ‘pull’, limiting work in progress, etc. Still, sometimes teams deal with unstable demand and that can also affect historical data (even in a super pull-system); it doesn’t mean they become fully idle, they might do some intangible stuff that doesn’t directly deliver value to the customer (like code refactoring). What would you do in that case? Ignore periods/iterations with low external demand?

    Thank you!

  • khairop

    Hi Pawel, this is a detailed and interesting article. Thank you for sharing.

    I have a question on the visual representation of wait time. Since Kanban is a pull system, wait time cannot be represented precisely on the kanban board. Wait time is the time measured after a task is completed at the first service center and before it is started at the second center. If in your example (without any WIP limitations) there are 5 tasks in the coding state and 2 of them are active, then the other 3 are still in a wait state. I think the red line sizes cannot accurately measure the wait time if you don’t keep notes of the exact completion timestamp of the previous state.


    • Alyona

      Hi, khairop,

      I could be mistaken, but probably it is enough to have a “start time” – when the task is pulled into progress – and an “end time” – when the task is ready for delivery. When you have this info and you know how much time was reported by programmers and QA engineers as “worked hours”, you can easily calculate the exact “waiting” time.
      Please share your thoughts!