A Dash of Queueing Theory

Posted on Saturday Apr 2, 2016 by Tero Parviainen (@teropa)

I originally wrote this article in January 2015 for the Static Showdown. I'm republishing it here since Divshot has shut down.

The modern world is full of queues. There are queues everywhere from supermarkets and airports to web servers and databases. We organize ourselves in queues and we organize our work in queues. Therefore it is useful to understand a little bit about how these things called queues behave.

The problem is that the behavior of queues is actually pretty counterintuitive. Most queues that occur in the real world occur in situations with uncertainty and randomness. This brings us to the realm of statistical probabilities, which the human mind is not equipped to handle that well.

The field of queueing theory provides tools for getting around this: A quantitative approach combined with visualizations makes up for our lack of intuition. This article explores a few bits and pieces that I've found interesting when reading about queueing theory.

Queues in Product Development

Most of the ideas presented in this article are based on Donald G. Reinertsen's book The Principles of Product Development Flow, which, incidentally, may be the best book on product development processes that I have ever read.

In the book, Reinertsen examines the challenges that organizations have when developing new products, and suggests a quantitative economic framework for dealing with those challenges. That framework systematically demolishes many ideas we hold about managing product development. Ideas like “variance should be eliminated”, and “systems should be worked at maximum capacity”, and “centralized control is good” are challenged. These ideas are replaced with new ones, grounded on economic principles rather than business axioms and instinct.

One of the central themes of Reinertsen's book is the importance of queues. Specifically, there's a lot Reinertsen has to say about the fact that queues are so often ignored by product managers, who instead choose to focus on timelines and efficiency. Measurements about timelines and efficiency can be useful, but we can do better than that when managing highly uncertain activities like product development. Queues are a better tool. Or, at the very least, they are far too important to ignore.

While the ideas in Reinertsen's book are applied to the domain of product development, many of them apply to all kinds of situations where queues and randomness occur. That includes things like traffic control, department stores and restaurant kitchens, server resource management and software architecture. So, let's dive in.

Markov Processes

As we already discussed, real-world queues most often occur in situations where there is also randomness. We're not talking about ust any kind of random randomness, however. When reasoning about queues, one of the most useful tools we have is randomness modeled as Markov processes.

A Markov process is a collection of random events over time that has two parcticularly interesting properties:

The next event is more likely to occur sooner than later.
The occurrence of the next event is not dependent on what's happened earlier.

Here's what a Markov process looks like when plotted over time:

Markov processes - and the particular type of continuous-time Markov processes called Poisson processes that we are modeling - can be used to model many real-world events remarkably well. They are also very useful for our purposes, since they work well for approximating the arrival of tasks onto queues.

When a Markov process is used to model incoming tasks - bug reports, restaurant orders, HTTP requests - it is useful to plot it as a cumulative graph:

The graph simulates a new Markov chain each time. However, its general shape is the same from one instance to the next.

Visualizing Queues

A Markov process by itself does not form a queue - it's just an ever-increasing pile of work. But you can make a queue by combining two Markov processes: An arrival process that generates work, and a service process that performs work.

While the service process can also be modeled as a Markov process, it is not identical to the arrival process: When the queue is empty, the service process does not do anything. When there are no customers, there's nothing to do. When work does arrive, it is served following the rules of a Markov process.

We can plot the arrival process and the service process on top of each other:

If we fade out the arrival process from the diagram, what's left is the area between the arrival process and the service process. That is, the queue:

This visualization is called a Cumulative flow diagram, and it is a very good tool for visualizing queues. It essentially plots the size of a queue over time, and has a few properties that are particularly useful:

At any given t, the height of the band represents the size of the queue at that point in time.
At any given y, the width of the band represents the time it took for a unit of work to pass through the system - its combined time in queue and time to process.
The slope of the (upper) arrival edge visualizes the demand feeding into the queue.
The slope of the (lower) service edge visualizes the capacity of the service process to process work.

The Problem with Maximizing Capacity Utilization

Let's now switch our focus from the mechanics of visualizing queues to what the data actually represents. What the visualization above is showing is a queue where the arrival and service processes have an identical frequency. Service capacity matches the capacity demand.

Creating a queue such as this one may seem at first to be a reasonable thing to do: Resources are expensive, and it would make sense to try to match the amount of resources to the demand exactly - no more, no less. It doesn't make sense to have programmers or customer service people sit idle, does it?

However, if you look at the queues that form when resources are matched, you're likely to see relatively wide queue bands, when work arrives at a quick succession while the service process is stuck on a slower job. Furthermore, these bands don't seem to narrow down as quickly as they should.

This phenomenon is called diffusion, and it is something most people have poor intuition about (and I'm certainly one of them): If you have matching resources for your arrival process and service process, you might expect the size of the queue to stay pretty close to zero. But this does not happen. Even a single slow job can grow the queue quickly and it does not immediately “correct itself” back to an empty state. Queueing times quickly shoot up. Customers are waiting, bug reports are piling up.

“We simply cannot rely on randomness to correct the problems that randomness creates.”

— Donald M. Reinertsen, in Principles of Product Development Flow

Let's examine the situation a bit more closely by calculating some statistics:

Here we collect several numbers from each simulation:

Capacity utilization - The amount of time the service process was working (= was not idle). With matching capacities, as we have here, this tends to be close to 100%.
Percent unblocked - The amount of time the service process was unblocked (ready to take on work immediately). This is “100% - Capacity Utilization”.
Average items in queue - The average number of items in queue, waiting to be processed. It can be estimated as “(Capacity Utilization)² / (1 - Capacity Utilization)”. As capacity utilization approaches 100%, queue size grows exponentially.
Percent queue time - The average percentage of time spent in the queue for each time, of its whole cycle time. It can be estimated as being the same percentage as capacity utilization. This is a remarkable phenomenon: If you load your system to 95% capacity, you can expect for work items to spend 95% of their cycle time in the queue.

Adding Excess Capacity

If we allow differing capacities for the service and arrival processes, these numbers start to look very different.

For instance, if we have service process with a lot of excess capacity, the problems we saw above are greatly diminished: Percent queue time goes down and the service process is much better able to cope with the load. Even when congestion occurs, it is cleared quickly.

However, what also happens is capacity utilization goes way down. When queueing times are low, capacity utilization is also low. We have a lot of idle resources:

It doesn't seem to be possible to have a situation where capacity is highly utilized and queueing times are low. Plugging different combinations to the simulation above almost always makes one or the other get out of hand. Of course, since the processes include randomness, we may occasionally get lucky. But relying on luck is certainly not a viable strategy in the long term.

Eliminating Variability

If you can't have your cake and eat it too when it comes to queue times and capacity utilization in variable queues, perhaps trying to eliminate the variability would help?

This is, of course, easier said than done. How would you control incoming customers or HTTP requests so that their arrival is completely predictable? How would you arrange work so that the time it takes to fulfill each task is devoid of variability? These things may be desireable, but they are definitely not easy to accomplish.

In some cases, eliminating variability may not even be desireable. Reinertsen, in his book, asserts that in product development you can't and don't actually want to eliminate variability completely, no matter what techniques like Six Sigma may tell you. You can instead work to limit the consequences of variability with techniques like substituting one kind of variability for another. You can even make variability work to your advantage when there are asymmetric payoffs involved - when the value of a success is much higher than the cost of a failure.

But, even if you both could and wanted to eliminate variability, how would our queues behave?

A Deterministic Arrival Process

As we see here, even if we eliminate variability from the arrival process completely so that it becomes deterministic, queues still form unless there's significant excess capacity:

A Deterministic Service Process

If we flip the processes and make the service process deterministic (something that may be intuitively easier to imagine in the real world), a similar shape appears:

Reducing or eliminating variability on one process or other does not seem to make our problem go away. The size of the problem may decrease, but at what cost?

In fact, the relationship between variability and queue size is understood to be linear (whereas the relationship between capacity utilization and queue size is exponential). If you eliminate variability completely from one side, all you can expect to accomplish is to halve the queue size.

A Fully Deterministic Queue

It is only when we eliminate variability completely from both the service and arrival processes that we can achieve the holy grail of 100% capacity utilization and zero queue time:

However, such deterministic processes are beyond our reach in most real-world situations where queues occur - certainly those with humans involved. Also, as discussed earlier, in contexts such as product design and development, fully deterministic processes aren't even desireable.

Limiting Work in Progress

If there's an exponential relationship between queue size and capacity utilization, what if we simply choose to limit the queue size, or in other words, limit Work in Progress? What happens to queueing times and to capacity utilization?

This seems to be a reasonably effective strategy. If we combine a tight Work-in-Progress limit with a frequent arrival process, we can effectively break the relationship between capacity utilization and queue time. We can operate at high capacity, and still serve the work items that we serve with a low queueing time.

Of course, we do this at the expense of rejecting some of the incoming demand. When work-in-progress is at the limit, we simply drop any incoming work items. Those customers leave unserved. Those HTTP requests receive a gateway timeout error.

Simply dropping demand on the floor is certainly not the only thing you can do when you limit work-in-progress. Instead, this WIP limitation technique is a building block of several more advanced strategies:

You can introduce part time resources to essentially boost the service process (or add additional service processes for the same queue) after some queue size is reached. You can then remove the additional resources when the queue size goes back down. This strategy is often used in supermarkets, where an additional cashier is called in when queues form. Many people in IT are also familiar with the situation of being “called in” to help with a project that's behind schedule.
You can send a signal upstream that you are not ready to receive more work. Upstream processes can then halt their work or choose to pool work-in-progress in a cheaper queue. This kind of WIP signaling is the basis for Eli Goldratt's Theory of Constraints, where the WIP constraint of the a bottleneck is used to constrain the operating rate of the whole system. It is also the basis of the kanban system used in the Toyota Production System, where the WIP between processes is locally controlled and signals are sent upstream whenever WIP limits are reached.

Conclusion

Since queues are so prevalent and so important, it makes sense to try to reason about them quantitatively and model them as the probabilistic systems that they are. In this article I hope to have given you a flavor of this.

The descriptions I've given are incomplete and simplistic, given the breadth and depth of the topic. Perhaps most glaring omission is how I've ignored most of the economic effects of queues: It may sound like queues are always bad and should always be minimized. This, however, is as simplistic as saying “capacity utilization should always be maximized”. In fact, the trade-offs between queue size and capacity utilization can be made based on economic decisions. You can make such decisions if you can quantify the cost of both in the same terms.

This, and many other aspects of queues, variability, and WIP, are covered in The Principles of Product Development Flow, and I heartily recommend picking it up if these topics interest you.

Know Your AngularJS Inside Out

Build Your Own AngularJS helps you understand everything there is to understand about AngularJS (1.x). By creating your very own implementation of AngularJS piece by piece, you gain deep insight into what makes this framework tick. Say goodbye to fixing problems by trial and error and hello to reasoning your way through them

eBook Available Now