Introduction

A few weeks ago a colleague and I were talking about the pros of using a queue to take some work out of the request cycle. He had read an article suggesting that queues should be used for all external calls. I disagreed. Broad recommendations like that generally don't pan out because there are always tradeoffs. It's bound to be much more nuanced than that. In this article I want to explore a few considerations worth weighing before you jump to using a queue.

Pushing work off to a queue isn't free

The first thing worth considering is that pushing work off to a queue isn't without implications. There are many operational considerations. Firstly, how long does it actually take to push the work onto a queue? In the best case, it takes less time than the actual work you're trying to remove from the request cycle. However, that might not always be the case. It's possible that the call you're aiming to remove from the request has comparable latency. Say it takes 10ms to enqueue the job and 20ms to make your API call. Is it really worth pushing that work off to a queue?
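One way to make that question concrete is to time both paths and compare them. The following is a minimal sketch, not a benchmark harness: `time_ms` and `request_time_saved_ms` are hypothetical helper names, and the figures are the 10ms and 20ms from the example above.

```python
import time
from typing import Callable


def time_ms(fn: Callable[[], object]) -> float:
    """Return how long a single call to fn takes, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def request_time_saved_ms(enqueue_ms: float, inline_call_ms: float) -> float:
    """Time removed from the request if the call is enqueued instead of made inline.

    A negative result means enqueueing is slower than just doing the work.
    """
    return inline_call_ms - enqueue_ms


# The numbers from the example above: 10ms to enqueue, 20ms for the API call.
saved = request_time_saved_ms(enqueue_ms=10.0, inline_call_ms=20.0)
print(f"saved per request: {saved:.0f}ms")
```

In practice you would pass your own enqueue call and your own API call to `time_ms` many times over, since a single measurement says very little about how either one usually behaves.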

Next, we need to remember that queues aren't faultless mechanisms for getting work done. Queues have operational issues of their own. The workers that service requests could be inundated, the broker responsible for relaying messages could have problems, or something else could cause delays or failures. We'll need to monitor the work done by our workers, implement sufficient retries and ensure we have mechanisms to process work that was unsuccessful (e.g. by using a dead letter queue). By pushing work out of the request to an entirely separate mechanism, we inherit operational overhead that we need to cater to.
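To give a feel for the extra machinery, here is a rough sketch of a worker loop with bounded retries and a dead letter list. It uses an in-memory `deque` rather than a real broker, and `drain`, `max_attempts` and `flaky_handler` are names made up for illustration. A real setup would also want backoff between attempts, monitoring and alerting.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Job = dict


def drain(
    queue: Deque[Tuple[Job, int]],
    handler: Callable[[Job], None],
    max_attempts: int = 3,
) -> List[Job]:
    """Process every queued job, retrying failures, and return the dead-lettered jobs."""
    dead_letters: List[Job] = []
    while queue:
        job, attempts = queue.popleft()
        try:
            handler(job)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Out of retries: park the job so someone can inspect it later.
                dead_letters.append(job)
            else:
                # Put the job back at the end of the queue for another attempt.
                queue.append((job, attempts))
    return dead_letters


def flaky_handler(job: Job) -> None:
    """Hypothetical handler that always fails for the job with id "bad"."""
    if job["id"] == "bad":
        raise RuntimeError("provider rejected the request")


jobs = deque([({"id": "ok"}, 0), ({"id": "bad"}, 0)])
print(drain(jobs, flaky_handler))
```

Even in this toy version there are decisions to make (how many attempts, what to do with the dead letters) that simply don't exist when the call is made inline.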

Consider the total latency

Following on from the previous point, it's worth considering the total cost. The main benefit of removing work from the request cycle is to provide a better experience for our customers. Ideally, we don't want our customers waiting if they don't need to. Using a queue doesn't automatically give us the time savings we're looking for. The only way to confirm we're getting the savings is to measure them!

How long does it take to enqueue a request to the broker? What does the latency distribution look like? What are the p90, p95 and p99? Does the queue have an uninteresting latency distribution? By uninteresting, I mean a unimodal distribution with a short tail. In other words, ideally it has one thin peak with little variance. The peak tells you the most likely range of latency you should expect. A thin peak (low variance) tells you the latency won't stray far from that peak. The tail length indicates the upper bound of the latency. A shorter tail means requests shouldn't take too long in the worst case. You want your queue to be as boring as possible. It should be really quick to enqueue requests, and the time to enqueue shouldn't vary significantly between requests. You also want to consider the time it takes for a worker to pick up the request. You want workers to pick up requests quickly. Finally, how long does the worker actually take to do the work?
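If you have raw latency samples, the percentiles are easy to compute with Python's standard library. This is a small sketch; `latency_summary` is a hypothetical helper, and both sample sets are made up purely to contrast a boring distribution with a long-tailed one.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarise latency samples (in milliseconds) by percentile."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    # method="inclusive" keeps every result within the observed min and max.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


# Made-up samples: a tight distribution versus one with a long tail.
boring = [9.0, 10.0, 10.0, 10.0, 11.0] * 20
long_tail = [10.0] * 90 + [80.0] * 8 + [400.0, 900.0]

print("boring:   ", latency_summary(boring))
print("long tail:", latency_summary(long_tail))
```

A boring distribution shows up as percentiles that sit close together. A large gap between p50 and p99 is the long tail your customers will occasionally feel.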

Now consider the latency distribution of the API call you're thinking of performing asynchronously. What does it look like? In the worst case, it'll have an entirely different latency profile. It might not be as optimized as your queue, but you should confirm it! In the best case, it could be just as boring and unremarkable. If the call is as fast as, or faster than, enqueuing and servicing your request, then you haven't saved any time!

So in essence, how long does everything actually take? If the time is negligible, it's quite possible a queue only adds overhead without considerably improving the customer experience. If it does take long, the additional overhead is quite likely necessary to provide a palatable customer experience.

Queues break the feedback cycle

Another very important consideration is that using queues breaks the organic feedback cycle afforded by the original request. When a request comes in from a customer's browser, we can leverage that request to provide feedback. Whether that feedback is informational or exceptional, the request provides a channel for communicating with the customer. Once we push work to a queue, we need to establish an alternate mechanism for communicating with the customer.

It's important to note that feedback can take different forms. Sometimes we only need to provide non-actionable feedback like "we're working on it!". However, there are other circumstances where we need the customer to act on what we tell them. There could be a validation problem, or some other failure or exceptional condition. If the original request has already completed, how do we communicate with the customer?

Let's say you enqueue a request to send an email (by calling some external provider like MailGun). What happens if MailGun returns an error? How do you notify the customer of the failure? You'll likely need some additional APIs to provide the feedback cycle the original request afforded you. Whether that takes the form of additional REST endpoints, GraphQL queries or even something socket based, you'll need some other way of establishing a communication channel.
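As a rough illustration of what that extra channel involves, here is a sketch of a job status record that a polling endpoint could read. Everything in it is an assumption made for the example: the in-memory `statuses` dict stands in for a real datastore, and `submit_email_job`, `run_email_job` and `get_status` are hypothetical names rather than part of any provider's API.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class JobStatus:
    state: str  # "queued", "succeeded" or "failed"
    error: Optional[str] = None


# Stand-in for a real datastore shared by the web process and the workers.
statuses: Dict[str, JobStatus] = {}


def submit_email_job() -> str:
    """Called inside the request: record the job and hand its id back to the customer."""
    job_id = str(uuid.uuid4())
    statuses[job_id] = JobStatus(state="queued")
    return job_id


def run_email_job(job_id: str, send: Callable[[], None]) -> None:
    """Called by a worker: do the work and record the outcome for later lookup."""
    try:
        send()
        statuses[job_id] = JobStatus(state="succeeded")
    except Exception as exc:
        statuses[job_id] = JobStatus(state="failed", error=str(exc))


def get_status(job_id: str) -> JobStatus:
    """Would back a status endpoint that the customer's browser can poll."""
    return statuses[job_id]


def failing_send() -> None:
    """Hypothetical send function simulating an error from the email provider."""
    raise RuntimeError("email provider returned an error")


job_id = submit_email_job()
run_email_job(job_id, failing_send)
print(get_status(job_id))
```

None of this exists when the call is made inside the request, where the error can simply be returned in the response.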

Conclusion

You don't always need a queue! If the work you're offloading doesn't take significant time, a queue might be overkill. Don't assume a queue is always what you want. Consider the operational costs of running a queue, look at the total latency savings, and ensure you have a robust feedback mechanism for communicating with the customer.

You don't always need a queue

Sometimes you should do the work instead of throwing it over the wall
