Stop Making Users Wait: Async Queues Explained
Async processing as a handoff: Django accepts the request, Redis holds the job, and a Celery worker finishes the slow work after the response is already back. Queues, workers, backlogs, retries, and when not to bother.
Watch (13:50)
Full transcript (from the video)
If you want a web app to feel instant, stop trying to finish every job inside the request. But most async architecture explanations start in the wrong place. The real move is a handoff. Django accepts the work,
Redis holds it, and Celery finishes it after the user is already gone. Once that clicks, the queue, the broker, and the worker are not separate lessons anymore. They are one system. Here is the problem in its simplest form.
A request arrives. The server starts working on it, and it is not quick work but something genuinely slow. Maybe it is sending a welcome email through an email provider that takes a full second to answer. Maybe it is resizing an image or transcoding a video or calling a third-party service that is having a slow day.
The server handles it synchronously either way, step by step, while the user sits and waits. The page is frozen and the little spinner just keeps turning. And the part that really hurts is invisible. That request is holding a worker thread the entire time.
A web server only has so many threads. While one is stuck on slow work, it cannot serve anyone else. So under real load, the threads fill up. New requests pile in behind them and the whole site feels slow,
not because the computer is busy, but because everyone is waiting on a handful of stuck threads. That is the wall. Now let us get past it. The fix is to split the request in two.
When the slow work arrives, you do not do it on the spot. Instead, you accept it, write down that it needs doing, and immediately send the user a response. Usually something like, "Got it. We are working." The user is free in milliseconds.
The slow work has not happened yet, but it no longer lives on the request thread. It has been handed off. This is the whole heart of asynchronous processing. And everything else in this video is just the machinery that makes the handoff reliable.
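The handoff described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not Django or Celery code: `queue.Queue` stands in for the Redis-backed queue, a background thread stands in for the worker, and the names `handle_signup` and `send_welcome_email` are hypothetical.

```python
import queue
import threading
import time

jobs = queue.Queue()   # stand-in for the Redis-backed queue (assumption)
sent = []              # records finished work so the demo can observe it

def send_welcome_email(address):
    time.sleep(0.2)    # pretend the email provider is slow
    sent.append(address)

def handle_signup(address):
    """The request handler: write down the job, then respond at once."""
    jobs.put(("send_welcome_email", address))
    return "Got it. We are working."

def worker():
    """Background worker: runs the slow part off the request thread."""
    while True:
        name, address = jobs.get()
        if name == "send_welcome_email":
            send_welcome_email(address)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
response = handle_signup("ada@example.com")
elapsed = time.perf_counter() - start   # far below the 0.2 s of slow work

jobs.join()   # only for the demo: wait until the background job is done
```

The handler returns before the email is sent. The work still happens, just on a different thread of execution.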
Notice what changed. The user experience went from frozen to instant. The thread that used to be stuck is now free to serve the next request. And the expensive work still gets done, just somewhere else, by something whose only job is to chew through that kind of work.
The question now becomes where does the work wait and who does it? That is where Redis and Celery come in. Three pieces make this work and each has exactly one job. First, Django, your web application.
It receives the request, does the fast part, and instead of doing the slow part, it writes a job description and drops it into a queue. Then it responds. Second, Redis, an in-memory data store that here plays the role of the broker. It holds the queue. Every job Django creates waits in Redis until someone is ready to run it. Because Redis lives in memory,
putting a job on the queue and taking one off are both extremely fast. Third, Celery, the worker. A Celery worker is a separate process whose entire life is a loop: ask Redis for the next job,
run it, ask for the next one. It never touches the web request. Django produces, Redis holds, Celery consumes.
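To make the three roles concrete, here is a small in-process sketch. It assumes a `collections.deque` as a stand-in for the Redis list, JSON strings as the job format, and a hypothetical `task` decorator and `enqueue` helper. None of these are Celery's real API; they only mirror the shape of produce, hold, consume.

```python
import json
from collections import deque

broker = deque()     # stand-in for a Redis list holding serialized jobs
TASKS = {}           # task name -> function, known to the worker

def task(fn):
    """Register a function so the worker can look it up by name."""
    TASKS[fn.__name__] = fn
    return fn

def enqueue(name, **kwargs):
    """Producer side (Django's role): serialize a job and append it."""
    broker.append(json.dumps({"task": name, "kwargs": kwargs}))

def run_worker(results):
    """Consumer side (Celery's role): take the next job, run it, repeat.
    A real worker blocks waiting for more; this demo stops when empty."""
    while broker:
        message = json.loads(broker.popleft())
        fn = TASKS[message["task"]]
        results.append(fn(**message["kwargs"]))

@task
def resize_image(path, width):
    return f"resized {path} to {width}px"

# The producer only touches the broker, never the worker.
enqueue("resize_image", path="cat.png", width=200)
enqueue("resize_image", path="dog.png", width=400)

results = []
run_worker(results)
```

The producer and the worker share only the broker and a task name, which is why they can later live in separate processes.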
Keep those three roles clear in your head because every pattern we are about to name is just a different way of looking at these same three parts. The first and most direct name for this design is the task queue and the name itself says almost everything. There is a queue and it holds tasks. When a request needs slow work done, it does not do the work.
It enqueues a task, a small message that says what to do and with what data. That task sits in line. Later, a worker takes it off the front and runs it. The crucial word is later.
The moment a job is created and the moment it actually runs are completely separated. Django enqueues in milliseconds and moves on to the next request. The worker runs the job whenever it gets there. That separation is what unblocks your threads. It also means a sudden burst of work does not crash anything.
The jobs simply stack up in the queue and drain as fast as the workers can manage. The task queue is the foundation. Every other pattern we name is built on top of it. Let us make this concrete with the classic example, a video upload.
A user uploads a video to your site. Transcoding that video into different sizes and formats can take minutes. There is no world where the user should wait minutes for the upload page to respond. So here is what happens instead.
Django receives the upload, saves the raw file, and creates one task: transcode this video. That task goes into the Redis queue. Django immediately responds:
"Upload complete. Your video is processing." The site feels instant to the user. Meanwhile, a Celery worker running quietly on its own asks Redis for the next job, receives the transcode, and starts the slow work.
When it finishes, it can update the database, send a notification, whatever you like. The user got an instant response, the heavy lifting happened in the background, and your web threads were never tied up. That is the entire pattern working from end to end. Now look at the same system through a different lens and you get the second name: producer and consumer.
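The video upload flow just described might look like this in miniature. It is a sketch under assumptions: a dict plays the database, `queue.Queue` plays Redis, the `transcode` function only fabricates rendition names instead of doing real transcoding, and every name here is hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()   # stand-in for the Redis queue
videos = {}                  # stand-in for a database table: id -> row

def transcode(video_id):
    """Slow work, done by the worker. Real transcoding would go here."""
    row = videos[video_id]
    row["renditions"] = [f"{row['file']}.{size}.mp4" for size in ("480p", "720p")]
    row["status"] = "ready"   # the worker updates the database when done

def upload_view(video_id, filename):
    """Request handler: save the raw file, enqueue one task, respond."""
    videos[video_id] = {"file": filename, "status": "processing", "renditions": []}
    task_queue.put(video_id)
    return "Upload complete. Your video is processing."

def worker():
    while True:
        video_id = task_queue.get()
        try:
            transcode(video_id)
        finally:
            task_queue.task_done()

response = upload_view(1, "holiday.mov")
status_at_response = videos[1]["status"]   # nothing has been transcoded yet

# The worker is started after the response only to keep the demo deterministic.
threading.Thread(target=worker, daemon=True).start()
task_queue.join()
final_status = videos[1]["status"]
```

A status column like this is one common way to let the page show "processing" until the worker flips it to "ready".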
This is a classic pattern from distributed systems and it maps onto our stack perfectly. Django is the producer. It produces jobs. The Celery workers are the consumers.
They consume jobs, and Redis is the queue that sits between them. The beautiful part is what this decoupling buys you. The producer does not know or care which worker runs its job, or when, or even how many workers exist. The consumer does not know who created the job.
Neither one calls the other directly. They only ever talk to the queue. That means you can scale them independently. If you need to accept more uploads, you add web servers which are more producers.
If you need to process the backlog faster, you add workers which are more consumers. The queue in the middle lets each side grow on its own without the other side ever needing to know. Let us give the thing in the middle its due because it is doing more than it looks. Redis here is the broker and a broker decouples the producer from the consumer in two dimensions.
In time: a job is created now and runs 10 seconds later, when a worker is free, so the producer does not wait. And in space: the producer and the consumer can be different processes on different machines, and they never need each other's address.
They both just need the queue. This middle layer is also what makes the system resilient under pressure. When every worker is busy, new jobs do not fail and they do not block. They simply wait in Redis and the queue absorbs the backlog.
When the workers catch up, they drain it. So a traffic spike becomes a queue that is briefly longer instead of a pile of errors. The broker turns bursts into patience. Here is a third lens: event-driven.
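How a queue absorbs a spike can be shown with a few lines of arithmetic. The numbers below are made up for illustration: a steady trickle of 2 jobs per second, a three-second spike of 20 per second, and workers that together finish 8 jobs per second.

```python
def simulate_backlog(arrivals_per_second, capacity):
    """Return the queue depth at the end of each second.

    arrivals_per_second: jobs enqueued during each second.
    capacity: jobs all workers together can finish per second.
    """
    depth = 0
    history = []
    for arrivals in arrivals_per_second:
        depth += arrivals               # producers enqueue
        depth -= min(depth, capacity)   # workers drain what they can
        history.append(depth)
    return history

# Quiet, then a three-second spike, then quiet again (illustrative numbers).
arrivals = [2, 2, 20, 20, 20, 2, 2, 2, 2, 2, 2, 2]
depths = simulate_backlog(arrivals, capacity=8)
# depths == [0, 0, 12, 24, 36, 30, 24, 18, 12, 6, 0, 0]
```

The backlog peaks at 36 waiting jobs and is gone six seconds after the spike ends. Nothing failed. The jobs only waited.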
So far, we have talked about slow work. But tasks do not have to be slow to belong on a queue. They can be reactions to things that happen. Think about the events in a typical application.
A user registers, a payment completes, or a file is uploaded. Each of these is a moment, and each one might need some extra work, like sending a welcome email or kicking off a longer process. In an event-driven style, the thing that happens simply announces it, and a task is dispatched to handle it. The event names what occurred; the queue carries out the response.
What is nice is that the code which detects the event does not need to know everything that should happen next. It just fires the task. You can add a second handler later, say, one that also notifies the analytics system, without touching the original code.
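One way to picture "add a second handler without touching the original code" is a small registry that maps event names to task names. This is an illustrative sketch, not a Celery feature: `on`, `emit`, and the task names are hypothetical, and a `deque` stands in for the queue.

```python
from collections import defaultdict, deque

task_queue = deque()            # stand-in for the Redis queue
handlers = defaultdict(list)    # event name -> task names to dispatch

def on(event_name, task_name):
    """Subscribe a task to an event."""
    handlers[event_name].append(task_name)

def emit(event_name, payload):
    """Announce what occurred; one task is queued per subscribed handler."""
    for task_name in handlers[event_name]:
        task_queue.append((task_name, payload))

on("user_registered", "send_welcome_email")

def register_user(email):
    # ... create the account (omitted) ...
    emit("user_registered", {"email": email})

register_user("ada@example.com")

# Later: a second reaction is added. register_user itself is unchanged.
on("user_registered", "notify_analytics")
register_user("grace@example.com")

queued = list(task_queue)
```

After the second subscription, one registration fans out into two queued tasks, and `register_user` never learned that analytics exists.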
The event is the trigger and the queue is how the reaction travels. Now an honest word, because it is easy to oversell this. A Celery and Redis queue gives you a lightweight slice of event-driven design, enough to react to events and fan out work. But it is not a full event-driven architecture and you should not pretend it is. Real event-driven systems usually bring more machinery:
a dedicated publish and subscribe layer where many services listen to the same stream, and sometimes event sourcing, where the events themselves become the source of truth. That is a much bigger commitment and most applications never need it. So here is the guidance. Start with the queue. It will carry you a remarkably long way.
Reach for the heavier event-driven tools only when the coupling between your services genuinely starts to hurt, when you have many services that all need to react to the same things. Until then, the queue is exactly the right amount of power. The fourth lens is about scale: distributed workers.
So far you might have pictured one worker, but nothing stops you from running many. And this is where the design really pays off. Picture one shared queue in Redis and, say, five Celery workers all watching it. When a job appears, whichever worker is free grabs it.
There is no central dispatcher handing out assignments, no manager deciding who does what. Each worker simply pulls the next available job and runs it. This is what distributed means here. The work is spread across many independent workers that coordinate only through the shared queue.
And because there is no central brain, scaling is almost embarrassingly simple. You do not rewrite anything. You do not reconfigure anything. You just start more workers, on the same machine or on 10 different machines, and they immediately start pulling from the same queue.
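The "no central dispatcher" idea can be sketched with threads standing in for worker processes. This is a single-machine illustration under that assumption: five threads pull from one shared `queue.Queue`, and the only coordination is the queue itself. The `None` sentinel is just a way to stop the demo cleanly.

```python
import queue
import threading
from collections import Counter

shared_queue = queue.Queue()   # the one shared handoff point
run_counts = Counter()         # job id -> how many times it ran
done_by = {}                   # job id -> name of the worker that ran it
lock = threading.Lock()

def worker(name):
    """Pull the next available job and run it. Nobody assigns work."""
    while True:
        job_id = shared_queue.get()
        if job_id is None:      # sentinel: shut this worker down
            return
        with lock:
            run_counts[job_id] += 1
            done_by[job_id] = name

NUM_WORKERS = 5
threads = [
    threading.Thread(target=worker, args=(f"worker-{i}",))
    for i in range(NUM_WORKERS)
]
for t in threads:
    t.start()

for job_id in range(100):
    shared_queue.put(job_id)
for _ in threads:               # one sentinel per worker, after all jobs
    shared_queue.put(None)

for t in threads:
    t.join()
```

Changing `NUM_WORKERS` is the only edit needed to scale this sketch up or down. Which worker handles which job varies from run to run, but each job runs exactly once.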
More workers, more throughput. Let us sit with that scaling story for a second because it is the real reward. Your total throughput, how many jobs you can finish per minute, grows almost linearly with the number of workers, because the queue is the one shared handoff point and every worker feeds from it. When a spike of work arrives, the queue gets longer for a while and then drains.
No single request is ever dropped. It just waits its turn. And because workers coordinate only through the queue, you can place them wherever it makes sense. Put the video workers on machines with fast disks and plenty of memory.
Put the email workers somewhere cheap. Put heavy machine learning jobs on a machine with a graphics card. They all read from the same Redis queue but they can live in completely different places sized for completely different work. The queue is shared.
The workers are free to specialize and multiply around it. We do have to talk about failure because asynchronous work fails differently. When a normal request fails, you tell the user right then and they can retry. But an async job runs after the user has gone.
There is no one to show an error to. The system has to cope on its own. The good news is that the queue gives you the tools. If a job fails, Celery can retry it automatically after a short delay, which is perfect for a flaky third-party service that just needed a second chance.
You can set timeouts so a stuck job does not hang forever, and jobs that keep failing can be moved to a dead-letter queue, a side queue of failures you can inspect later instead of losing them. One principle ties this together. Make your tasks idempotent:
running the same task twice does no harm. Once a task is safe to repeat, retries become safe. A huge category of failure simply stops being scary. So when should you actually reach for a queue?
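Before answering that, the failure tools above (retries, a dead-letter queue, idempotent tasks) can be made concrete in plain Python. This is a hand-rolled sketch, not Celery's retry machinery: `run_with_retries`, `flaky_provider_send`, and the in-memory `dead_letters` list are all hypothetical stand-ins, and the delays are shortened so the demo runs quickly.

```python
import time

dead_letters = []             # stand-in for a dead-letter queue
welcomed = set()              # addresses already emailed (idempotency record)
provider_calls = {"count": 0}

class ProviderError(Exception):
    pass

def flaky_provider_send(address):
    """Hypothetical email provider that fails on its first two calls."""
    provider_calls["count"] += 1
    if provider_calls["count"] <= 2:
        raise ProviderError("temporary failure")

def send_welcome_email(address):
    """Idempotent task: a repeat run for the same address does nothing."""
    if address in welcomed:
        return "skipped"
    flaky_provider_send(address)
    welcomed.add(address)
    return "sent"

def always_fails(address):
    raise ProviderError("permanent failure")

def run_with_retries(task, arg, max_retries=3, base_delay=0.01):
    """Run task(arg), retrying with exponential backoff.
    After the last retry fails, record the job in dead_letters."""
    for attempt in range(max_retries + 1):
        try:
            return task(arg)
        except ProviderError as exc:
            if attempt == max_retries:
                dead_letters.append((task.__name__, arg, str(exc)))
                return None
            time.sleep(base_delay * 2 ** attempt)

first = run_with_retries(send_welcome_email, "ada@example.com")   # fails twice, then succeeds
second = run_with_retries(send_welcome_email, "ada@example.com")  # duplicate delivery of the same job
third = run_with_retries(always_fails, "bob@example.com")         # gives up and is dead-lettered
```

The duplicate run is harmless because the task checks its own record first. That check is what makes automatic retries safe to turn on.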
Four situations are the clear yes. When the work is slow: anything a user should not wait on. When it is external: calling another service that you do not control and cannot trust to be fast. When it is scheduled: work that should happen later or on a regular rhythm.
And when it is bursty: work that arrives in sudden waves you want to smooth out. If your work is none of those, if it is fast and the user needs the result on the very next screen, then keep it inline. Doing simple work through a queue just adds latency and moving parts for no reason.
That is the honest trade. A task queue buys you responsiveness, resilience, and scale, but it costs a broker to run, workers to monitor, and a more complex mental model. Spend that complexity where it earns its keep and not a moment sooner. Step back.
Here is what is worth remembering. The task queue, producer and consumer, event-driven, and distributed workers are not four different systems you have to learn. They are four names for one system, each highlighting a different angle.
Task queue points at the handoff: work goes in a line and is run later. Producer and consumer points at the roles: who makes the work and who does it.
Event-driven points at the trigger, what causes a job to be created, and distributed workers points at the scale, how the work spreads across machines. Underneath all four is the exact same picture we have been drawing the whole time: Django dropping a job into Redis and Celery workers pulling it out.
Learn that one setup well, and you have not learned one pattern. You have quietly learned all four, and you will start to recognize them everywhere. So, where should you start? You start small, with one queue in Redis, one Celery worker, and Django enqueuing its slowest piece of work.
That alone, just moving one slow task off the request thread, will make your application feel dramatically more responsive. And it is an afternoon of work, not a rewrite. Then let it grow with you. As your load rises and your jobs multiply, add more workers.
Give different jobs their own queues, and lean on retries and dead-letter queues to stay resilient. The architecture is designed to grow exactly this way. You never have to throw the simple version away. Responsiveness, resilience, and room to scale, all from one small idea.
Respond now and do the slow work later. Go move one task to a queue and