How Python async exposes hidden coupling and backpressure

Asynchronous IO addresses the hidden waits that let synchronous systems drift into backpressure.

Long operations yield instead of blocking, so unrelated work continues while the wait completes. The moment two steps depend on each other, the dependency becomes visible. This is the value of asynchrony. Coupling is exposed, as is load behaviour, and the early signs of saturation become more obvious before they turn into blocked threads and stalled pipelines.

The example code below shows how asynchrony changes the shape of time in a system and reveals the operational behaviours that synchronous code can hide.

The code also includes a small OpenTelemetry span, making behaviour observable from client through to server.

As the Conclusion shows, asynchrony has structural effects on execution flow, and it has operational consequences for system‑level behaviour.

Asynchrony: the fundamental idea

The fundamental idea is to put long-lived operations into a separate execution context so that other work can progress while the long-lived operation completes.

In this way we overlap the completion of tasks. The long-lived execution is started, but the call to it returns immediately. The other work can then start.

It is important that the other work has no dependence on the long-lived work. This is necessary because if the second was dependent on the first, the first would have to complete before the second could be started. In this case, the two bodies of work would have to operate serially, not concurrently.

For concurrency to work, the work items must be independent of one another.
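A minimal sketch of this distinction, using asyncio.sleep as a stand-in for real IO. The names fetch and resize are hypothetical and are not part of the article's code. The dependent pair has to run serially because resize needs the output of fetch; the two independent fetches can overlap.

```python
import asyncio
import time


async def fetch(name: str) -> str:
    # Stand-in for a long-lived IO operation.
    await asyncio.sleep(0.2)
    return f"{name}-bytes"


async def resize(data: str) -> str:
    # Depends on the output of fetch, so it cannot start any earlier.
    await asyncio.sleep(0.2)
    return f"resized({data})"


async def dependent() -> float:
    start = time.perf_counter()
    data = await fetch("a")   # must finish first
    await resize(data)        # needs the result above
    return time.perf_counter() - start


async def independent() -> float:
    start = time.perf_counter()
    # Neither call needs the other's result, so they can overlap.
    await asyncio.gather(fetch("a"), fetch("b"))
    return time.perf_counter() - start


serial_s = asyncio.run(dependent())
overlap_s = asyncio.run(independent())
print(f"dependent: {serial_s:.2f}s, independent: {overlap_s:.2f}s")
```

The dependent version takes roughly the sum of the two waits, while the independent version takes roughly the longer of the two.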

Our example

Our example is a long-lived RESTful call that receives client JPG data. During the long server wait, the client performs useful work preparing the next image file for upload.

In addition, observability support is shown using OpenTelemetry: a tracer creates a span for the upload on the client, and the server creates a span that is linked to it.

A span is a timed, structured record of one operation. Client and server observability is built by linking many spans together.

Input/Output

Input/Output (IO) is a common operation that takes time. Your code might have to wait for:

a remote service to return such as a payment provider or cloud storage
a database query
a large file to be read into memory
machine learning inference
Kafka partitions to respond

A Synchronous Approach

When programming synchronously, the thread that is executing must wait (block) for the entire IO operation. In a single threaded program, nothing else is getting done while you wait.

In a threaded program using a thread pool, long IO wait times make it more likely that all threads block.

For example, in a pool of 32 threads, if the average blocking time is 2s and 32 requests are received within 2s, all 32 threads will be blocked. A 33rd request will have to wait on a thread becoming available. Queued requests inherit the delay.
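The same effect can be reproduced at a smaller scale. This is a sketch under stated assumptions: the pool size and blocking time are scaled down from the 32 threads and 2s in the text so that it runs quickly, and handle_request is a hypothetical stand-in for a blocking request handler.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4        # scaled down from the 32 threads in the text
BLOCK_SECONDS = 0.2  # scaled down from the 2s average blocking time


def handle_request(submitted_at: float) -> float:
    # Simulate blocking IO, then report how long the request took from
    # submission to completion, including any time spent queued.
    time.sleep(BLOCK_SECONDS)
    return time.perf_counter() - submitted_at


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # Submit one more request than there are threads.
    futures = [pool.submit(handle_request, time.perf_counter())
               for _ in range(POOL_SIZE + 1)]
    latencies = [f.result() for f in futures]

for i, latency in enumerate(latencies, start=1):
    print(f"request {i}: {latency:.2f}s")
```

The final request cannot start until a thread is free, so its latency is roughly double that of the others: it inherits the delay of the request in front of it.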

If you have a downstream system waiting on the 32 requests completing, you now have back pressure building and being communicated around your distributed system.

The slow remote service is slowing down your downstream service, which is likely to slow any other downstream services. One of these could be a user interface, giving your end user a degraded experience.

In a synchronous system you want to give yourself a heads-up when production moves towards backpressure. Observability of wait times, configuration (the use of 32 threads as opposed to another number), and resource usage (all 32 threads being occupied) are important signals to track.

Using asynchrony

Python's asynchrony provides:

The event loop — a scheduler within Python
Coroutines — functions that can suspend and resume
Tasks — scheduled coroutines managed by the event loop
Await points — explicit yield boundaries where a coroutine gives up control of the thread to the event loop

The scheduler will resume tasks whenever the condition they were waiting for becomes true. IO readiness is one such condition.

You mark functions as asynchronous coroutines using async. Asynchronous IO tasks are created with asyncio.create_task. await suspends the current coroutine at an explicit yield point, returns control to the event loop, and resumes the coroutine later when the awaited operation has completed.
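The pieces above fit together as in the following minimal sketch. The coroutine greet is hypothetical and exists only to show the mechanics.

```python
import asyncio


async def greet(name: str) -> str:
    # An await point: the coroutine suspends here and the event loop
    # is free to run something else.
    await asyncio.sleep(0.01)
    return f"hello {name}"


async def main() -> list:
    coro = greet("a")
    # Calling a coroutine function does not run it. It returns a
    # coroutine object that must be awaited or wrapped in a Task.
    is_coro = asyncio.iscoroutine(coro)

    task = asyncio.create_task(greet("b"))  # scheduled on the event loop
    first = await coro                      # run to completion, in sequence
    second = await task                     # collect the task's result
    return [is_coro, first, second]


result = asyncio.run(main())
print(result)
```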

Python asynchrony gives you concurrency, not parallelism

Concurrency is multiple units of work in progress at the same time. They may or may not run simultaneously. This is what Python async gives you.

Parallelism is multiple units of work executing at the same time on different CPU cores. Processes give you this, as do threads whenever the GIL is released (for example, while waiting on IO or inside some C extensions). async does not give you this.

Python async gives you concurrency, but not simultaneous execution, because:

asyncio runs on one thread
the Python global interpreter lock (GIL) allows only one Python bytecode stream at a time
the event loop schedules tasks cooperatively

async tasks overlap in time, but never execute at the same instant.

Python asynchrony is concurrency without parallelism.

You use create_task when you want concurrency. You use await when you want sequencing.
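A small sketch makes the interleaving visible. Two tasks are in progress at once, but only one runs at any instant, and control changes hands only at an await. The worker coroutine and the events list are illustrative names.

```python
import asyncio

events = []


async def worker(name: str) -> None:
    events.append(f"{name} start")
    await asyncio.sleep(0)   # yield to the event loop without waiting
    events.append(f"{name} end")


async def main() -> None:
    # create_task: both workers are now scheduled concurrently.
    a = asyncio.create_task(worker("A"))
    b = asyncio.create_task(worker("B"))
    # await: main continues only once each task has finished.
    await a
    await b


asyncio.run(main())
print(events)
```

B starts before A has ended, which is the overlap; no two appends ever happen at the same time, which is the absence of parallelism.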

Operations that overlap in time

As we can overlap operations in time, we can reduce the total time it would take for a number of operations by performing some of these operations while a coroutine is waiting for IO to complete.

As a diagram, we have:

[Diagram: operations that overlap in time]

In the asynchronous case, the current coroutine executes upload(), which awaits an asynchronous POST made with httpx.AsyncClient. This coroutine suspends and will not run again until the awaited IO completes; meanwhile, the event loop may run other coroutines if any are ready.

In this case, the awaited IO is the result from the server.

The Code

The example code consists of server.py and send.py.

The Server

The server code runs a FastAPI RESTful server that executes within Uvicorn.

Uvicorn is an ASGI server written in Python, used to run asynchronous web frameworks like FastAPI and Starlette.

ASGI is the Asynchronous Server Gateway Interface. This is the modern, async‑native standard that defines how Python web servers (like Uvicorn) talk to async web frameworks, such as FastAPI.

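import asyncio
import random
import sys

import uvicorn
from fastapi import FastAPI, File, Request, UploadFile
from opentelemetry.propagate import extract

# lifespan is defined elsewhere in server.py (not shown here);
# presumably it is where app.state.tracer is set up.
app = FastAPI(lifespan=lifespan)

@app.post("/receive_upload")
async def receive_upload(request: Request,
                         file: UploadFile = File(required=True)):
    tracer = request.app.state.tracer
    # Continue the client's trace using the context carried in the headers.
    with tracer.start_as_current_span("upload",
                                      context=extract(request.headers)):
        # 'file' has a reference to the underlying file, in a temporary file.
        # Reading this could be a lot of bytes. Recommendation is to read
        # with: contents = await file.read()

        print(file.filename)

        # Simulate a long-lived call without blocking the event loop.
        wait_time = random.randint(5, 15)
        await asyncio.sleep(wait_time)
        return {"status": wait_time}

if __name__ == "__main__":
    uvicorn.run("server:app", host="127.0.0.1",
                port=int(sys.argv[1]), reload=False)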

We start the webserver by calling run on uvicorn with the FastAPI application, known as server:app. server is the name of the module.

The port that uvicorn will listen on (which forms part of the endpoint for the client) is passed at run time and read from sys.argv. Uvicorn reload is explicitly set to False. If it were True, uvicorn would restart the server whenever it detected that the server.py file had changed. Automatic reload is convenient when testing but a risk when we want a stable environment.

A long-lived server call is simulated by waiting a random number of seconds, between 5 and 15.

The wait calls asyncio.sleep because a call to time.sleep would block not only the coroutine but the whole thread, and with it the event loop and all IO notification. Nothing would happen for the duration of the sleep. There would be no concurrency.

Once the asynchronous sleep has returned, receive_upload returns a simple JSON status result.
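The difference can be measured directly. In this sketch, three coroutines wait concurrently, first with asyncio.sleep and then with time.sleep. The names polite_wait, blocking_wait and timed are hypothetical, and the delay is shortened so the example runs quickly.

```python
import asyncio
import time

DELAY = 0.2


async def polite_wait() -> None:
    await asyncio.sleep(DELAY)   # suspends only this coroutine


async def blocking_wait() -> None:
    time.sleep(DELAY)            # blocks the thread, and so the event loop


async def timed(worker) -> float:
    start = time.perf_counter()
    await asyncio.gather(worker(), worker(), worker())
    return time.perf_counter() - start


async_elapsed = asyncio.run(timed(polite_wait))
blocking_elapsed = asyncio.run(timed(blocking_wait))
print(f"asyncio.sleep: {async_elapsed:.2f}s, time.sleep: {blocking_elapsed:.2f}s")
```

With asyncio.sleep the three waits overlap and the total is close to one delay. With time.sleep each wait holds the event loop, so the total is close to three delays.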

The Client

The client is where time is saved by overlapping the previous upload with work processing data for the next upload.

import asyncio
import sys
import time

# get_file, is_jpeg, to_image and the other image helpers are
# defined elsewhere in send.py (not shown here).
async def main():
    tasks = []
    start = time.perf_counter_ns()
    server_port = sys.argv[1]
    containing_zip = sys.argv[2]

    for filen in range(1, 11):
        print(filen)
        filename, buffer = get_file(containing_zip, filen)
        if is_jpeg(buffer):
            img = to_image(buffer)
            img = resize_image(img)
            img = centre_crop(img)
            img = strip_metadata(img)
            preview = generate_preview(img)

            compressed_bytes = compress_image(img, fmt="JPEG", quality=80)

            img_data = to_bytes(img)
            meta = read_exif(img_data)
            # Schedule the upload as a task and keep a reference to it.
            tasks.append(asyncio.create_task(upload(filename, img_data, server_port)))
            # await upload(filename, img_data, server_port) # synchronous call

main is marked async as it contains asynchronous code: the call to upload.

To demonstrate the time that can be saved asynchronously, the loop uploads the first 10 files from the containing ZIP file. This ZIP and its contents are from https://github.com/yavuzceliker/sample-images.

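File 1 is processed and its upload is scheduled as a task, so the loop immediately returns to the top to start loading file 2 into buffer. A scheduled task begins running only when main next yields to the event loop at an await.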
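The scheduling behaviour of asyncio.create_task can be checked with a small sketch. The upload coroutine here is a stand-in that only records when it starts; it is not the article's upload function.

```python
import asyncio


async def upload(log: list) -> None:
    # Stand-in for the real upload; records the moment it starts running.
    log.append("upload started")


async def main() -> list:
    log = []
    task = asyncio.create_task(upload(log))
    # The task is scheduled, but it has not run yet: main still holds
    # the thread because it has not reached an await.
    log.append("after create_task")
    await asyncio.sleep(0)   # yield once to the event loop
    log.append("after yield")
    await task
    return log


log = asyncio.run(main())
print(log)
```

The stand-in upload starts only after main yields. One consequence, offered here as an observation rather than as part of the original code, is that a loop containing CPU-bound work and no await gives earlier tasks no chance to start until the loop reaches an await such as asyncio.gather.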

The buffer is checked to confirm the file format is JPG before processing that image with resize_image and others.

A preview is generated to show that such work can be done at the client, and the image is compressed to highlight the difference in size, as an 80% quality image may be adequate for some uploads. Any EXIF tags are read client-side, ready for upload should that be required. This demonstrates how work can be offloaded from the server to an asynchronous client.

Once processed, img_data is uploaded asynchronously.

Asynchrony is achieved with:

tasks.append(asyncio.create_task(upload(filename, img_data, server_port)))

We retain a reference to the Task that is created by asyncio.create_task. This is so we can subsequently wait for all tasks to complete via asyncio.gather.

The commented-out line, await upload, is the synchronous call: each upload must complete before the loop continues. This is used to compare performance below.
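A sketch of how retained tasks might be collected, assuming a hypothetical fake_upload in place of the real upload. It is not the article's code, only an illustration of asyncio.gather and of timing with time.perf_counter_ns.

```python
import asyncio
import time


async def fake_upload(n: int) -> int:
    # Stand-in for upload(); later items finish sooner on purpose.
    await asyncio.sleep(0.05 * (4 - n))
    return n


async def main():
    start = time.perf_counter_ns()
    tasks = [asyncio.create_task(fake_upload(n)) for n in range(1, 4)]
    # gather waits for every task and returns the results in the order
    # the tasks were passed in, not the order in which they completed.
    results = await asyncio.gather(*tasks)
    elapsed_ns = time.perf_counter_ns() - start
    return results, elapsed_ns


results, elapsed_ns = asyncio.run(main())
print(results, elapsed_ns)
```

The elapsed time is close to the longest single wait rather than the sum of all three, because the waits overlap.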

Upload

infile is the name of the JPG file from the ZIP, data is the processed bytes, and server_port is the port the Uvicorn server is listening on.

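import httpx
from opentelemetry import trace
from opentelemetry.propagate import inject

async def upload(infile, data, server_port):
    tracer = trace.get_tracer("send.py")

    with tracer.start_as_current_span("upload"):
        upload_url = f'http://127.0.0.1:{server_port}/receive_upload'

        # Multipart form field: (filename, content, content type)
        files = {"file": (infile, data, "application/octet-stream")}

        # timeout=None disables httpx timeouts, as the server wait is long.
        async with httpx.AsyncClient(timeout=None) as client:
            try:
                headers = {}
                inject(headers)  # add the current span context to the headers
                response = await client.post(upload_url,
                                             files=files,
                                             headers=headers)
                # Treat 4xx and 5xx responses as failures too.
                response.raise_for_status()
                return True
            except httpx.HTTPError:
                return False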

To POST to the server, upload_url is formed, as is files, which carries the JPG data as an application/octet-stream form field.

Observability

The client upload code uses a tracer to create a span for the upload operation. A span is the fundamental unit of work that you want to observe. A span's context can be passed between processes, so that spans created in different processes belong to the same trace.

The mechanism that supports this is the call to OpenTelemetry's inject, which writes the current span context into a dictionary (headers) that is sent as HTTP headers with the POST.

This is accessed like this on the server side:

@app.post("/receive_upload")
async def receive_upload(request: Request,
                         file: UploadFile = File(required=True)):
    tracer = request.app.state.tracer

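tracer is loaded from the application state, reached through the request. The information provided at the client with inject arrives in request.headers; the server recovers it with extract(request.headers) and passes it as the context of its own span.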
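With OpenTelemetry's default propagator, the span context travels as a W3C Trace Context traceparent header. The standard-library sketch below imitates that header's format to show what crosses the wire. It is an illustration only, not OpenTelemetry's implementation, and make_traceparent and parse_traceparent are hypothetical helpers.

```python
import secrets


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    # W3C Trace Context layout: version-traceid-parentid-flags
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": int(flags, 16) & 1 == 1,
    }


# Client side: roughly what ends up in `headers` after inject(headers).
trace_id = secrets.token_hex(16)   # 32 hex characters
span_id = secrets.token_hex(8)     # 16 hex characters
headers = {"traceparent": make_traceparent(trace_id, span_id)}

# Server side: roughly what extract(request.headers) recovers.
remote = parse_traceparent(headers["traceparent"])
print(headers, remote)
```

Because the server's span is started with the recovered context, it shares the client's trace id and records the client's span as its parent.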

Running the Code

First of all we start the server with:

python3.9 server.py 8000
...
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Then we run the client to send the image files.

# python3.9 send.py 8000 images/main.zip
1
2
3
4
5
6
7
8
9
10
14555048979

You will see 1 to 10 printed out, representing the 1st, 2nd, 3rd, and subsequent files uploaded. The large number at the end is the elapsed time, in nanoseconds, to upload all ten files. 14555048979ns is about 14.6 seconds.

Note, the client does not terminate on its own.

Preparing to run a synchronous version

Update async def main in send.py so that the line:

#tasks.append(asyncio.create_task(upload(filename, img_data, server_port)))

is commented out, as shown, and the line below it:

await upload(filename, img_data, server_port)

is uncommented. It is commented out initially.

Running the client

There is no need to rerun the server.

This time, when you run the client:

# python3.9 send.py 8000 images/main.zip

The 1, 2, 3 will appear but more slowly, as the upload of 1 has to complete before 2 is started.

The full time to upload ten files in the synchronous case is 97603426760ns, which is 97.6 seconds or 1 minute 37.6 seconds. This is 6.7 times slower than the asynchronous version for the same amount of work: processing and uploading 10 images.
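The quoted figures can be checked with a few lines of arithmetic on the two nanosecond timings reported above.

```python
async_ns = 14_555_048_979   # asynchronous run, from the client output
sync_ns = 97_603_426_760    # synchronous run

async_s = async_ns / 1e9
sync_s = sync_ns / 1e9
ratio = sync_ns / async_ns
minutes, seconds = divmod(sync_s, 60)

print(f"asynchronous: {async_s:.1f}s")
print(f"synchronous:  {sync_s:.1f}s ({int(minutes)} min {seconds:.1f}s)")
print(f"slowdown:     {ratio:.1f}x")
```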

Conclusion

Structural Effects on Execution Flow

When you introduce concurrency, any unseen dependency between steps becomes immediately visible because the steps can no longer rely on running in a fixed order. This means asynchronous design is a diagnostic tool: it reveals coupling you did not know you had.

Once IO wait time is removed, the limiting factor becomes how efficiently you schedule, batch, and pipeline work. The system’s ceiling moves from "how fast is the remote service" to "how well do you structure the work".

When using await, you are marking a cooperative yield point. This forces you to identify which parts of your pipeline are IO‑bound and which are CPU‑bound. This separation is rarely visible in synchronous code, where everything looks uniform.

Because each await is a boundary, asynchronous code decomposes into smaller operations. This improves testability and reduces how far errors can travel.

The client performs image processing before upload. This is not just an optimisation; it is a design pattern. Asynchronous architectures naturally push work outward, reducing server load and improving system resilience.

System‑Level Behaviour and Operational Consequences

The example shows a 6.7× speedup, but the deeper effect is that latency variance is masked. Asynchronous systems degrade more gracefully under load because they do not accumulate blocked threads.

Because Python cooperative concurrency code already has explicit yield points and structured tasks, observability spans map naturally onto the program’s execution model. The tracing model aligns with the concurrency model, reducing instrumentation friction.

Once work is concurrent, cancellation, retries, and partial progress become normal. Asynchronous design implicitly pushes you toward idempotent operations and stateless boundaries. An idempotent operation is one that is "safe to run more than once". If an operation is not safe in this way, a retry can produce corruption, duplication, or inconsistent system state.
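One common way to make a retried upload safe is to attach a key that identifies the operation, so that a repeat is recognised instead of being stored twice. The sketch below assumes a hypothetical in-memory UploadStore and a caller-chosen key; neither appears in the article's code.

```python
class UploadStore:
    """In-memory stand-in for server-side storage (hypothetical)."""

    def __init__(self) -> None:
        self._by_key = {}

    def save(self, idempotency_key: str, data: bytes) -> str:
        # A repeated key is recognised, so the same upload is not
        # stored a second time.
        if idempotency_key in self._by_key:
            return "duplicate ignored"
        self._by_key[idempotency_key] = data
        return "stored"

    def count(self) -> int:
        return len(self._by_key)


store = UploadStore()
first = store.save("image-001", b"jpeg bytes")
retry = store.save("image-001", b"jpeg bytes")  # e.g. a retry after a timeout
print(first, retry, store.count())
```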

In synchronous systems, backpressure appears as blocked threads. In asynchronous systems, it appears as queued tasks. This makes it measurable, observable, and tunable rather than a pathological failure mode where blocked threads you cannot reason about make the program appear stalled.
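A sketch of what "measurable and tunable" can look like in asyncio, assuming a hypothetical fake_upload and an arbitrary limit of three uploads in flight. The semaphore is the tuning knob and the counters are the measurements.

```python
import asyncio

MAX_IN_FLIGHT = 3   # tunable limit on concurrent uploads (assumed value)


async def fake_upload(limit: asyncio.Semaphore, stats: dict) -> None:
    stats["waiting"] += 1
    stats["max_waiting"] = max(stats["max_waiting"], stats["waiting"])
    async with limit:                # waits here once the limit is reached
        stats["waiting"] -= 1
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)    # stand-in for the real upload
        stats["in_flight"] -= 1


async def main() -> dict:
    limit = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"waiting": 0, "max_waiting": 0, "in_flight": 0, "peak": 0}
    await asyncio.gather(*(fake_upload(limit, stats) for _ in range(10)))
    return stats


stats = asyncio.run(main())
print(stats)
```

The number of queued tasks is an ordinary value that can be logged or exported as a metric, and the limit can be changed without touching the upload logic.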

Failures no longer cascade through blocked threads; they propagate through task graphs. This changes how you reason about retries, timeouts, and cancellation.
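Timeouts and cancellation can be sketched with asyncio.wait_for, which cancels the awaited coroutine when the timeout expires. The slow_upload coroutine and the state dictionary are hypothetical; the five-second sleep stands in for a stalled remote service.

```python
import asyncio


async def slow_upload(state: dict) -> str:
    try:
        await asyncio.sleep(5)       # stand-in for a stalled remote service
        return "uploaded"
    except asyncio.CancelledError:
        state["cancelled"] = True    # a chance to clean up partial work
        raise                        # always re-raise cancellation


async def main() -> dict:
    state = {"cancelled": False, "outcome": None}
    try:
        state["outcome"] = await asyncio.wait_for(slow_upload(state),
                                                  timeout=0.05)
    except asyncio.TimeoutError:
        state["outcome"] = "timed out"
    return state


state = asyncio.run(main())
print(state)
```

The failure surfaces as an exception in the awaiting coroutine, where a retry or fallback can be decided, rather than as a thread that never returns.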

Download the code

A ZIP of the code.

Jh Evans
