Don't make this asyncio mistake

Posted on 2016-06-16 by Vlad Călin

Async programming in Python is still a relatively new concept. Although the community is adopting it more and more (Django 3.1 added support for asynchronous views), it will be a long time before it becomes the preferred way of doing things.

A common mistake I see when people convert synchronous code to async is translating each function into its async equivalent without adjusting the program's workflow.

For example, here is the reference Python code that does its "work" in a blocking manner. We use time.sleep as an easy way to simulate blocking I/O. You can think of it as network activity, such as making HTTP requests, or as disk operations, such as reading large chunks of data from a file.

import time

def do_work(i):
    print('before', i)
    time.sleep(1)
    print('after', i)

for i in range(100):
    do_work(i)

Here, for every number from 0 to 99, we print "before", wait one second, and then print "after", so the whole loop takes about 100 seconds. Nothing fancy or even useful; we just need something simple to demonstrate the idea.
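To see that cost concretely, here is a scaled-down sketch that times the blocking loop. It assumes 5 iterations and a 0.1 second sleep instead of 100 and 1 second so that it finishes quickly, and it drops the prints; the names N and DELAY are illustrative, not from the original.

```python
import time

N = 5        # illustrative: far fewer iterations than the article's 100
DELAY = 0.1  # illustrative: shorter than the article's 1 second

def do_work(i):
    time.sleep(DELAY)  # stands in for blocking I/O

start = time.perf_counter()
for i in range(N):
    do_work(i)
elapsed = time.perf_counter() - start

# The sleeps happen one after another, so the total is roughly N * DELAY.
print(f"{elapsed:.2f}s")
```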

Next, we want to do this work asynchronously. That is, we want to print all those numbers as fast as possible instead of one second apart (again, imagine that the sleep stands for blocking I/O such as an HTTP request or a disk read).

The most obvious approach, and the one a lot of people take, looks like this:

import asyncio

async def do_work(i):
    print('before', i)
    await asyncio.sleep(1)
    print('after', i)

async def main():
    for i in range(100):
        await do_work(i)

asyncio.run(main())

When we run it, we see that it runs exactly like before. Why?

Because although the code uses async features, it still has a single execution path: for each number, print the first message, suspend until one second passes, then print the second message. While that coroutine is suspended, the event loop has nothing else to run, so it simply waits for asyncio.sleep to finish before resuming.
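One way to see that single execution path is to record events instead of printing them. In this sketch, the events list, the three iterations and the 0.01 second delay are illustrative choices, not from the original. Every "after" comes immediately after its own "before":

```python
import asyncio

events = []  # records the order in which things happen

async def do_work(i):
    events.append(('before', i))
    await asyncio.sleep(0.01)  # short illustrative delay
    events.append(('after', i))

async def main():
    for i in range(3):
        # Each await completes before the next iteration starts,
        # so only one do_work is ever in progress.
        await do_work(i)

asyncio.run(main())
print(events)
# [('before', 0), ('after', 0), ('before', 1), ('after', 1), ('before', 2), ('after', 2)]
```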

To use asyncio to its full potential, we also need to restructure the code so that the pieces that do not need to wait for each other can run concurrently instead of one after another.

In this example, the whole body of the for loop can run concurrently, because no iteration needs to wait for the previous one to finish. After refactoring, the code looks like this:

import asyncio

async def do_work(i):
    print('before', i)
    await asyncio.sleep(1)
    print('after', i)

async def main():
    await asyncio.gather(*[do_work(i) for i in range(100)])

asyncio.run(main())

Now, asyncio.gather wraps each do_work coroutine in a task and schedules all of them on the event loop at once. When one of them reaches asyncio.sleep, it suspends and the loop switches to the next task, which runs until its own asyncio.sleep, and so on. Very soon all the do_work coroutines are waiting on their own asyncio.sleep. When the sleeps finish, about one second later, each coroutine resumes and prints its second message, so the whole program takes about one second instead of about 100.
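Here is a sketch that checks both claims. It makes the same kind of illustrative assumptions as before: a recorded events list, 5 coroutines and a 0.2 second delay. Every "before" is recorded before any "after", and the total time is close to a single delay rather than the sum of all of them.

```python
import asyncio
import time

N = 5        # illustrative number of coroutines
DELAY = 0.2  # illustrative stand-in for the article's 1 second
events = []  # records what happens, in order

async def do_work(i):
    events.append(('before', i))
    await asyncio.sleep(DELAY)
    events.append(('after', i))

async def main():
    await asyncio.gather(*[do_work(i) for i in range(N)])

start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start

# All N 'before' events come first, because every coroutine runs until
# its own sleep before any sleep finishes. The 'after' events follow,
# but their relative order is not something to rely on.
print(events)
# The sleeps overlap, so this is close to DELAY, not N * DELAY.
print(f"{elapsed:.2f}s")
```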

So, if you are planning to migrate a project to asyncio, or are just playing around with it, don't forget to create separate execution paths for your coroutines so that their waits actually overlap. We are so used to synchronous programming that we often miss this little detail and then wonder why our programs still waste time and resources doing nothing.
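Besides asyncio.gather, asyncio.create_task (Python 3.7+) is another way to create those execution paths. It schedules a coroutine on the running event loop immediately and returns a Task that you can await later. Here is a minimal sketch, where fetch is a hypothetical stand-in for real I/O and the 0.1 second delay is illustrative:

```python
import asyncio

async def fetch(i):
    # hypothetical stand-in for real I/O such as an HTTP request
    await asyncio.sleep(0.1)
    return i * 10

async def main():
    # create_task schedules each coroutine right away; they start
    # running as soon as main() yields control to the event loop.
    tasks = [asyncio.create_task(fetch(i)) for i in range(5)]
    # Awaiting the tasks one by one is fine here: they are already
    # running concurrently, so the total wait is about 0.1s, not 0.5s.
    return [await t for t in tasks]

results = asyncio.run(main())
print(results)  # [0, 10, 20, 30, 40]
```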

Remember, asyncio.gather is a nice way to start multiple coroutines concurrently and wait until they have all finished. An alternative is asyncio.as_completed, which lets you handle each coroutine's result as soon as it finishes, instead of waiting for all of them before continuing and scheduling other coroutines to run.

async def main():
    coroutines = [do_work(i) for i in range(100)]
    for next_done in asyncio.as_completed(coroutines):
        # Each item is an awaitable that waits for the next coroutine
        # to finish (in completion order, not list order) and returns
        # its result. Awaiting every item also keeps those awaitables
        # from triggering "was never awaited" warnings.
        await next_done

asyncio.run(main())
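To make the difference between the two concrete, here is a small sketch in which the coroutines finish in a different order than they were started. The work coroutine and its delays are illustrative, not from the original. gather returns results in the order the coroutines were passed in, while as_completed hands them back in the order they finish.

```python
import asyncio

async def work(i, delay):
    await asyncio.sleep(delay)
    return i

async def main():
    delays = [0.3, 0.1, 0.2]  # illustrative: item 0 is the slowest

    # gather: results come back in argument order
    gathered = await asyncio.gather(*[work(i, d) for i, d in enumerate(delays)])

    # as_completed: results come back in completion order
    completed = []
    for next_done in asyncio.as_completed([work(i, d) for i, d in enumerate(delays)]):
        completed.append(await next_done)

    return gathered, completed

gathered, completed = asyncio.run(main())
print(gathered)   # [0, 1, 2]
print(completed)  # [1, 2, 0]
```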

If you want to read more about asyncio, the official Python documentation is the best source of information (https://docs.python.org/3/library/asyncio.html), and asyncio.gather in particular is documented at https://docs.python.org/3/library/asyncio-task.html#asyncio.gather