FastAPI is blocking long-running requests when using asyncio calls #8842
-
Do you have an example of how to handle long-lasting processing without blocking the main thread? We have an AI workload where processing takes from 30s to 5 mins depending on the input, and the results should be returned to the front end for the user to analyze. Can you advise or provide an example of how to handle the call without blocking the main thread/uvloop?
-
Here are the three main cases. The first one (`async_will_block`) is what you want to avoid.

```python
from fastapi import FastAPI
from time import sleep
from asyncio import sleep as async_sleep

app = FastAPI()

# Blocking call in an async route
# Async routes run on the main thread and are expected
# to never block for any significant period of time.
# sleep() is blocking, so the main thread will stall.
@app.get("/async_will_block")
async def async_will_block():
    sleep(10)
    return []

# Blocking call in a sync route
# Sync routes are run in a separate thread from a threadpool,
# so any blocking will not affect the main thread.
@app.get("/sync_no_block")
def sync_no_block():
    sleep(10)
    return []

# Awaiting coroutines in async routes
# Awaiting an async function causes it to yield the main thread
# while it's waiting for an operation to complete, so it's not blocking the thread.
# asyncio.sleep(), unlike time.sleep(), is an async function, so it can be awaited.
@app.get("/async_no_block")
async def async_no_block():
    await async_sleep(10)
    return []
```

Depending on how you're running your AI workload, you might run into issues with Python threads competing with each other for access to the GIL, which is how the CPython interpreter ensures that its memory can't be corrupted by two Python threads manipulating the same memory structures at the same time. This means two Python threads cannot run Python code at the same time (but most numpy number-crunching operations are fair game). This is important because it means your main thread could still get stalled by another thread that's doing its calculations in Python. All Python frameworks share this issue, not just FastAPI, but it's still a thing to consider when running long loops.

Issue #1224 also comes to mind. If you have very large data payloads that you're encoding as JSON, they could be stalling the encoder, which cannot release the GIL since it needs access to Python data structures to generate its output.
-
1. Configure uvicorn (hypercorn, ...) to allow long response times, otherwise the request will time out.
2. Put the task in `run_in_threadpool`:

```python
from starlette.concurrency import run_in_threadpool

@app.get("/long_answer")
async def long_answer():
    rst = await run_in_threadpool(my_model.function_b, arg_1, arg_2)
    return rst
```

`run_in_threadpool` is based on threads; if you want to use another process for that, you have to do it yourself. -> encode/starlette#1094

Close your issue if my answer is clear, thank you.
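For reference, a minimal sketch of the "do it yourself" process-based variant mentioned above, using only the standard library; `app`, `my_model.function_b`, `arg_1` and `arg_2` are the same placeholders as in the answer:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

# One shared pool; each submitted call runs in a separate process,
# so it cannot hold this server process's GIL.
process_pool = ProcessPoolExecutor(max_workers=2)

@app.get("/long_answer_process")
async def long_answer_process():
    loop = asyncio.get_running_loop()
    rst = await loop.run_in_executor(process_pool, my_model.function_b, arg_1, arg_2)
    return rst
```

Note that arguments and return values must be picklable, and a separate process does not share memory with the server, so in practice this often means loading the model once per worker process.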
-
Thanks, it helps.
-
I don't understand 100% why I should be doing this if I'm doing the following, as suggested by @sm-Fifteen. What is the benefit of running this in a thread pool? Could anyone clarify? Thanks in advance 🙏
-
There isn't a benefit. One could argue that there is even a cost to it (threads are expensive to manage); you should just await the coroutine directly. Note that when you have a long-running task and you don't want it to block the main thread, you should run it in a separate process. Python only executes one thread at a time, so if your thread is not releasing the GIL, you are still blocking the main thread.
-
@JarroVGIT Is it possible to make the "long running task" asynchronous, if I neither want to put it into a separate thread nor want it to block the main thread? How can we do that?
-
Well, that depends on what kind of task it is. Is it CPU-bound (e.g. lots of calculations)? Then you could use a separate process, for example via a `ProcessPoolExecutor`.
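If the task is I/O-bound instead (e.g. waiting on an external inference service), it can be made natively asynchronous and simply awaited; a sketch, assuming the `httpx` library and a hypothetical endpoint URL:

```python
import httpx

@app.get("/io_bound")
async def io_bound():
    # The event loop is free to serve other requests while this awaits;
    # the URL and payload below are placeholders for illustration.
    async with httpx.AsyncClient(timeout=300.0) as client:
        resp = await client.post("https://inference.example/run", json={"input": "..."})
        return resp.json()
```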
-
There are a few contributing issues to the class of problem I have (which many others seem to struggle with too, especially those of us with reasonable engineering experience but insufficient specific experience with Python, FastAPI and the many libraries and patterns for dealing with async). I have a FastAPI app which loads a number of "heavy" models on startup. Until now I just let the loading complete before testing, but the code was written to check the model states and start returning responses about them immediately, so the API at least works. However, getting that to work has proven very difficult (I'm sure it's not, I'm just lacking experience).

The only solution that works for me seems to be to create a BackgroundTasks object outside the app lifecycle, add a task but not await it, and just yield:

```python
import logging
from contextlib import asynccontextmanager
from fastapi import FastAPI, BackgroundTasks

logger = logging.getLogger(__name__)

background_tasks = BackgroundTasks()
transformer_model_container = TransformerModelContainer()

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.warning("Starting FastAPI app...")
    # This task loads two very large models in the background, which can take up to a minute or more.
    # TransformerModelContainer has methods which allow the API to check its state and wait until the
    # models are loaded before sending back anything more than information messages.
    # Irrespective of which async approach is used here, this must not be awaited, or the call will
    # block the app startup. There are no pre/post or other events in the lifecycle of FastAPI,
    # so this is the only way to run code.
    background_tasks.add_task(transformer_model_container.load_models)
    logger.info("Yield will run next and signal the end of the startup phase...")
    yield
    logger.info("FastAPI has continued past yield, which means shutdown is happening...")
    logger.warning("FastAPI app shutting down...")
    transformer_model_container.clear()

app = FastAPI(
    title="Kwizbot AI API",
    version=__version__,
    description="Kwiziq API for access to Kwizbot's advanced AI features",
    servers=get_open_api_servers(),
    lifespan=lifespan,
)
```

This works, but the task doesn't seem to complete: although the routes return correct information about the models not being loaded, it's clear the tasks need to be awaited or similar somewhere in order to run, so I'm trying to figure out where the best place for such code would be. Having more than two states for the lifecycle, as with other state-based systems, might be a helpful addition (post-startup etc.), or at least a documented pattern for handling this and similar issues. Unfortunately async/await is a bit of an antipattern in every language, one that leads to unintuitive behaviour (I'm more familiar with it in C#), which doesn't help, but I've not come across a situation like this one before. Being new to the framework and the libraries, and fairly new to Python (with the usual AIs suggesting answers that just don't work), this is one of those tricky cases where nothing trumps an experienced human to get the right workaround...
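For what it's worth, a `BackgroundTasks` object is only executed by Starlette after a response that carries it has been sent, so a free-standing instance created outside a request has nothing driving it, which is why the task above never completes. A sketch of one common alternative, assuming the same `TransformerModelContainer` and a blocking `load_models`: schedule the load as an asyncio task inside `lifespan`, so startup finishes immediately while the load proceeds.

```python
import asyncio
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # asyncio.to_thread moves the blocking call into a worker thread;
    # create_task schedules it without awaiting, so startup is not blocked.
    # If load_models were async, create_task(load_models()) would do.
    # Keeping a reference on app.state also prevents the task from being
    # garbage-collected mid-flight.
    app.state.load_task = asyncio.create_task(
        asyncio.to_thread(transformer_model_container.load_models)
    )
    yield
    transformer_model_container.clear()

app = FastAPI(lifespan=lifespan)
```

Routes can keep reporting the container's state as before, and anything that must wait for the models can `await app.state.load_task`.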