FastAPI is blocking long-running requests when using asyncio calls #8842
-
Do you have an example of how to handle long-lasting processing without blocking the main thread? We have an AI workload where processing takes from 30s to 5 mins depending on the input, and the results should be returned to the front end for the user to analyze. Can you advise or provide an example of how to handle the call without blocking the main thread/uvloop?
-
Here are the three main cases. The first one (`async_will_block`) is what you want to avoid.

```python
from fastapi import FastAPI
from time import sleep
from asyncio import sleep as async_sleep

app = FastAPI()

# Blocking call in an async route
# Async routes run on the main thread and are expected
# to never block for any significant period of time.
# sleep() is blocking, so the main thread will stall.
@app.get("/async_will_block")
async def async_will_block():
    sleep(10)
    return []

# Blocking call in a sync route
# Sync routes are run in a separate thread from a threadpool,
# so any blocking will not affect the main thread.
@app.get("/sync_no_block")
def sync_no_block():
    sleep(10)
    return []

# Awaiting coroutines in async routes
# Awaiting an async function causes it to yield the main thread
# while it's waiting for an operation to complete, so it's not blocking the thread.
# asyncio.sleep(), unlike time.sleep(), is an async function, so it can be awaited.
@app.get("/async_no_block")
async def async_no_block():
    await async_sleep(10)
    return []
```

Depending on how you're running your AI workload, you might run into issues with Python threads competing with each other for access to the GIL, which is how the CPython interpreter ensures that its memory can't be corrupted by two Python threads manipulating the same memory structures at the same time. This means two Python threads cannot run Python code at the same time (but most numpy number-crunching operations are fair game). This is important because it means your main thread could still get stalled by another thread that's doing its calculations in Python. All Python frameworks share this issue, not just FastAPI, but it's still a thing to consider when running long loops.

Issue #1224 also comes to mind. If you have very large data payloads that you're encoding as JSON, they could be stalling the encoder, which cannot release the GIL since it needs access to Python data structures to generate its output.
-
1. Configure uvicorn (hypercorn, ...) to allow long response times, otherwise the request will time out.
2. Put the task in `run_in_threadpool`:

```python
from starlette.concurrency import run_in_threadpool

@app.get("/long_answer")
async def long_answer():
    rst = await run_in_threadpool(my_model.function_b, arg_1, arg_2)
    return rst
```

`run_in_threadpool` is based on threads; if you want to use another process for that, you have to do it yourself. -> encode/starlette#1094

Close your issue if my answer is clear, thank you.
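For reference, a minimal sketch of the "do it yourself" process-based variant mentioned above, using only the standard library; `app`, `my_model.function_b`, `arg_1` and `arg_2` are the same placeholders as in the answer:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

# One shared pool; each submitted call runs in a separate process,
# so it cannot hold this server process's GIL.
process_pool = ProcessPoolExecutor(max_workers=2)

@app.get("/long_answer_process")
async def long_answer_process():
    loop = asyncio.get_running_loop()
    rst = await loop.run_in_executor(process_pool, my_model.function_b, arg_1, arg_2)
    return rst
```

Note that arguments and return values must be picklable, and a separate process does not share memory with the server, so in practice this often means loading the model once per worker process.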
-
Thanks, it helps.
-
I don't understand 100% why I should be doing this if I'm doing the following, as suggested by @sm-Fifteen. What is the benefit of running this in a thread pool? Could anyone clarify? Thanks in advance 🙏
-
There isn't a benefit. One could argue that there is even a cost to it (threads are expensive to manage); you should just await the coroutine directly. Note that when you have a long-running task and you don't want it to block the main thread, you should run it in a separate process. Python only executes one thread at a time, so if your thread is not releasing the GIL, you are still blocking the main thread.
-
@JarroVGIT Is it possible to make the "long running task" asynchronous, if I neither want to put it into a separate thread nor want it to block the main thread? How can we do that?
-
Well, that depends on what kind of task it is. Is it CPU-bound (e.g. lots of calculations)? Then you could use a separate process, for example via a `ProcessPoolExecutor`.
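If the task is I/O-bound instead (e.g. waiting on an external inference service), it can be made natively asynchronous and simply awaited; a sketch, assuming the `httpx` library and a hypothetical endpoint URL:

```python
import httpx

@app.get("/io_bound")
async def io_bound():
    # The event loop is free to serve other requests while this awaits;
    # the URL and payload below are placeholders for illustration.
    async with httpx.AsyncClient(timeout=300.0) as client:
        resp = await client.post("https://inference.example/run", json={"input": "..."})
        return resp.json()
```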
-
There are a few contributing issues to the class of problem I have (which many others seem to struggle with too, especially those of us with reasonable engineering experience but insufficient specific experience with Python, FastAPI and the many libraries and patterns for dealing with async). I have a FastAPI app which loads a number of "heavy" models on startup. Until now I just let the loading complete before testing, but the code was written to check the model states and start returning responses about them immediately, so the API at least works. However, getting that to work has proven very difficult (I'm sure it's not, I'm just lacking experience).

The only solution that works for me seems to be to create a BackgroundTasks object outside the app lifecycle, add a task but not await it, and just yield:

```python
import logging
from contextlib import asynccontextmanager
from fastapi import FastAPI, BackgroundTasks

logger = logging.getLogger(__name__)

background_tasks = BackgroundTasks()
transformer_model_container = TransformerModelContainer()

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.warning("Starting FastAPI app...")
    # This task loads two very large models in the background, which can take up to a minute or more.
    # TransformerModelContainer has methods which allow the API to check its state and wait until the
    # models are loaded before sending back anything more than information messages.
    # Irrespective of which async approach is used here, this must not be awaited, or the call will
    # block the app startup. There are no pre/post or other events in the lifecycle of FastAPI,
    # so this is the only way to run code.
    background_tasks.add_task(transformer_model_container.load_models)
    logger.info("Yield will run next and signal the end of the startup phase...")
    yield
    logger.info("FastAPI has continued past yield, which means shutdown is happening...")
    logger.warning("FastAPI app shutting down...")
    transformer_model_container.clear()

app = FastAPI(
    title="Kwizbot AI API",
    version=__version__,
    description="Kwiziq API for access to Kwizbot's advanced AI features",
    servers=get_open_api_servers(),
    lifespan=lifespan,
)
```

This works, but the task doesn't seem to complete: although the routes return correct information about the models not being loaded, it's clear the tasks need to be awaited or similar somewhere in order to run, so I'm trying to figure out where the best place for such code would be. Having more than two states for the lifecycle, as with other state-based systems, might be a helpful addition (post-startup etc.), or at least a documented pattern for handling this and similar issues. Unfortunately async/await is a bit of an antipattern in every language, one that leads to unintuitive behaviour (I'm more familiar with it in C#), which doesn't help, but I've not come across a situation like this one before. Being new to the framework and the libraries, and fairly new to Python (with the usual AIs suggesting answers that just don't work), this is one of those tricky cases where nothing trumps an experienced human to get the right workaround...
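For what it's worth, a `BackgroundTasks` object is only executed by Starlette after a response that carries it has been sent, so a free-standing instance created outside a request has nothing driving it, which is why the task above never completes. A sketch of one common alternative, assuming the same `TransformerModelContainer` and a blocking `load_models`: schedule the load as an asyncio task inside `lifespan`, so startup finishes immediately while the load proceeds.

```python
import asyncio
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # asyncio.to_thread moves the blocking call into a worker thread;
    # create_task schedules it without awaiting, so startup is not blocked.
    # If load_models were async, create_task(load_models()) would do.
    # Keeping a reference on app.state also prevents the task from being
    # garbage-collected mid-flight.
    app.state.load_task = asyncio.create_task(
        asyncio.to_thread(transformer_model_container.load_models)
    )
    yield
    transformer_model_container.clear()

app = FastAPI(lifespan=lifespan)
```

Routes can keep reporting the container's state as before, and anything that must wait for the models can `await app.state.load_task`.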