
Using Dask DataFrames instead of pandas DataFrames in prefect tasks #3022

Answered by jcrist
Sinha-Ujjawal asked this question in Q&A

Yes, but with a few caveats:

  • You'll need to use a worker_client to submit work: https://distributed.dask.org/en/latest/task-launch.html#connection-with-context-manager

  • You shouldn't persist dask objects to prefect Results, since those objects refer to resources stored on a dask cluster, not actual final values. You could either

    • set checkpoint=False on functions that return dask objects (like a dask.dataframe)
    • or always call df.compute() before returning from a prefect task, so that the result is a pandas object, not a dask object.
from prefect import task
import dask.dataframe as dd
from dask.distributed import worker_client


@task
def calling_compute_in_a_task(filepath):
    # worker_client connects to the dask cluster running this task
    with worker_client():
        # the original snippet is truncated here; reading the file and
        # calling compute() so the task returns a pandas object
        df = dd.read_csv(filepath)
        return df.compute()
