Using Dask Dataframes instead of Pandas Dataframe in prefect tasks #3022
Asked by Sinha-Ujjawal in Q&A
Hi, is it possible to use Dask DataFrames instead of pandas DataFrames inside Prefect tasks? Thx
Answered by jcrist, Jul 23, 2020
Yes, but with a few caveats:
```python
import dask.dataframe as dd
from dask.distributed import worker_client
from prefect import task

@task
def calling_compute_in_a_task(filepath):
    # Connect back to the Dask cluster from inside the running task
    with worker_client():
        df = dd.read_csv(filepath)
        return df.describe().compute()

@task(checkpoint=False)
def using_checkpoint_false(filepath):
    # Returns a lazy Dask object, so checkpointing is disabled
    with worker_client():
        return dd.read_csv(filepath)

@task
def compute_describe(df):
    # Materialize the result so the task returns a pandas object
    with worker_client():
        return df.describe().compute()
```
The caveats, in more detail:

- You'll need to use a `worker_client` to submit work from inside a task: https://distributed.dask.org/en/latest/task-launch.html#connection-with-context-manager
- You shouldn't persist Dask objects to Prefect `Result`s, since those objects refer to resources stored on a Dask cluster, not actual final values. You can either set `checkpoint=False` on tasks that return Dask objects (like a `dask.dataframe`), or call `df.compute()` before returning from a Prefect task, so that the result is a pandas object, not a Dask object.
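To see why persisting a lazy Dask object is a problem, here is a small stdlib-only sketch. `LazyHandle` is a made-up stand-in for a Dask collection (it is not part of Dask or Prefect): it wraps a live external resource, so serializing the handle itself fails, while the computed plain value round-trips fine, the same way a `df.compute()` result does.

```python
import pickle
import tempfile

# Write some "remote" data to a temp file standing in for cluster storage.
tmp = tempfile.NamedTemporaryFile("w+", delete=False)
tmp.write("final value")
tmp.flush()

class LazyHandle:
    """Hypothetical stand-in for a Dask collection: holds a live
    resource (an open file) rather than the final data."""

    def __init__(self, path):
        self._fh = open(path)  # external resource, like data on a cluster

    def compute(self):
        # Materialize a plain value, analogous to df.compute()
        return self._fh.read()

handle = LazyHandle(tmp.name)

# Persisting the handle itself fails: open files can't be pickled.
try:
    pickle.dumps(handle)
    persisted = True
except TypeError:
    persisted = False
print("handle pickles cleanly:", persisted)

# The computed value is a plain string, so it serializes fine.
value = handle.compute()
assert pickle.loads(pickle.dumps(value)) == value
```

The same reasoning applies to Prefect checkpointing: only materialized values are safe to store as `Result`s.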