'multiprocessing' does not work from within the data function flow #69

Open

bbassett-tibco opened this issue Jun 11, 2024 · 0 comments

Labels
bug Something isn't working

@bbassett-tibco
Collaborator

It is not possible to use the multiprocessing module from within a data function.

from spotfire import data_function as df

outs = [df.AnalyticOutput("out", "/tmp/out.sbdf")]
spec = df.AnalyticSpec("script", [], outs, """
import pandas as pd
import multiprocessing as mp
import time

def _task(n):
    time.sleep(0.5)
    return f'task {n}'

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        args = [(i,) for i in range(20)]
        results = [pool.apply_async(_task, arg) for arg in args]
        out = pd.DataFrame({'x': [x.get() for x in results]})
""")
result = spec.evaluate()
print(result.summary)

The above code is expected to produce a data frame with one column (x) and twenty rows ("task 0" through "task 19"), but it instead fails on both Windows and Linux hosts with an error from pickle:

Error executing Python script:

_pickle.PicklingError: Can't pickle <function _task at 0x0000024C4DAD7380>: attribute lookup _task on __main__ failed

Traceback (most recent call last):
  File "data_function.py", line 417, in _execute_script
    exec(compiled_script, self.globals)
  File "<data_function>", line 14, in <module>
  File "pool.py", line 774, in get
    raise self._value
  File "pool.py", line 540, in _handle_tasks
    put(task)
  File "connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)

Pickle is failing because the data function is executed via compile and exec (in the AnalyticSpec class), so pickle cannot find the definition of the function it needs to marshal to the multiprocessing pool workers. (An interesting discussion of the underlying issue can be found at https://stackoverflow.com/questions/31191947/pickle-and-exec-in-python, but the accepted solution appears to be based on the imp module, which has been deprecated since Python 3.4.)
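
The same failure can be reproduced outside of Spotfire with a minimal sketch of that execution model (the script_globals name and the <data_function> filename below are illustrative, not the package's actual internals):

import pickle

# Run a script through compile/exec against a private globals dict, roughly
# the way the data function runner executes the script above.
script = """
def _task(n):
    return f'task {n}'
"""

# The exec'd globals report __name__ as '__main__' (the traceback shows the
# `if __name__ == '__main__'` guard in the report was satisfied), so the
# function's __module__ ends up as '__main__'...
script_globals = {"__name__": "__main__"}
exec(compile(script, "<data_function>", "exec"), script_globals)

# ...but the real __main__ module has no attribute named _task, so pickling
# the function by reference fails with the same error as in the traceback:
#   _pickle.PicklingError: Can't pickle <function _task ...>:
#     attribute lookup _task on __main__ failed
pickle.dumps(script_globals["_task"])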

@bbassett-tibco bbassett-tibco added the bug Something isn't working label Jun 11, 2024