Issues with numpy threading and multiprocessing #49

Open
danieljfarrell opened this issue Jan 26, 2021 · 1 comment

Comments

@danieljfarrell
Owner

I've run your script a few times on different setups. Interestingly, I obtained different results :)

The main reason is that numpy already has some multi-threaded functions, and it spawns a number of worker threads that depends on the Python version and platform. [1]

The best performance is obtained once numpy is forced to a single thread. [2]
Multiprocessing is always slightly better than pathos (reasonable, as pathos uses multiprocessing as its backend). [3]

With 12 cores and numpy set to 1 thread I got throughput_rays_per_sec around 430 with Py 3.7.9, Py 3.8.5 and Py 3.9.1. [4]

[Image: default settings]
[Image: numpy set to 1 thread]


[1] This is evident from CPU usage that is double or four times the expected value. How many workers are present depends on numpy's compile settings for accelerated linear algebra libraries (BLAS and friends) and on environment variables. More info with:

```python
import numpy as np
np.show_config()
```

[2] i.e. by setting the following environment variables before importing numpy:

import os
NUMPY_THREADS = 1
os.environ["MKL_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["NUMEXPR_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["OMP_NUM_THREADS"] = str(NUMPY_THREADS)
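The same idea can be wrapped in a small helper so it is harder to forget one of the variables. This is only a sketch: `limit_numpy_threads` and `THREAD_ENV_VARS` are hypothetical names, and `OPENBLAS_NUM_THREADS` is an addition to the three variables listed above, on the assumption that numpy may be linked against OpenBLAS. As before, the helper has to run before numpy is imported.

```python
import os

# Environment variables read by common numpy backends. The first three are
# the ones listed above; OPENBLAS_NUM_THREADS is an assumption added here
# for numpy builds linked against OpenBLAS.
THREAD_ENV_VARS = (
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
)


def limit_numpy_threads(n_threads=1):
    """Hypothetical helper: cap backend threads. Call before importing numpy."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    # Return what was set so the caller can log or inspect it.
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


limit_numpy_threads(1)
# import numpy as np  # import numpy only after the variables are set
```

Worker processes started by multiprocessing normally inherit the parent's environment, so setting the variables once at the top of the main script should be enough.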

[3] If pathos.pools.ProcessPool is used, performance is further reduced (roughly 10%).

[4] For Py>=3.8 the following line is needed to prevent an AttributeError in multiprocessing (see: https://bugs.python.org/issue39414)

atexit.register(pool.close)
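To show where that line would sit, here is a minimal sketch with a plain `multiprocessing.Pool`. The names `trace_ray` and `run` are hypothetical stand-ins, not functions from this project, and the per-ray work is a placeholder.

```python
import atexit
import multiprocessing


def trace_ray(seed):
    # Hypothetical stand-in for the real per-ray work.
    return seed * seed


def run(n_rays=8, n_workers=2):
    pool = multiprocessing.Pool(processes=n_workers)
    # Close the pool at interpreter exit (the workaround from footnote [4]).
    atexit.register(pool.close)
    # Distribute the work across the worker processes.
    return pool.map(trace_ray, range(n_rays))
```

When used from a script, `run()` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module without starting pools of their own.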

Originally posted by @dcambie in #48 (comment)

@danieljfarrell
Owner Author

To-do

  1. Add scaling notes to docs (see "Replace multiprocessing with pathos", #48 (comment))
  2. Repeat using python script

The two will have different scaling behaviour because the script retains the full history, which is very slow! The CLI will save reduced data using the --end-rays option.
