Memory management improvements #15
This is a tracking issue for changes we are implementing regarding allowing the user to specify how much memory an instance of bellhopcxx/bellhopcuda uses.

On modern architectures (Pascal / 10x0 or later), unified virtual memory is generally not much slower, if at all, than explicitly managed memory. The GPU handles page faults and migrates the faulting pages from CPU memory. This does make the kernel wait for those pages to be moved, but if you explicitly copy the memory in advance, you have to wait for those pages to be moved anyway. In some cases UVM can even be faster: for example, suppose you have a transmission loss run where all the rays go in one direction and never influence receivers on the other side. The pages holding the memory for the part of the field which is never touched by the GPU never have to be moved to the GPU at all! If they were explicitly managed, they'd have to be moved to the GPU once at the beginning and then back to the CPU at the end.

The reason we chose unified memory for this project is that it is one codebase which can be compiled as C++ or as CUDA. Simply changing the allocation between

Regarding your observed performance: an SM utilization of 3.8% will never be caused by unified virtual memory. At worst, UVM might reduce performance in a particular case from 100% to 50% or so, but it will not lose 96% of the performance. Furthermore, how the memory was allocated doesn't change anything about how often DRAM is read and written. Too much DRAM activity means the data access patterns of the kernel itself are poor relative to the GPU caches, for example because a large region of the field is being read and written and this can't fit in the cache.

How many rays are being traced by the environment file you're using? What GPU are you using?
Thank you for your patience. Maybe it's because I'm using a 1750 graphics card, which isn't very good at supporting newer technology.
Allow the user to specify parameters controlling the amount of memory used by bellhopcxx/bellhopcuda in all run types. Currently, BELLHOP/BELLHOP3D had an unacceptable default for the amount of memory to use (16 GB). This has been changed to a user-settable parameter and the default reduced. There should be one parameter for how much memory the user wants to allow the program to use. It should count the SSP, bathymetry, etc. against this value too. Things like the maximum number of eigenray hits should also be adjustable.