
Distributing tensors across NUMA nodes #207

Open
shg8 opened this issue Apr 6, 2024 · 3 comments
shg8 commented Apr 6, 2024

I'm wondering how much support Neural Speed has for NUMA systems. The Advanced Usage page suggests that all tensors should be allocated on the first NUMA node with `numactl -m 0 -C 0-<physic_cores-1>`. Is there any benefit to doing this?
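For context, a minimal sketch of the two placement policies under discussion: binding memory and cores to the first node (as the Advanced Usage page suggests) versus interleaving pages across all nodes. The binary name `./run_llm` and the core count are assumptions for illustration, not Neural Speed's actual CLI:

```python
# Sketch: build the two numactl invocations being compared.
# "./run_llm" is a placeholder binary name (assumption).
physic_cores = 16  # assumed number of physical cores per socket

# Policy from the docs: bind memory (-m 0) and threads (-C) to NUMA node 0.
bound = ["numactl", "-m", "0", "-C", f"0-{physic_cores - 1}", "./run_llm"]

# Alternative raised in this thread: interleave pages round-robin
# across all nodes instead of pinning them to one.
interleaved = ["numactl", "--interleave=all", "./run_llm"]

print(" ".join(bound))        # node-bound launch
print(" ".join(interleaved))  # interleaved launch
```

Running `numactl --hardware` first shows how many nodes the machine actually has.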

kevinintel (Contributor) commented
Without NUMA binding, the performance will drop a lot.

shg8 (Author) commented Apr 15, 2024

> Without numa, the performance will drop a lot

I previously thought that this binds all memory allocations to the first NUMA node. However, that would increase internode traffic significantly, since threads running on the other node would have to reach across the interconnect for their data. Additionally, each thread isn't able to fully utilize the memory bandwidth if the topology has different memory affinities for different nodes. Is my understanding correct? Could you kindly add a bit more on why the memory allocations aren't interleaved across nodes?

kevinintel (Contributor) commented

Intel Xeon often has 2 sockets; `-m 0` binds the memory to the first socket. There is communication overhead between the 2 sockets, so if you want to reduce internode traffic, you can try our TP (tensor parallelism).
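As a hedged illustration of that suggestion (not Neural Speed's actual launcher), tensor parallelism on a two-socket box is typically run as one process per socket, with each rank's memory and cores pinned to its local NUMA node so its weight shard stays local. The binary name and flags below are assumptions:

```python
# Sketch: one TP rank per socket, each pinned to its local node.
# "./run_llm" and its --tp-* flags are hypothetical, for illustration.
cores_per_socket = 16  # assumed physical cores per socket

ranks = []
for node in (0, 1):  # one rank per socket/NUMA node
    lo = node * cores_per_socket
    hi = lo + cores_per_socket - 1
    ranks.append(
        ["numactl", "-m", str(node), "-C", f"{lo}-{hi}",
         "./run_llm", "--tp-rank", str(node), "--tp-size", "2"]
    )

for cmd in ranks:
    print(" ".join(cmd))
```

With this layout, each rank only touches remote memory during the collective operations (e.g. all-reduce) that TP requires, instead of on every weight read.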

@kevinintel kevinintel self-assigned this Jun 5, 2024