Enabling eRPC use without hugepages #23

Open: wants to merge 5 commits into master

vsbenas commented Mar 19, 2019

Our machines have two NUMA nodes, but only one is connected to the network. As a result, eRPC runs efficiently on the cores of that node, but threads on the other half of the cores suffer significant performance problems.

This pull request follows the approach of the HERD code (libhrd), which uses ordinary heap memory instead of hugepages when numa_node is set to -1:
https://github.com/efficient/rdma_bench/blob/master/libhrd/hrd_conn.c#L117

It is now possible to pass erpc::kNoNumaNode as numa_node to the Nexus constructor, so that eRPC does not use NUMA-bound hugepage memory. This is slightly slower when memory on the network-attached NUMA node is available, but in our configuration it improved performance by 23.1% (first reported as 20.6%), so I believe it is a useful option for eRPC.
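A minimal C++ sketch of the allocation branch described above, assuming Linux. `kNoNumaNode` mirrors the PR's `erpc::kNoNumaNode`, but its value (`SIZE_MAX`, i.e. -1 stored in a `size_t`) is an assumption, and `Region` and `alloc_region` are hypothetical names. The hugepage path here uses an anonymous `mmap` with `MAP_HUGETLB` and does no NUMA binding, so it is not eRPC's actual HugeAlloc code.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: kNoNumaNode is -1 stored in a size_t, mirroring the PR's
// erpc::kNoNumaNode. The real constant's definition may differ.
constexpr size_t kNoNumaNode = SIZE_MAX;
constexpr size_t kPageSize = 4096;                 // assumed base page size
constexpr size_t kHugePageSize = 2 * 1024 * 1024;  // assumed 2 MiB hugepages

// Hypothetical record of one allocation.
struct Region {
  void *buf = nullptr;
  size_t size = 0;         // bytes actually reserved, after rounding
  bool from_heap = false;  // true if buf must later be released with free()
};

inline size_t round_up(size_t x, size_t align) {
  assert(align != 0);
  return (x + align - 1) / align * align;
}

// Hypothetical allocator entry point. With kNoNumaNode it uses page-aligned
// heap memory (the HERD-style fallback); otherwise it maps anonymous
// hugepages. Binding the mapping to numa_node (e.g. with mbind) is omitted.
inline Region alloc_region(size_t size, size_t numa_node) {
  Region r;
  if (numa_node == kNoNumaNode) {
    r.size = round_up(size, kPageSize);
    r.buf = std::aligned_alloc(kPageSize, r.size);
    r.from_heap = true;
    return r;
  }
  r.size = round_up(size, kHugePageSize);
  void *p = mmap(nullptr, r.size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  r.buf = (p == MAP_FAILED) ? nullptr : p;  // fails if no hugepages reserved
  return r;
}
```

The heap path needs no hugepage reservation at all, which is why it works on the node that cannot use NUMA-local hugepages; the cost is that the memory is not pinned to any particular node.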

TODO:

  • Rounding allocations up to kHugePageSize is unnecessary in this configuration, so skipping it could save memory.
    Update: Rounding up is still worthwhile when buffer sizes vary between requests, because eRPC reuses buffers. Keeping it improved performance by 3%.

  • Add a test for this configuration.
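The trade-off in the first TODO item can be illustrated with a small sketch. `BufferPool` below is a hypothetical pool, not eRPC's allocator: it caches freed buffers by their rounded size, so with a coarse rounding unit (an assumed 2 MiB `kHugePageSize`) requests of different sizes land in the same class and a freed buffer can be reused, while a fine unit saves memory but gives fewer reuse hits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

constexpr size_t kHugePageSize = 2 * 1024 * 1024;  // assumed 2 MiB

inline size_t round_up(size_t x, size_t align) {
  assert(align != 0);
  return (x + align - 1) / align * align;
}

// Hypothetical buffer pool: freed buffers are cached by their rounded size.
class BufferPool {
 public:
  explicit BufferPool(size_t unit) : unit_(unit) {}
  BufferPool(const BufferPool &) = delete;
  BufferPool &operator=(const BufferPool &) = delete;
  ~BufferPool() {
    for (auto &kv : free_)
      for (void *p : kv.second) std::free(p);
  }

  // Returns a buffer of at least `size` bytes and reports its size class.
  void *alloc(size_t size, size_t *class_size) {
    *class_size = round_up(size, unit_);
    std::vector<void *> &list = free_[*class_size];
    if (!list.empty()) {  // reuse a cached buffer of the same class
      void *p = list.back();
      list.pop_back();
      reuses_++;
      return p;
    }
    return std::malloc(*class_size);
  }

  // Caches the buffer for later requests in the same class.
  void release(void *p, size_t class_size) { free_[class_size].push_back(p); }

  size_t reuses() const { return reuses_; }

 private:
  size_t unit_;
  std::map<size_t, std::vector<void *>> free_;
  size_t reuses_ = 0;
};
```

With the coarse unit, a 3000-byte request and a later 5000-byte request share one 2 MiB buffer; with a 4 KiB unit they fall into different classes, so the second request allocates fresh memory.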

anujkaliaiitd (Collaborator)

Thank you for the pull request!

Would it be good enough to let each Rpc object choose its own NUMA node instead of inheriting the Nexus's NUMA node? That can easily be done by adding an optional argument to the Rpc constructor.
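A sketch of the suggested alternative, using hypothetical stand-ins for `Nexus` and `Rpc` (eRPC's real constructors take more arguments, and this is not the project's API): an optional `numa_node` argument that falls back to the Nexus's node when omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for the Nexus; only the NUMA node is modeled.
struct Nexus {
  size_t numa_node;
};

// Hypothetical stand-in for Rpc with an optional per-Rpc NUMA node.
class Rpc {
 public:
  // If numa_node is omitted, inherit the Nexus's node (the current behavior).
  explicit Rpc(const Nexus &nexus,
               std::optional<size_t> numa_node = std::nullopt)
      : numa_node_(numa_node.value_or(nexus.numa_node)) {}

  size_t numa_node() const { return numa_node_; }

 private:
  size_t numa_node_;
};
```

Existing callers keep their behavior because the argument defaults to the Nexus's node; as the reply below points out, overriding it only helps if the chosen node's memory can actually be registered with the NIC.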

The ability to work without hugepages is nice, but I would like to avoid the additional complexity unless we really need it.

vsbenas (Author) commented Mar 19, 2019

That depends on whether Rpc objects register the node's memory with the NIC. I could not create a Nexus on the second NUMA node: it fails with "Failed to register mr". That node is not connected to the network.

I don't know whether such a scenario (two nodes, only one on the network) is at all common, but in our case performance is much better with regular memory. So whether we need it really depends on how common such a setup is.

As for complexity, the change adds one branch inside the loop in ~HugeAlloc() and one branch in HugeAlloc(). The performance cost should be negligible, but I understand that it makes the code more cluttered.
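To make the two branches concrete, here is a hypothetical `ChunkAlloc` class, loosely modeled on the description of HugeAlloc above but not eRPC's code: one branch when a chunk is reserved (heap vs. an anonymous `MAP_HUGETLB` mapping, assuming Linux) and one inside the destructor's loop, which must release each chunk the same way it was obtained.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical allocator that tracks every chunk it reserves.
class ChunkAlloc {
 public:
  explicit ChunkAlloc(bool use_heap) : use_heap_(use_heap) {}
  ChunkAlloc(const ChunkAlloc &) = delete;
  ChunkAlloc &operator=(const ChunkAlloc &) = delete;

  // Branch at allocation time. On the hugepage path, size must be a
  // multiple of the hugepage size or mmap fails.
  void *reserve(size_t size) {
    void *p = nullptr;
    if (use_heap_) {
      p = std::malloc(size);
    } else {
      void *m = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      p = (m == MAP_FAILED) ? nullptr : m;
    }
    if (p != nullptr) chunks_.push_back({p, size});
    return p;
  }

  // The extra branch inside the destructor's loop: free() for heap chunks,
  // munmap() for hugepage mappings.
  ~ChunkAlloc() {
    for (const Chunk &c : chunks_) {
      if (use_heap_) std::free(c.buf);
      else munmap(c.buf, c.size);
    }
  }

  size_t num_chunks() const { return chunks_.size(); }

 private:
  struct Chunk {
    void *buf;
    size_t size;
  };
  bool use_heap_;
  std::vector<Chunk> chunks_;
};
```

Because the flag is fixed per allocator, the branch is perfectly predictable, which is consistent with the expectation that its runtime cost is negligible.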

anujkaliaiitd (Collaborator)

Thanks for the details. It's common to have only one CPU connected to the network, so it's important that eRPC works in this setting.

I'm unsure why registration fails with the second NUMA node. I'll look at this over the next few days.
