eRPC setup between two docker containers in two different machines #101

Open
repo4work opened this issue Sep 19, 2023 · 4 comments

@repo4work

I am trying to set up and use eRPC between two machines, and I’m having issues when calling the RPC constructor. I run into the problem even with the Hello World application included with the source code and outlined in the README.

I have tried to figure out why, but I haven’t come up with a solution.

Here is information about the setup:

  • There are two machines running RHEL 8.6, each with an Intel 100GbE Network Adapter (E810-CQDA1) and rdma-core version 37, using InfiniBand for transport.
  • eRPC runs in two Docker containers, one on each machine. Each container uses CentOS 8 as its OS and rdma-core version 26.
  • One instance is meant to be the server and the other a client.

The error I’m running into is shown below. It seems to be triggered when the RPC constructor is called:
“terminate called after throwing an instance of ‘std::runtime_error’
what(): Failed to open dev 0
Aborted (core dumped)”

Thanks

@ankalia

ankalia commented Sep 22, 2023

Hi. I haven't tried eRPC on E810 NICs, but here are some thoughts.

  • AFAIK E810 does not support InfiniBand, though it supports RoCE. Have you tried building eRPC with -DTRANSPORT=ROCE -DLOG_LEVEL=trace? The latter will enable some verbose diagnostics.
  • What's the output of ibv_devinfo from the container? Can you successfully run ib_read_bw between the two machines?
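The first check suggested above can be scripted so that it is easy to repeat on the host and inside the container and then compare results. The following is only a minimal sketch: the helper name run_tool is hypothetical, and it assumes ibv_devinfo (from rdma-core) is installed and on PATH. ib_read_bw needs a server on one machine and a client on the other, so it is not scripted here.

```python
import shutil
import subprocess


def run_tool(argv, timeout=30):
    """Run a command and return (succeeded, combined stdout/stderr).

    Hypothetical helper for illustration; it is not part of eRPC or rdma-core.
    """
    if shutil.which(argv[0]) is None:
        return False, f"{argv[0]}: not found on PATH"
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"{argv[0]}: timed out after {timeout}s"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


if __name__ == "__main__":
    # Run this once on the host and once inside the container.
    ok, output = run_tool(["ibv_devinfo"])
    print("ibv_devinfo:", "OK" if ok else "FAILED")
    # Always read the output itself; the exit status alone may not tell the whole story.
    print(output)
```

If the host run lists the device and the container run does not, the problem is likely in how the container is set up rather than in eRPC.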

@repo4work
Author

Hi Anuj,

  • Is there a difference between doing "-DTRANSPORT=Infiniband -DROCE=on" (which is what I'm currently doing) vs "-DTRANSPORT=ROCE"?
  • The output of ibv_devinfo in the containers is: Failed to open device
  • ib_read_bw works on the two machines

@ankalia

ankalia commented Sep 26, 2023

Hi.

  • -DTRANSPORT=Infiniband -DROCE=on is correct.
  • It wasn't clear to me if RDMA is working from within the containers. The message says that ib_read_bw works, but is it from within the containers? If so, it's surprising that ibv_devinfo fails but ib_read_bw works.
  • If ib_read_bw does not yet work from within the containers, that'll have to be fixed first independently of eRPC.
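One way to narrow down whether RDMA can work from within a container is to look at the device nodes it can see. On Linux, libibverbs normally reaches the kernel through character devices named /dev/infiniband/uverbsN, so a container that cannot see or open them would be expected to fail when opening a device. The sketch below lists those nodes and reports whether the current user can read and write each one. The function name check_uverbs_nodes is hypothetical, and this is a diagnostic aid only, not something eRPC itself does.

```python
import os


def check_uverbs_nodes(dev_dir="/dev/infiniband"):
    """Return {path: accessible} for each uverbs node found in dev_dir.

    accessible is True when the current user has read and write permission.
    An empty dict means the directory is missing or holds no uverbs nodes.
    """
    if not os.path.isdir(dev_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(dev_dir)):
        if name.startswith("uverbs"):
            path = os.path.join(dev_dir, name)
            result[path] = os.access(path, os.R_OK | os.W_OK)
    return result


if __name__ == "__main__":
    nodes = check_uverbs_nodes()
    if not nodes:
        print("No uverbs device nodes are visible here.")
    for path, accessible in nodes.items():
        print(path, "read/write OK" if accessible else "NOT accessible")
```

Running it on the host and then inside the container shows whether the nodes were passed through at all.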

@repo4work
Author

Hi,

To clarify: ib_read_bw works outside the containers (on the hosts) but does not work inside them, and ibv_devinfo also fails inside. I am currently looking into the issue within the containers.
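When RDMA tools work on the host but not in a container, one thing often worth checking is how the container was started. The thread does not confirm the cause here, so the sketch below is only an assumption about one common setup: it builds a docker run command that passes the uverbs device nodes through, allows locked memory, and uses host networking. The function name docker_run_args and the image name erpc-image are hypothetical; the flags themselves (--device, --cap-add, --ulimit, --net) are standard docker run options.

```python
def docker_run_args(image, uverbs_nodes, extra=()):
    """Build an argv list for `docker run` with RDMA device passthrough.

    Illustrative assumption only; the right options depend on the deployment.
    """
    args = [
        "docker", "run", "--rm", "-it",
        "--net=host",                # share the host network namespace
        "--cap-add=IPC_LOCK",        # allow pinning (locking) memory
        "--ulimit", "memlock=-1:-1", # remove the locked-memory limit
    ]
    for node in uverbs_nodes:
        args.append(f"--device={node}")  # expose each verbs device node
    args.append(image)
    args.extend(extra)               # optional command to run in the container
    return args


if __name__ == "__main__":
    # "erpc-image" is a placeholder image name.
    print(" ".join(docker_run_args("erpc-image", ["/dev/infiniband/uverbs0"])))
```

The function only assembles the command; printing it first makes it easy to review before running it by hand.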
