This repository has been archived by the owner on May 3, 2022. It is now read-only.

Reap a peer when it trashes connection errors. #165

Open
Raynos opened this issue Nov 2, 2015 · 1 comment

Raynos commented Nov 2, 2015

In production we have a steady stream of connection errors on a small number of peers.

It would be nice if a peer self-destructed once it hit a threshold of connection failures.

Got a connection error            9977
destroying due to init timeout    9976
resetting connection              9975
6.6.6.6:59371                    10416
6.6.6.6:54872                     8820
6.6.6.6:50794                     5940
6.6.6.6:50568                     2593
6.6.6.6:49409                     2084
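
The threshold idea above could be sketched roughly as follows. This is only an illustration under assumptions, not tchannel code: `FailureReaper`, `recordFailure`, `recordSuccess`, the `onReap` callback, and the threshold of 3 are all hypothetical names and values, and counting only consecutive failures is one possible policy among several.

```javascript
'use strict';

// Hypothetical helper, not part of tchannel: counts consecutive
// connection failures per peer and reports when a peer should be reaped.
class FailureReaper {
    constructor(threshold, onReap) {
        this.threshold = threshold;   // assumed limit on consecutive failures
        this.onReap = onReap;         // callback that actually removes the peer
        this.failures = new Map();    // hostPort -> consecutive failure count
    }

    // Call when an outgoing connection to hostPort fails.
    // Returns true if this failure pushed the peer over the threshold.
    recordFailure(hostPort) {
        const count = (this.failures.get(hostPort) || 0) + 1;
        if (count >= this.threshold) {
            this.failures.delete(hostPort);
            this.onReap(hostPort);
            return true;
        }
        this.failures.set(hostPort, count);
        return false;
    }

    // Call when a connection succeeds, so only consecutive failures count.
    recordSuccess(hostPort) {
        this.failures.delete(hostPort);
    }
}

const reaped = [];
const reaper = new FailureReaper(3, (hostPort) => reaped.push(hostPort));
reaper.recordFailure('6.6.6.6:59371');
reaper.recordFailure('6.6.6.6:59371');
reaper.recordSuccess('6.6.6.6:59371'); // resets the count
reaper.recordFailure('6.6.6.6:59371');
reaper.recordFailure('6.6.6.6:59371');
reaper.recordFailure('6.6.6.6:59371'); // third consecutive failure: reaped
console.log(reaped);
```

Resetting on success keeps a peer with occasional hiccups alive, while a peer that never connects gets removed after a bounded number of attempts.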

I presume we see so many connection errors because this peer still exists and choosePeer() selects it even though it's unconnected.
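
If that presumption holds, one possible mitigation is to skip unconnected peers during selection. The sketch below is not tchannel's actual choosePeer(); the function name `chooseConnectedPeer`, the peer shape `{ hostPort, connected }`, and the `10.0.0.1:4040` address are assumptions made up for illustration.

```javascript
'use strict';

// Hypothetical peer shape: { hostPort: string, connected: boolean }.
// This only illustrates skipping peers that have no live connection.
function chooseConnectedPeer(peers, random = Math.random) {
    const connected = peers.filter((peer) => peer.connected);
    if (connected.length === 0) {
        return null; // caller decides whether to connect or fail the request
    }
    // Pick uniformly at random among the connected peers.
    return connected[Math.floor(random() * connected.length)];
}

const peers = [
    { hostPort: '10.0.0.1:4040', connected: true },
    { hostPort: '6.6.6.6:59371', connected: false },
];
const chosen = chooseConnectedPeer(peers);
console.log(chosen.hostPort); // only the connected peer can be chosen
```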

From the logs, I cannot tell what triggers peer.connect().

This could also be a bug in hyperbahn unadvertise, or even in the dead peer reaper.

Raynos commented Nov 2, 2015

On second thought, maybe this is a genuine network partition.

That is, some host A can make outgoing connections to hyperbahn, but hyperbahn cannot connect back to it, so host A is only half available inside the datacenter.
