
SpecPaxos replica crashes upon rollbacks and when merging logs #4

ramanala opened this issue Oct 7, 2019 · 0 comments

Hello, I have been trying to play around with the SpecPaxos implementation. The scenario I'm trying is simple: I run five replicas on a single machine (listening on different ports) and have five clients sending requests in a closed loop. I understand that for SpecPaxos to deliver high throughput and low latency, the network needs to provide ordered delivery (at least most of the time). If it does not, there will be many conflicts, leading to many rollbacks that can hurt performance, but the system must still keep making progress.

However, in the above scenario, I see that the replicas start to crash after a while. Once two replicas crash (in a five-node cluster), the clients block indefinitely.

Details:

I compiled with the paranoid flag on.
Here is how I start the servers:

./bench/replica -c ./conf -i 0 -m spec >rep0 2>&1 &
./bench/replica -c ./conf -i 1 -m spec >rep1 2>&1 &
./bench/replica -c ./conf -i 2 -m spec >rep2 2>&1 &
./bench/replica -c ./conf -i 3 -m spec >rep3 2>&1 &
./bench/replica -c ./conf -i 4 -m spec >rep4 2>&1 &

Here is how I start the clients:

./bench/client -c ./conf -n 1000 -m spec >cli-0 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-1 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-2 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-3 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-4 2>&1 &

Here is the stack trace of a crashing replica:

20190907-154417-2122 17865 * MergeLogs (replica.cc:820): [2] Merging 3 logs
20190907-154417-2124 17865 PANIC MergeLogs (replica.cc:1060): Assertion `newEntry.viewstamp.view == entry.view()' failed
20190907-154417-2124 17865 ! Backtrace (message.cc:169): Backtrace:
20190907-154417-2128 17865 ! Backtrace (message.cc:220): 0: _Z6_Panicv+0x9 [0x440314]
20190907-154417-2130 17865 ! Backtrace (message.cc:220): 1: _ZN9specpaxos4spec11SpecReplica9MergeLogsEmmRKSt3mapIiNS0_5proto19DoViewChangeMessageESt4lessIiESaISt4pairIKiS4_EEERSt6vectorINS_3Log8LogEntryESaISG_EE+0x1a19 [0x40bc9f]
20190907-154417-2132 17865 ! Backtrace (message.cc:220): 2: _ZN9specpaxos4spec11SpecReplica18HandleDoViewChangeERK16TransportAddressRKNS0_5proto19DoViewChangeMessageE+0x965 [0x40e5a3]
20190907-154417-2134 17865 ! Backtrace (message.cc:220): 3: _ZN9specpaxos4spec11SpecReplica14ReceiveMessageERK16TransportAddressRKSsS6_+0x5bd [0x4077c7]
20190907-154417-2136 17865 ! Backtrace (message.cc:220): 4: _ZN12UDPTransport10OnReadableEi+0xb84 [0x4489fc]
20190907-154417-2138 17865 ! Backtrace (message.cc:220): 5: _ZN12UDPTransport14SocketCallbackEisPv+0x39 [0x448f2b]
20190907-154417-2140 17865 ! Backtrace (message.cc:220): 6: event_base_loop+0x754 [0x7f684a341f24]
20190907-154417-2142 17865 ! Backtrace (message.cc:220): 7: _ZN12UDPTransport3RunEv+0x1f [0x447bff]
20190907-154417-2144 17865 ! Backtrace (message.cc:220): 8: main+0x94f [0x40610f]
20190907-154417-2146 17865 ! Backtrace (message.cc:220): 9: __libc_start_main+0xf5 [0x7f684938af45]
20190907-154417-2148 17865 ! Backtrace (message.cc:220): 10: _start+0x29 [0x4056c9]
20190907-154417-2150 17865 ! Backtrace (message.cc:220): 11: ???+0x29 [0x29]

I can attach the full logs if needed. Ideally, the replicas should not crash; they should resolve the conflicts and keep making progress.
