Consider replacing TCP with UDP for internodal communication #81

Open
JustAnotherArchivist opened this issue Jun 10, 2018 · 2 comments

@JustAnotherArchivist
Contributor

As discussed in #4, it may be worth replacing TCP with UDP to lower networking overhead.

Raft does not require that packets are delivered in order. The end of chapter 3.3 in Diego Ongaro's thesis mentions this (emphasis mine):

Raft assumes RPC requests and responses may be lost in the network; it is the requester’s responsibility to retry the RPC if it does not receive a response in a timely manner. Servers issue RPCs in parallel for best performance, and Raft does not assume the network preserves ordering between RPCs.

So if Raft is implemented according to the thesis, UDP should work just fine.
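
To illustrate the requester-side retry the thesis describes, here is a minimal sketch. The `send` and `recv` callables and the `request_id` field are hypothetical stand-ins for an unreliable transport, not part of PySyncObj.

```python
def call_with_retry(send, recv, request, max_attempts=3):
    """Send `request` and retry until a matching response arrives.

    `send(request)` and `recv()` are hypothetical callables wrapping an
    unreliable transport; `recv()` returns a response dict, or None when
    it times out.
    """
    for _ in range(max_attempts):
        send(request)
        response = recv()
        # Ignore timeouts and stale responses that belong to another request.
        if response is not None and response.get('request_id') == request['request_id']:
            return response
    # Give up; the caller decides what to do next (e.g. try again later).
    return None
```

Because the request may be delivered more than once, this only works if the receiver handles repeated requests safely.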

@herry13

herry13 commented Oct 28, 2018

It's a really good idea to replace TCP with UDP since the latter has lower overhead. In addition, two or more nodes behind NATs can talk to each other using UDP hole-punching. From PR #92, it looks like we just need to implement UDPTransport, UdpConnection, and UdpServer.

@meteam2022

Late to the party. I found some optimizations in the code that make it unsafe to switch to UDP.

The first is that the Candidate increments its vote count directly whenever it receives a response_vote msg. Since UDP can duplicate msgs, the same vote can be counted more than once on a UDP network.

if message['type'] == 'response_vote' and message['term'] == self.__raftCurrentTerm:
    self.__votesCount += 1

If changing to UDP, the senders of response_vote msgs should be stored in a set, so that each node's vote is counted at most once.
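
A rough sketch of that idea, assuming the handler knows which node sent the msg. `VoteTracker`, `on_response_vote`, and the `sender` argument are hypothetical names for illustration, not the actual PySyncObj code.

```python
class VoteTracker:
    """Counts votes per term, ignoring duplicated response_vote messages."""

    def __init__(self, current_term):
        self.current_term = current_term
        # Set of node ids that granted a vote in this term.
        self.voters = set()

    def on_response_vote(self, message, sender):
        """Record the vote and return the number of distinct voters."""
        if message['type'] == 'response_vote' and message['term'] == self.current_term:
            # Adding to a set is idempotent, so a duplicate changes nothing.
            self.voters.add(sender)
        return len(self.voters)


tracker = VoteTracker(current_term=5)
vote = {'type': 'response_vote', 'term': 5}
tracker.on_response_vote(vote, 'node-a')
tracker.on_response_vote(vote, 'node-a')  # duplicated datagram
tracker.on_response_vote(vote, 'node-b')
```

After these three calls the tracker holds two distinct voters, whereas a plain counter would have reached three.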

The second is that, when sending a big append entry, the msg is fragmented, and every fragment must be received in order, without loss or duplication. Otherwise pickle.loads might fail or cause data inconsistency. UDP guarantees neither in-order delivery nor protection against packet loss or duplication.

PySyncObj/pysyncobj/syncobj.py

Lines 1165 to 1186 in 3ae622a

if len(entries) == 1 and len(entries[0][0]) >= batchSizeBytes:
    entry = pickle.dumps(entries[0])
    for pos in xrange(0, len(entry), batchSizeBytes):
        currData = entry[pos:pos + batchSizeBytes]
        if pos == 0:
            transmission = 'start'
        elif pos + batchSizeBytes >= len(entries[0][0]):
            transmission = 'finish'
        else:
            transmission = 'process'
        message = {
            'type': 'append_entries',
            'transmission': transmission,
            'data': currData,
            'term': self.__raftCurrentTerm,
            'commit_index': self.__raftCommitIndex,
            'prevLogIdx': prevLogIdx,
            'prevLogTerm': prevLogTerm,
        }
        self.__transport.send(node, message)
        if node not in self.__connectedNodes:
            break

If changing to UDP, the fragmented msgs should carry a sequence number and the total size, and a retry mechanism is needed in case a packet is lost.
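
One possible shape for this, as a sketch only. The `seq` and `count` fields, `fragment`, and `Reassembler` are hypothetical names, and the total is interpreted here as the number of fragments. The receiver tolerates reordering and duplicates, and reports which fragments are still missing so the sender can retry them.

```python
import pickle


def fragment(payload, chunk_size):
    """Split `payload` (bytes) into fragments tagged with seq and count."""
    count = max(1, -(-len(payload) // chunk_size))  # ceiling division
    return [
        {'seq': i, 'count': count,
         'data': payload[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(count)
    ]


class Reassembler:
    """Collects fragments in any order and ignores duplicates."""

    def __init__(self):
        self.count = None
        self.chunks = {}

    def add(self, frag):
        self.count = frag['count']
        # Keep the first copy of each sequence number; drop duplicates.
        self.chunks.setdefault(frag['seq'], frag['data'])

    def missing(self):
        """Sequence numbers not yet received (candidates for a retry)."""
        if self.count is None:
            return None
        return [i for i in range(self.count) if i not in self.chunks]

    def complete(self):
        return self.count is not None and len(self.chunks) == self.count

    def payload(self):
        """Join the fragments in sequence order; call only when complete."""
        return b''.join(self.chunks[i] for i in range(self.count))


entry = list(range(1000))
frags = fragment(pickle.dumps(entry), 100)
receiver = Reassembler()
for frag in reversed(frags[1:]):   # out of order, first fragment lost
    receiver.add(frag)
receiver.add(frags[-1])            # duplicated datagram
lost = receiver.missing()          # the sender would resend these
for seq in lost:
    receiver.add(frags[seq])
restored = pickle.loads(receiver.payload())
```

Only the fully reassembled payload is passed to pickle.loads, so a lost or reordered fragment delays the entry instead of corrupting it.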
