Add a Stream Migration spec #406

Open
wants to merge 12 commits into
base: master
from
238 changes: 238 additions & 0 deletions connections/stream-migration.md
@@ -0,0 +1,238 @@
# Stream migration

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|---------------|--------|-----------------|
| 1A | Working Draft | Active | r0, 2022-04-13 |

Authors: [@marcopolo]

Interest Group: TODO

## Introduction

A peer may have many connections open to another peer and may be transmitting
data on less optimal connections. For example, a peer could be connected to
another peer both directly and via a relay. In that case we'd like to move any
streams from the relay over to the better direct connection. A similar
argument can be made with QUIC and TCP.

This protocol attempts to solve the problem of how to seamlessly move a stream
from one connection to another. This protocol also enables the peer to prune
excess connections since they will no longer be used.

## Requirements

1. Transport agnostic. Really, this means migrating at the stream level.
1. Minimal overhead. Overhead should be at most a small per-stream cost (no additional framing, etc.)
1. No interruption. Reading/writing should be continuous.
1. Transparent. Applications using migratable streams shouldn't notice anything.
1. Correct. There can't be any ambiguity (one side believing the migration happened, the other side disagreeing, etc.).

## The Protocol

The goal of the protocol is to move traffic from one stream to another
seamlessly. The final state of the new stream should be the same as the initial
state of the old stream.

The protocol should only be used when the initiator knows the responder
understands the stream-migration protocol (otherwise you'll waste a roundtrip).

The protocol works as a prefix before another protocol. If we are creating a
stream for some user protocol `P`, we coordinate the stream-migration protocol
first, and then negotiate protocol `P` later. The initial stream-migration
negotiation is so that both sides agree on an ID for the stream. This way when a
peer decides to migrate the stream, it can reference which stream it wants to
migrate and both peers know which stream is being referenced.

![stream-migration](./stream-migration/stream-migration.svg)

<details>
<summary>Instructions to reproduce diagram</summary>

``` plantuml
@startuml stream-migration
skinparam sequenceMessageAlign center
entity Initiator
entity Responder

note over Initiator, Responder: Assume both sides understand stream-migration

Initiator -> Responder: Open connection
Initiator -> Responder: Open multiplexed stream

Initiator -> Responder: Negotiate stream-migration protocol with ""<stream-migration protocol id>""

Initiator -> Responder: Send ""StreamMigration(type=Label, id=A)"" message

Initiator -> Responder: <i> continue negotiating underlying protocol </i>
... <i>Nodes use the stream as normal</i> ...

== Stream Migration ==

note over Initiator, Responder: Migrate <b>Stream A</b> to <b>Stream B</b>

Initiator -> Responder: Open new stream
Initiator -> Responder: Negotiate stream-migration protocol with ""<stream-migration protocol id>""

Initiator -> Responder: <b>Stream B:</b> Send ""StreamMigration(type=Migrate, id=B, from=A)"" message

Initiator <- Responder: <b>Stream B:</b> Send AckMigrate message

note over Responder
Treat any ""EOF"" on <b>stream A</b> as a signal
that it should continue reading on <b>stream B</b>
end note


note over Initiator
Close <b>stream A</b> for writing.
Will only write to <b>stream B</b> from now on.
end note

Initiator -> Responder: <b>Stream A:</b> ""EOF""

note over Responder
When <i>Responder</i> reads ""EOF"" on <b>stream A</b>
it will close <b>stream A</b> for writing.
It will only write to <b>stream B</b> from now on.
end note

Initiator <- Responder: <b>Stream A:</b> ""EOF""

note over Initiator
Treat any ""EOF"" on <b>stream A</b> as a signal
that it should continue reading on <b>stream B</b>
end note

note over Initiator, Responder
At this point <b>stream A</b> is closed for writing on
both sides, and both sides have read up to ""EOF"".
<b>stream A</b> has been fully migrated to <b>stream B</b>
end note

@enduml
```

To generate:
```bash
plantuml stream-migration.md -o stream-migration -tsvg
```
</details>

Note: some of these steps may be pipelined.
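
The read side of the hand-over in the diagram can be sketched as follows. This
is an in-memory model, not an implementation: `HalfPipe` and `MigratingReader`
are hypothetical names, and `read()` returning `b""` stands in for "a real
stream would block here". The point it shows is that, once the migration has
been acknowledged, an `EOF` on the old stream means "continue on the new
stream" rather than "end of data".

```python
from collections import deque
from typing import Optional

EOF = None  # value returned by read() once the writer has closed and all data is drained


class HalfPipe:
    """One direction of a stream: queued chunks plus a 'closed for writing' flag."""

    def __init__(self) -> None:
        self._chunks = deque()
        self.closed = False

    def write(self, data: bytes) -> None:
        if self.closed:
            raise IOError("write after close")
        self._chunks.append(data)

    def close_write(self) -> None:
        self.closed = True

    def read(self) -> Optional[bytes]:
        if self._chunks:
            return self._chunks.popleft()
        # Empty queue: EOF if the writer is done, otherwise nothing available yet.
        return EOF if self.closed else b""


class MigratingReader:
    """Reads from the old stream, then switches to the new one on EOF."""

    def __init__(self, old: HalfPipe) -> None:
        self._current = old
        self._next: Optional[HalfPipe] = None

    def migrate_to(self, new: HalfPipe) -> None:
        # Called after the Migrate / AckMigrate exchange on the new stream.
        self._next = new

    def read(self) -> Optional[bytes]:
        chunk = self._current.read()
        if chunk is EOF and self._next is not None:
            # EOF on the old stream after a migration is a hand-over signal.
            self._current, self._next = self._next, None
            chunk = self._current.read()
        return chunk
```

An application reading through `MigratingReader` sees one continuous byte
sequence, which is the "Transparent" requirement from above.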


### Stream IDs

In the above diagram stream IDs have the labels `A` and `B`. In practice this
ID will be represented as a uint32 defined by the initiator.
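
A minimal sketch of an ID allocator, under assumptions that go beyond the text
above: it uses the even/odd split discussed in the review thread (the peer with
the lower peer ID takes even IDs, the other takes odd IDs) and compares peer
IDs as raw bytes. `StreamIDAllocator` and `allocate` are hypothetical names.

```python
UINT32_MAX = 2**32 - 1


class StreamIDAllocator:
    """Hands out uint32 stream IDs from one half of a shared ID space."""

    def __init__(self, local_peer_id: bytes, remote_peer_id: bytes) -> None:
        # Assumption: byte-wise comparison decides who is the "lower" peer.
        # Lower peer ID allocates 0, 2, 4, ...; higher allocates 1, 3, 5, ...
        self._next = 0 if local_peer_id < remote_peer_id else 1

    def allocate(self) -> int:
        if self._next > UINT32_MAX:
            raise OverflowError("stream ID space exhausted")
        stream_id = self._next
        self._next += 2  # stay within our own parity class
        return stream_id
```

With this split the two peers never pick the same ID, even across several
connections, so either side can reference a stream the other one labelled.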

### Stream migration protocol id

The stream migration protocol id should be `/libp2p/streamMigration`.


### Resets

If either stream is "reset" before both ends are closed, both streams must be
reset and the stream as a whole should be considered "aborted" (reset).
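
The reset rule can be sketched like this. `FakeStream` and `MigratingStream`
are hypothetical stand-ins for a muxer stream and the logical stream seen by
the application; the sketch only shows that a reset on either underlying stream
during a migration resets the other one and aborts the logical stream.

```python
class FakeStream:
    """Stand-in for a muxer stream that only tracks whether it was reset."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.was_reset = False

    def reset(self) -> None:
        self.was_reset = True


class MigratingStream:
    """Logical stream that is in the middle of moving from `old` to `new`."""

    def __init__(self, old: FakeStream, new: FakeStream) -> None:
        self.old = old
        self.new = new
        self.aborted = False

    def on_reset(self, source: FakeStream) -> None:
        # A reset on either stream before both ends are closed aborts everything.
        source.was_reset = True
        self.aborted = True
        for stream in (self.old, self.new):
            if not stream.was_reset:
                stream.reset()
```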

### Half closed streams

The final migrated stream should look the same as the initial stream. If the
initial stream `A` was half closed, then the final migrated stream `B` should
also be half closed. Note this may involve an extra step by one of the nodes.
If a node, when trying to close writes to its old stream, notices that it was
already closed, it should also close the new stream for writing. Specifically,
consider the following case.


![stream-migration-half-closed](./stream-migration/stream-migration-half-closed.svg)

<details>
<summary>Instructions to reproduce diagram</summary>

``` plantuml
@startuml stream-migration-half-closed
skinparam sequenceMessageAlign center
entity Initiator
entity Responder

Initiator <- Responder: <b>Stream A:</b> ""EOF""
note over Responder: <b>Stream A</b> is closed for writing

== Stream Migration ==

note over Initiator, Responder: Migrate <b>Stream A</b> to <b>Stream B</b>

Initiator -> Responder: Open new stream on <b>Connection 2</b>. Call this <b>Stream B</b>

Initiator -> Responder: <b>Stream B:</b> Negotiate stream-migration protocol with ""<stream-migration protocol id>""
Initiator -> Responder: <b>Stream B:</b> Send ""StreamMigration(type=Migrate, id=B, from=A)"" message

Initiator <- Responder: <b>Stream B:</b> Send AckMigrate message

note over Initiator
Close <b>stream A</b> for writing.
Will only write to <b>stream B</b> from now on.
end note

note over Initiator
We have already seen the ""EOF"" on
<b>Stream A</b> from <i>Responder</i>
So we continue reading on <b>stream B</b>
end note

Initiator -> Responder: <b>Stream A:</b> ""EOF""

note over Responder
Treat ""EOF"" on <b>stream A</b> as a signal to close <b>stream A</b> for
writing and continue writing on <b>stream B</b>. However stream A was
already closed (before migration), so we close <b>stream B</b> as well here.
end note
Initiator <- Responder: <b>Stream B:</b> ""EOF""

note over Initiator, Responder: Stream A is now migrated to Stream B

@enduml
```

To generate:
```bash
plantuml stream-migration.md -o stream-migration -tsvg
```
</details>

Contributor

What happens if the peer misbehaves, e.g. when B just doesn’t send an EOF on stream B? While A can “consider” the stream as closed, it still needs to be closed explicitly, otherwise the stream multiplexer can’t garbage-collect the stream after it has been EOFed from both sides.

Contributor Author

I think if the Responder misbehaves and doesn't close B then the stream migration hasn't finished since the new stream isn't in the same state as the old stream. I'm not sure what else we can do besides consider it not spec-compliant.

The reverse case where the Initiator's stream is closed for writing is the same
as above, but mirrored.
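
The half-closed rule can be sketched as a small piece of per-node state. The
names `WriteSide` and `on_old_stream_eof` are hypothetical. The sketch covers
only the step described above: when a node reacts to `EOF` on the old stream by
closing its own write side, and finds that side was already closed before the
migration, it closes the new stream for writing too.

```python
class WriteSide:
    """Tracks whether this node has closed writing on the old and new streams."""

    def __init__(self, old_closed_before_migration: bool = False) -> None:
        self.old_write_closed = old_closed_before_migration
        self.new_write_closed = False

    def on_old_stream_eof(self) -> None:
        # The remote finished writing on the old stream: stop writing there too.
        already_closed = self.old_write_closed
        self.old_write_closed = True
        if already_closed:
            # The old stream was half closed before the migration, so the new
            # stream must end up half closed as well.
            self.new_write_closed = True
```

In the normal case the new stream stays open for writing; in the half-closed
case it ends up closed, matching the state of the stream before the migration.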

## Picking the best connection

Moving streams from one connection to another involves picking which connection
we should move the streams to. Here are some recommended heuristics the
initiator may use in determining which connection is best.

1. If we have both relayed and direct connections, keep the direct connections
and drop the relay connections.
2. Check for simultaneous connect: If we have both inbound and outbound
connections, keep the ones initiated by the peer with the lowest peer ID. Open
Question: Some protocols behave differently depending on whether they are the
dialer or listener. Can we really consolidate these?
3. Prefer the connection with the most streams.
4. Break ties in the remaining connections by selecting the newest conn, to
match the swarm's behavior in best connection selection.
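
One possible way to express the four heuristics above is as a single sort key,
applied in order. This is a sketch under assumptions: `Conn`, its fields, and
`pick_best` are hypothetical, peer IDs are compared as raw bytes, and "newest"
is taken to mean the largest `opened_at` timestamp.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Conn:
    name: str
    relayed: bool
    dialer_peer_id: bytes  # peer that initiated this connection
    num_streams: int
    opened_at: float       # larger means newer


def pick_best(conns: Iterable[Conn], local_id: bytes, remote_id: bytes) -> Conn:
    lowest = min(local_id, remote_id)

    def rank(c: Conn):
        return (
            not c.relayed,               # 1. direct beats relayed
            c.dialer_peer_id == lowest,  # 2. dialed by the lowest peer ID
            c.num_streams,               # 3. most streams
            c.opened_at,                 # 4. newest wins remaining ties
        )

    # Tuples compare element by element, so earlier heuristics dominate later ones.
    return max(conns, key=rank)
```

Since the initiator alone applies the ranking, the other peer does not need to
reproduce it.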

Note that it's not required that all implementations (and all versions) follow
the same heuristics since the initiator is driving the migration and specifies
Member

Below I am assuming that Initiator refers to the connection initiator, not the stream initiator. Please correct me in case I am misunderstanding this @MarcoPolo.

Say there are two nodes A and B. Connection AB is initiated by A to B. Connection BA is initiated by B to A.

Say that A and B follow different heuristics to pick the best connection. A chooses AB as the best connection, B chooses BA as the best connection.

If I understand the above correctly, this would result in A moving all the streams it created to AB and B moving all the streams it created to BA. Both connections would thus stay alive.

Potential solution: Instead of allowing both A and B to migrate streams, how about delegating the decision making to the peer with the lower peer ID, e.g. in this case A?

Contributor Author

Yes!

I was thinking about this this weekend and came up with a similar solution. Glad to see you also came to the same conclusion. I'll update this spec to make this explicit.

Contributor

Few thoughts on this:

  1. Currently each peer chooses its own IDs for streams, i.e. there are two distinct spaces of stream IDs. If we want to allow the receiver of a stream to migrate that stream, we need a single stream ID space. One way to realize this would be to mandate the client (roles as seen by the stream muxer) to use odd and the server to use even stream IDs.
  2. I don't think this document should describe how peers would choose AB over BA. This document should only describe how to migrate one libp2p stream from one muxer stream to another. For all that this spec cares about, those streams might (or might not) live on the same underlying connection. We can then use the stream migration protocol as a building block to converge onto one connection (and the peer ID comparison is quite a neat idea, I like it!), but that should probably be described in a different document.

Contributor Author

  1. Yes. I was considering adding a boolean flag that would indicate which peer initially identified the stream it's referencing when migrating (was I the initiator of the from stream?). This is the same as using even and odd numbers, since that scheme effectively encodes this boolean in the least significant bit. I'm fine with either way. Maybe it's a little easier to think about even and odds, so I'll do that.

  2. Agreed that describing how to sort connections is out of the scope of this document (I imagine that spec to iterate more and possibly have more subtle details). But I do think this spec should define who is responsible for doing the stream migration. If we end up in the situation where we have two identical connections (A dialed B and B dialed A at roughly the same time) we should describe who is in charge of doing the stream migration. By defining which node starts the stream migration we simplify this protocol and also avoid having to handle cases where both sides start stream migration at the same time.

"Potential solution: Instead of allowing both A and B to migrate streams, how about delegating the decision making to the peer with the lower peer ID, e.g. in this case A?"

Wouldn't this create a bias toward lower peerIDs? Maybe we can hash the concatenation of the two peerIDs (the lower first). If the hash is even, use the lowest. Else use the highest. That way it is deterministic but no peerID is systematically favored over another.

Contributor Author

@MarcoPolo MarcoPolo Apr 25, 2022

Maybe it's not worth the extra complexity though, since for a random ID A there's a 50% chance that it's less than another random ID B (it's equiprobable that B is smaller).

Contributor Author

@MarcoPolo MarcoPolo Apr 25, 2022

Yeah on second thought I agree with Max. I actually don't see the benefits since it's still 50% odds either way. Updated 49c0597

Contributor

@marten-seemann marten-seemann Apr 25, 2022

I'm still not convinced that we should specify anything here at all. Stream migration is a general feature, a building block.

The use case we have in mind now is migrating all streams from one connection to another, but we might come up with other use cases in the future. I'd prefer to have stream migration just be a thing that any node, regardless of its peer ID, can use in principle.
For the specific use case of converging on a single connection, comparing peer IDs seems reasonable.

Contributor Author

Let me try to rephrase to see if I understand:

  1. The protocol should only specify how a node would perform a stream migration.
  2. It doesn't define who starts the migration or why.
  3. Consolidating connections would be a layer on top of this that defines which node is in charge of migrating streams to empty and close connections.

We just have to make sure that what we design here doesn't block point 3.

If that seems accurate, then I agree we don't need this in here.

Contributor Author

also @marten-seemann to highlight a recent change re:

> Currently each peer chooses its own IDs for streams, i.e. there are two distinct spaces of stream IDs. If we want to allow the receiver of a stream to migrate that stream, we need a single stream ID space. One way to realize this would be to mandate the client (roles as seen by the stream muxer) to use odd and the server to use even stream IDs.

I've specced something similar, except the lower peer id node uses even and the higher peer id node uses odds. This lets us avoid having to rely on the stream muxer to give us this role. And it also works across connections (it gets confusing if the stream muxer says we are the client on one connection and the server on the other).

where to migrate to.

## Appendix

[Specs Issue](https://github.com/libp2p/specs/issues/328)

### Related Issues:

- <https://github.com/libp2p/go-libp2p/issues/634>

## Open Questions

Some questions that will probably be resolved when a PoC is implemented.

- In simultaneous open, how do we pick who's the initiator? I think we can rely
on `/libp2p/simultaneous-connect` to do the correct thing here.