Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement logging the LCP RTT #422

Merged
merged 3 commits into from
Sep 30, 2023
Merged

Implement logging the LCP RTT #422

merged 3 commits into from
Sep 30, 2023

Conversation

rfc1036
Copy link
Contributor

@rfc1036 rfc1036 commented May 7, 2023

This patch adds the lcp-rtt-file configuration option, which instructs pppd to add a timestamp to the data section of each LCP echo request frame and then log their round trip time and any detected packet loss to a circular buffer in that file.

Other programs then can asynchronously read the file and report statistics about the line.

I have also added examples of a simple program which dumps the content of the log file and a Prometheus exporter which can be run as a CGI.

This feature was inspired by a similar one in a commercial PPP implementation, but I have never actually seen that and I do not know how it actually works or it was implemented.

Signed-off-by: Marco d'Itri [email protected]

@rfc1036 rfc1036 changed the title implement logging the LCP RTT Implement logging the LCP RTT May 7, 2023
@Neustradamus
Copy link
Member

@paulusmack, @enaess: Have you seen this PR?

@Neustradamus
Copy link
Member

@paulusmack, @enaess: Three month ago, @rfc1036 from Debian has done a PR, what do you think?
It is good?

@enaess
Copy link
Contributor

enaess commented Jul 30, 2023

Just because a third-party vendor of a commercial ppp implementation does this, doesn't make it a requirement for pppd to do so. First question I have is, is this feature documented in a RFC some place; in which I'd be totally fine with taking this as long as it meets the RFC's specs.

I am aware of LQP or Link Quality Protocol as described in https://datatracker.ietf.org/doc/html/rfc1333 which is obsoleted by https://datatracker.ietf.org/doc/html/rfc1989.

Where is the "specs" for this change coming from?

@rfc1036
Copy link
Contributor Author

rfc1036 commented Jul 30, 2023

As explained, I have invented and implemented everything from scratch except than taking inspiration from the general idea of measuring the RTT of LCP echo requests.

LQP has a different purpose, semantics and mechanics: it requires to be implemented by both sides and negotiated using LCP, for a start.

@enaess
Copy link
Contributor

enaess commented Jul 30, 2023

Let me see if I understand it correctly. This change doesn't change the underlying principals of the PPP/LCP protocol, but allows one to monitor the RTT of LCP echo / reply packets. It adds a few extra tools so you can monitor and store these in a binary database?

With my knowledge of the LCP protocol, choosing a "magic" number to track may be a bad idea. These are negotiated at random and also prevent loop detection. You may want to change the implementation to measure the negotiated value of his or peer magic within the echo req/rep packets. Unless, my understanding of this is wrong ...

@rfc1036
Copy link
Contributor Author

rfc1036 commented Jul 30, 2023

Yes, I think that you are misunderstanding it. This works with no cooperation at all from the remote, because the standard requires that the complete optional payload of the LCP echo request frame is returned as is in the echo reply. So the magic number is really only relevant in the context of the peer sending and then receiving this data, not of the remote. It could even be argued it is almost superfluous.

pppd sends as usual the LCP echo requests when configured to do so, but attaches a timestamp to each one. When it gets the timestamp back in the LCP echo replies then it will comput the RTT and write it to a ring buffer which can be read by monitoring systems.

@enaess
Copy link
Contributor

enaess commented Jul 31, 2023

Just for everyone's reference. The wording out of RFC1661 section 5.8 regarding Echo Reply packets:

 Data

      The Data field is zero or more octets, and contains uninterpreted
      data for use by the sender.  The data may consist of any binary
      value.  The end of the field is indicated by the Length.

Unlike the Identifier field where the RFC explicitly says the ID must be copied into the response packet, the use of the data field implies a copy as it is used by the peer (or sender in this case).

I'd say whether or not a particular implementation will support this detail can vary, but essentially this change as it currently stands should not break the specification. Given that this feature isn't enabled by default, the risk of regression is low. The C side of the change seems solid as if it was written by someone that knows the trade, I'd vouch for that.

However, the question that still remains though is would this change be useful to others and if it would be worth our time maintaining it? I don't know, but pppd seems to still support many such changes e.g. EAP-SRP + others. I also don't know Marco's other involvement of e.g. putting pppd in the cloud or behind a CGI server and if this is a common use-case for pppd.

@rfc1036
Copy link
Contributor Author

rfc1036 commented Aug 1, 2023

I do not understand why you mention "cloud" and "CGI servers" and what you think that this has to do with this feature.
The first candidates to use this are systems like OpenWRT which implement dashboards with statistics about the connection.

@enaess
Copy link
Contributor

enaess commented Aug 1, 2023

Don't worry, you have my vote. Also, I am not the one who merges changes ;-)

pppd/lcp.c Outdated Show resolved Hide resolved
pppd/lcp.c Show resolved Hide resolved
lost = lcp_echos_pending - 1;
if (lost > 0xFF)
lost &= 0xFF; /* truncate the lost packets to 256 */
if (rtt > 0xFFFFFF)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, wouldn't it be better to saturate rather than wrapping? I can imagine there might be cases where the round-trip time is longer than 16.7 seconds. Also, why the choice of microseconds as the unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my intention: I have changed the code.

I choose microseconds because it is the natural unit which fits in three octets, and I thought that an LCP latency of over 16 seconds would be highly uncommon.

If you believe that it useful to represent higher latencies then I can easily rework the patch to use tens of microseconds instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking it would be rare to get latencies greater than 16.7 seconds, but it could happen in pathological situations. I guess we can leave it like this for now and change it later if necessary.

pppd/lcp.c Show resolved Hide resolved
pppd/lcp.c Outdated Show resolved Hide resolved
ring_buffer[next_entry + 1] = htonl((u_int32_t) ((lost << 24) + rtt));

/* update the pointer to the (just updated) most current data element */
lcp_rtt_buffer[2] = htonl(next_entry);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally with a mmaped ring buffer one should really have some sort of memory barrier here (i.e. before the write to lcp_rtt_buffer[2]), since otherwise, on architectures with a weakly-consistent memory model, a consumer that has mmaped the ring buffer might see the write to lcp_rtt_buffer[2] before the write to the ring buffer entry. If all consumers are simply going to read the whole buffer with the read() syscall, and we assume that read() is atomic with respect to msync(), then perhaps there isn't actually a problem. I don't know of portable constructs for memory barriers short of using a pthread mutex, and arguably this data is not so critical that an occasional bad value is not a real problem, so perhaps it is reasonable not to have any barrier. But at least a comment may be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my reasoning as well. I have added a comment.

If this will show to not be true in practice then I have thought of also setting the timestamp of the oldest sample to zero (i.e. the one which would be overwritten the next cycle): this way even if the index is read out of order then no wrong sample is read and the correct one will still be picked up by the reader the next time.

pppd/lcp.c Outdated Show resolved Hide resolved
pppd/lcp.c Show resolved Hide resolved
@@ -67,12 +72,18 @@

static void lcp_delayed_up(void *);

#define LCP_RTT_MAGIC 0x19450425
#define LCP_RTT_HEADER_LENGTH 4
#define LCP_RTT_FILE_SIZE 8192
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are Linux machines which use a 64kB page size, so we need to either make this a multiple of 64kB or else use getpagesize() in lcp_rtt_open_file(). (Hopefully Solaris has getpagesize().)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? The Linux mmap(2) man page explicitly describes what happens when the length is not a multiple of the page size, and I could not find anything forbidding this in the Solaris man page either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's OK if length is less than the page size, so ignore this comment.

pppd/pppd.8 Outdated Show resolved Hide resolved
@Neustradamus
Copy link
Member

@rfc1036: Have you seen @paulusmack comments?

@rfc1036
Copy link
Contributor Author

rfc1036 commented Aug 24, 2023

Yes, but I am on vacation with no access to my testbed, I will get back to you next month.

This change adds the lcp-rtt-file configuration option, which instructs
pppd to add a timestamp to the data section of each LCP echo request
frame and then log their round-trip time and any detected packet loss
to a circular buffer in that file.

Other programs then can asynchronously read the file and report
statistics about the line.

Signed-off-by: Marco d'Itri <[email protected]>
@rfc1036
Copy link
Contributor Author

rfc1036 commented Sep 24, 2023

I have updated the as discussed in the review.

@paulusmack paulusmack merged commit edc1926 into ppp-project:master Sep 30, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants