Implement logging the LCP RTT #422
Conversation
@paulusmack, @enaess: Have you seen this PR?
@paulusmack, @enaess: Three months ago, @rfc1036 from Debian opened a PR; what do you think?
Just because a third-party vendor of a commercial ppp implementation does this doesn't make it a requirement for pppd to do so. The first question I have is: is this feature documented in an RFC somewhere? In that case I'd be totally fine with taking this, as long as it meets the RFC's specs. I am aware of LQP, or Link Quality Protocol, as described in https://datatracker.ietf.org/doc/html/rfc1333, which is obsoleted by https://datatracker.ietf.org/doc/html/rfc1989. Where are the "specs" for this change coming from?
As explained, I have invented and implemented everything from scratch, other than taking inspiration from the general idea of measuring the RTT of LCP echo requests. LQP has a different purpose, semantics and mechanics: it must be implemented by both sides and negotiated using LCP, for a start.
Let me see if I understand it correctly. This change doesn't change the underlying principles of the PPP/LCP protocol, but allows one to monitor the RTT of LCP echo/reply packets. It adds a few extra tools so you can monitor and store these in a binary database? With my knowledge of the LCP protocol, choosing a "magic" number to track may be a bad idea. These are negotiated at random and are also used for loop detection. You may want to change the implementation to match on the negotiated value of its own or the peer's magic within the echo req/rep packets. Unless my understanding of this is wrong ...
Yes, I think that you are misunderstanding it. This works with no cooperation at all from the remote, because the standard requires that the complete optional payload of the LCP echo request frame is returned as-is in the echo reply. So the magic number is really only relevant in the context of the peer sending and then receiving this data, not of the remote. It could even be argued that it is almost superfluous. pppd sends the LCP echo requests as usual when configured to do so, but attaches a timestamp to each one. When it gets the timestamp back in the LCP echo replies, it computes the RTT and writes it to a ring buffer which can be read by monitoring systems.
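To make the mechanism concrete, here is a hedged sketch of the idea described above. The function names (`append_timestamp`, `rtt_microseconds`) and the payload layout are illustrative, not pppd's actual code; the only assumption taken from the discussion is that the echo reply returns the request payload verbatim, so the sender can recover its own timestamp.

```c
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* On send: append the current time to the echo request payload
 * (after whatever the payload already contains, e.g. the magic). */
static void append_timestamp(uint8_t *payload, size_t *len)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    memcpy(payload + *len, &now, sizeof(now));
    *len += sizeof(now);
}

/* On receive: the peer echoed the payload back verbatim (RFC 1661),
 * so the original send time can be recovered and the RTT computed. */
static uint32_t rtt_microseconds(const uint8_t *payload, size_t len)
{
    struct timeval sent, now;
    memcpy(&sent, payload + len - sizeof(sent), sizeof(sent));
    gettimeofday(&now, NULL);
    return (uint32_t)((now.tv_sec - sent.tv_sec) * 1000000
                      + (now.tv_usec - sent.tv_usec));
}
```

Note that this sketch only works because no interpretation of the payload is required on the remote side: the timestamp travels out and back as opaque data.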
Just for everyone's reference. The wording out of RFC1661 section 5.8 regarding Echo Reply packets:
Unlike the Identifier field where the RFC explicitly says the ID must be copied into the response packet, the use of the data field implies a copy as it is used by the peer (or sender in this case). I'd say whether or not a particular implementation will support this detail can vary, but essentially this change as it currently stands should not break the specification. Given that this feature isn't enabled by default, the risk of regression is low. The C side of the change seems solid as if it was written by someone that knows the trade, I'd vouch for that. However, the question that still remains though is would this change be useful to others and if it would be worth our time maintaining it? I don't know, but pppd seems to still support many such changes e.g. EAP-SRP + others. I also don't know Marco's other involvement of e.g. putting pppd in the cloud or behind a CGI server and if this is a common use-case for pppd. |
I do not understand why you mention "cloud" and "CGI servers" and what you think that this has to do with this feature. |
Don't worry, you have my vote. Also, I am not the one who merges changes ;-) |
lost = lcp_echos_pending - 1;
if (lost > 0xFF)
    lost &= 0xFF; /* truncate the lost packets to 256 */
if (rtt > 0xFFFFFF)
Likewise, wouldn't it be better to saturate rather than wrapping? I can imagine there might be cases where the round-trip time is longer than 16.7 seconds. Also, why the choice of microseconds as the unit?
Yes, that was my intention: I have changed the code.
I chose microseconds because it is the natural unit that fits in three octets (2^24 µs is about 16.7 s), and I thought that an LCP latency of over 16 seconds would be highly uncommon.
If you believe that it is useful to represent higher latencies, then I can easily rework the patch to use tens of microseconds instead.
Generally speaking it would be rare to get latencies greater than 16.7 seconds, but it could happen in pathological situations. I guess we can leave it like this for now and change it later if necessary.
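A saturating clamp along the lines agreed above might look like the following. This is a sketch, not the patch's actual code; the field widths (one octet for the lost count, three for the RTT) are taken from the snippet quoted in this review.

```c
#include <stdint.h>

/* Clamp a value to a maximum instead of letting it wrap: a reading
 * pegged at the maximum is at least recognizably "off the scale". */
static uint32_t saturate(uint32_t value, uint32_t max)
{
    return value > max ? max : value;
}

/* Applied to the quoted fields:
 *   lost = saturate(lcp_echos_pending - 1, 0xFF);     one octet
 *   rtt  = saturate(rtt, 0xFFFFFF);                   three octets */
```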
ring_buffer[next_entry + 1] = htonl((u_int32_t) ((lost << 24) + rtt));

/* update the pointer to the (just updated) most current data element */
lcp_rtt_buffer[2] = htonl(next_entry);
Normally with an mmapped ring buffer one should really have some sort of memory barrier here (i.e. before the write to lcp_rtt_buffer[2]), since otherwise, on architectures with a weakly-consistent memory model, a consumer that has mmapped the ring buffer might see the write to lcp_rtt_buffer[2] before the write to the ring buffer entry. If all consumers are simply going to read the whole buffer with the read() syscall, and we assume that read() is atomic with respect to msync(), then perhaps there isn't actually a problem. I don't know of portable constructs for memory barriers short of using a pthread mutex, and arguably this data is not so critical that an occasional bad value would be a real problem, so perhaps it is reasonable not to have any barrier. But at least a comment may be useful.
That was my reasoning as well. I have added a comment.
If this will show to not be true in practice then I have thought of also setting the timestamp of the oldest sample to zero (i.e. the one which would be overwritten the next cycle): this way even if the index is read out of order then no wrong sample is read and the correct one will still be picked up by the reader the next time.
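For reference, C11 does offer a portable fence that would address the ordering concern raised above; whether pppd can assume C11 is a separate question, and the buffer layout and names below are illustrative rather than the patch's actual code.

```c
#include <arpa/inet.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch: publish a ring-buffer entry, then the index, with a release
 * fence in between so that a consumer which observes the updated index
 * at buf[2] is also guaranteed to observe the entry written before it. */
static void publish_entry(uint32_t *buf, uint32_t next_entry,
                          uint32_t lost, uint32_t rtt)
{
    buf[next_entry + 1] = htonl((lost << 24) + rtt);
    atomic_thread_fence(memory_order_release); /* entry before index */
    buf[2] = htonl(next_entry);
}
```

A consumer reading the mapped file directly would pair this with an acquire fence after loading the index; consumers that go through read() are unaffected either way.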
@@ -67,12 +72,18 @@

static void lcp_delayed_up(void *);

#define LCP_RTT_MAGIC 0x19450425
#define LCP_RTT_HEADER_LENGTH 4
#define LCP_RTT_FILE_SIZE 8192
There are Linux machines which use a 64kB page size, so we need to either make this a multiple of 64kB or else use getpagesize() in lcp_rtt_open_file(). (Hopefully Solaris has getpagesize().)
Why this? The Linux mmap(2) man page explicitly describes what happens when the length is not a multiple of the page size, and I could not find anything forbidding this in the Solaris man page either.
You're right, it's OK if length is less than the page size, so ignore this comment.
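For the record, the settled point can be illustrated with a sketch of opening and mapping the file; the function name and error handling here are illustrative, not pppd's lcp_rtt_open_file(). Mapping a length smaller than (or not a multiple of) the page size is fine: mmap rounds the mapping up to page granularity and zero-fills the remainder past the end of the file.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define LCP_RTT_FILE_SIZE 8192

/* Sketch: create the stats file at a fixed size and map it shared,
 * so writes by pppd become visible to readers of the same file. */
static void *map_rtt_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, LCP_RTT_FILE_SIZE) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, LCP_RTT_FILE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}
```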
@rfc1036: Have you seen @paulusmack's comments?
Yes, but I am on vacation with no access to my testbed; I will get back to you next month.
This change adds the lcp-rtt-file configuration option, which instructs pppd to add a timestamp to the data section of each LCP echo request frame and then log their round-trip time and any detected packet loss to a circular buffer in that file. Other programs can then asynchronously read the file and report statistics about the line. Signed-off-by: Marco d'Itri <[email protected]>
I have updated the patch as discussed in the review.
This patch adds the lcp-rtt-file configuration option, which instructs pppd to add a timestamp to the data section of each LCP echo request frame and then log their round trip time and any detected packet loss to a circular buffer in that file.
Other programs can then asynchronously read the file and report statistics about the line.
I have also added examples of a simple program which dumps the content of the log file and a Prometheus exporter which can be run as a CGI.
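A reader such as the dump example would unpack each ring-buffer entry roughly like this. The packing is taken from the snippet quoted in this review (`htonl((lost << 24) + rtt)`); the function name is illustrative.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Unpack one entry as written by the producer: the high octet holds
 * the lost-packet count, the low three octets the RTT in microseconds. */
static void decode_entry(uint32_t wire_word, unsigned *lost,
                         unsigned *rtt_us)
{
    uint32_t v = ntohl(wire_word);
    *lost   = v >> 24;
    *rtt_us = v & 0xFFFFFF;
}
```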
This feature was inspired by a similar one in a commercial PPP implementation, but I have never actually seen it and I do not know how it works or how it was implemented.
Signed-off-by: Marco d'Itri [email protected]