MQTT_ERROR_SEND_BUFFER_IS_FULL due to transient MQTT_ERROR_SOCKET_ERROR #151

Open
normanr opened this issue Dec 23, 2021 · 0 comments

normanr commented Dec 23, 2021

There are several MQTT_ERROR_SEND_BUFFER_IS_FULL issues already, but I think I've tracked down a possible root cause for how and why QoS 0 messages can cause it.

I'm writing an embedded app using NuttX (which uses MQTT-C 1.1.5), and if an interrupt arrives then the send call in mqtt_pal_sendall can fail with EINTR. With just mqtt_sync running on its own thread, the client ends up in MQTT_ERROR_SEND_BUFFER_IS_FULL, but if I call mqtt_sync after mqtt_publish then the client ends up in MQTT_ERROR_SOCKET_ERROR. Note that NuttX does not support SA_RESTART, so there's no way to configure system calls to automatically restart when a signal is received.

If __mqtt_send fails to send a QoS 0 message, then it doesn't remove it from the queue. The error state is set to MQTT_ERROR_SOCKET_ERROR, but if another thread immediately calls mqtt_publish, then the pack_call inside the MQTT_CLIENT_TRY_PACK macro fails, because the buffer is full (even though it's only got a QoS 0 message in it). The macro then calls mqtt_mq_clean, but the failed message isn't dropped: because the send failed, the post-send logic never ran, so the QoS 0 message was never moved to the MQTT_QUEUED_COMPLETE state. MQTT_CLIENT_TRY_PACK then tries the pack_call again, which fails again, and so the client ends up in MQTT_ERROR_SEND_BUFFER_IS_FULL.
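The sequence above can be illustrated with a small toy model. This is only a sketch of the behaviour as I've described it, not MQTT-C code: every name here (toy_queue, toy_pack, toy_clean, toy_send, toy_publish and the TOY_* constants) is made up for illustration, and the queue is assumed to hold a single message so that one stuck QoS 0 message fills it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's message states and error codes. */
enum toy_state { TOY_UNSENT, TOY_COMPLETE };
enum toy_result { TOY_OK, TOY_SOCKET_ERROR, TOY_SEND_BUFFER_IS_FULL };

/* A queue with room for a single message, to keep the model small. */
struct toy_queue {
    enum toy_state slot;
    int occupied; /* 1 if slot currently holds a message */
};

/* Queue a new QoS 0 message; returns 1 on success, 0 if there is no room. */
static int toy_pack(struct toy_queue *q) {
    assert(q != NULL);
    if (q->occupied) return 0;
    q->slot = TOY_UNSENT;
    q->occupied = 1;
    return 1;
}

/* Drop the queued message only if it has reached the complete state. */
static void toy_clean(struct toy_queue *q) {
    assert(q != NULL);
    if (q->occupied && q->slot == TOY_COMPLETE) q->occupied = 0;
}

/* Try to send the queued message; send_ok simulates whether send() succeeds. */
static enum toy_result toy_send(struct toy_queue *q, int send_ok) {
    assert(q != NULL);
    if (q->occupied && q->slot == TOY_UNSENT) {
        if (!send_ok) return TOY_SOCKET_ERROR; /* state stays TOY_UNSENT */
        q->slot = TOY_COMPLETE; /* post-send step for a QoS 0 message */
    }
    return TOY_OK;
}

/* Pack the message; on failure, clean once and retry the pack. */
static enum toy_result toy_publish(struct toy_queue *q) {
    assert(q != NULL);
    if (toy_pack(q)) return TOY_OK;
    toy_clean(q);
    if (toy_pack(q)) return TOY_OK;
    return TOY_SEND_BUFFER_IS_FULL;
}
```

In this model, toy_publish followed by a failed toy_send leaves the message in TOY_UNSENT, so the next toy_publish cannot reclaim the slot and reports TOY_SEND_BUFFER_IS_FULL. If the send succeeds instead, the clean step frees the slot and the next publish goes through.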

Adding a call to mqtt_sync immediately after mqtt_publish ensures that client->error is set to MQTT_ERROR_SOCKET_ERROR, after which __mqtt_send will error out early (and then nothing will change it again).

There are probably a couple of fixes that could be made:

  • if send fails with EINTR, then treat that as a temporary error (like EAGAIN) (so that it will be retried, but see the caveat below),
  • if send of a QoS 0 message fails, then mark it as MQTT_QUEUED_COMPLETE (so that mqtt_mq_clean can drop it).
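A rough sketch of what the two options could look like, in isolation. None of this is the actual MQTT-C code: sketch_sendall, send_fn, the fake_send_* helpers, sketch_msg and the SKETCH_* constants are all hypothetical names, and the send function is injected so the loop can be exercised without a real socket. The first option also remains subject to the NuttX caveat mentioned in the list item.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for send(), injected so no socket is needed. */
typedef ssize_t (*send_fn)(const void *buf, size_t len);

#define SKETCH_SOCKET_ERROR (-1)

/* Option 1: treat EINTR like EAGAIN, i.e. stop for now and report how
 * many bytes went out, so that the caller can retry the rest later.
 * Returns the byte count sent, or SKETCH_SOCKET_ERROR on a hard error. */
static ssize_t sketch_sendall(send_fn do_send, const void *buf, size_t len) {
    assert(do_send != NULL);
    const char *p = buf;
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = do_send(p + sent, len - sent);
        if (n < 0) {
            if (errno == EAGAIN || errno == EINTR) break; /* temporary */
            return SKETCH_SOCKET_ERROR;                   /* hard error */
        }
        if (n == 0) break; /* nothing accepted; avoid spinning */
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}

/* Fake senders used only to exercise the loop above. */
static ssize_t fake_send_ok(const void *buf, size_t len) {
    (void)buf;
    return (ssize_t)len;
}
static ssize_t fake_send_eintr(const void *buf, size_t len) {
    (void)buf; (void)len;
    errno = EINTR;
    return -1;
}
static ssize_t fake_send_reset(const void *buf, size_t len) {
    (void)buf; (void)len;
    errno = ECONNRESET;
    return -1;
}

/* Hypothetical message record for the second option. */
enum sketch_state { SKETCH_UNSENT, SKETCH_COMPLETE };
struct sketch_msg {
    int qos;
    enum sketch_state state;
};

/* Option 2: after a failed send, mark a QoS 0 message complete so that
 * a later clean pass can drop it. Higher QoS messages are left alone. */
static void sketch_after_failed_send(struct sketch_msg *m) {
    assert(m != NULL);
    if (m->qos == 0) m->state = SKETCH_COMPLETE;
}
```

With these helpers, sketch_sendall(fake_send_eintr, "abc", 3) reports 0 bytes sent rather than an error, while fake_send_reset still produces SKETCH_SOCKET_ERROR. The second option trades delivery of the failed QoS 0 message for a queue that can always be reclaimed, which seems in keeping with QoS 0 being at-most-once.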

Unfortunately, due to apache/nuttx#669, it's not safe to retry on EINTR, because some or all of the data may have already been sent. Also, it looks like calling close on the socket can deadlock the system.

I eventually worked around all of these issues by configuring NuttX to have more active sockets and enabling TCP write buffering. I guess it moves the data to buffers sooner, and then doesn't interrupt the send when the signal arrives.
