Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux CDC (/dev/ttyACM*) <=> TinyUSB on nRF52 freezes with too much data #127

Closed
egnor opened this issue Oct 1, 2021 · 10 comments 路 Fixed by #135
Closed

Linux CDC (/dev/ttyACM*) <=> TinyUSB on nRF52 freezes with too much data #127

egnor opened this issue Oct 1, 2021 · 10 comments 路 Fixed by #135
Labels
Bug Something isn't working

Comments

@egnor
Copy link

egnor commented Oct 1, 2021

First of all many thanks 馃檹 馃帀 for excellent and very useful software!

Operating System

Linux (reproduced on Ubuntu 21.04 and also Raspberry Pi OS "buster")

IDE version

Reproduced with Arduino IDE 1.8.16.

Board

Adafruit Circuit Playground Bluefruit, S140 6.1.1, Level 0 (Release)
image

BSP version

Version 1.1.0, I think?

image

TinyUSB Library version

Adafruit TinyUSB Library 1.4.4 (and whichever tinyusb that bundles)

Sketch

I can repro with this simple read-and-echo-back sketch:

#include <Arduino.h>
#include <Adafruit_TinyUSB.h>

void setup() {
  Serial.begin(115200);
  while (!Serial) {
    digitalWrite(LED_BUILTIN, (millis() % 500) < 50);
    delay(10);
  }
}

void loop() {
  if (Serial.available()) {
    const char ch = Serial.read();
    Serial.write(ch);
    digitalToggle(LED_BUILTIN);
  }
}

I use the following Python script to write to the device (requires pyserial installed):

#!/usr/bin/env python3

import argparse
import time

import serial  # type: ignore

parser = argparse.ArgumentParser()
parser.add_argument("--port", default="/dev/ttyACM0")
parser.add_argument("--bps", type=float, default=1000000)
parser.add_argument("--chunk", type=int, default=128)

args = parser.parse_args()

with serial.Serial(args.port, timeout=0) as serial:
    start_mono = time.monotonic()
    rx_total = tx_total = 0
    next_status = 0.0
    while True:
        elapsed = time.monotonic() - start_mono
        if elapsed > next_status:
            print(
                f"tx={tx_total}b/{elapsed:.1f}s={tx_total/elapsed:.1f}bps "
                f"| rx={rx_total}b/{elapsed:.1f}s={rx_total/elapsed:.1f}bps"
            )
            next_status += 1.0
            continue
        
        rx_total += len(serial.read(serial.in_waiting) or b"")
        delay = ((tx_total + args.chunk) / args.bps) - elapsed
        if delay < 0:
            tx_total += serial.write(b"x" * args.chunk)
        else:
            time.sleep(min(1, delay))

What happened ?

With a small chunk size and a low rate, it will run for a while:

% ./serial_spam.py --bps 100 --chunk 40
tx=0b/0.0s=0.0bps | rx=0b/0.0s=0.0bps
tx=80b/1.2s=66.6bps | rx=40b/1.2s=33.3bps
tx=160b/2.0s=80.0bps | rx=120b/2.0s=60.0bps
tx=280b/3.2s=87.5bps | rx=240b/3.2s=75.0bps
tx=360b/4.0s=90.0bps | rx=320b/4.0s=80.0bps
tx=480b/5.2s=92.3bps | rx=441b/5.2s=84.8bps
tx=560b/6.0s=93.3bps | rx=520b/6.0s=86.7bps
tx=680b/7.2s=94.4bps | rx=640b/7.2s=88.9bps
tx=760b/8.0s=95.0bps | rx=720b/8.0s=90.0bps
tx=880b/9.2s=95.6bps | rx=840b/9.2s=91.3bps
tx=960b/10.0s=96.0bps | rx=920b/10.0s=92.0bps
tx=1080b/11.2s=96.4bps | rx=1040b/11.2s=92.9bps
tx=1160b/12.0s=96.7bps | rx=1120b/12.0s=93.3bps
tx=1280b/13.2s=97.0bps | rx=1240b/13.2s=93.9bps
tx=1360b/14.0s=97.1bps | rx=1320b/14.0s=94.3bps

Going to a larger chunk size and/or higher rate freezes fairly quickly:

% ./tools/serial_spam.py --bps 1024 --chunk 128
tx=0b/0.0s=0.0bps | rx=0b/0.0s=0.0bps
tx=896b/1.0s=895.7bps | rx=770b/1.0s=769.8bps
tx=1920b/2.0s=959.8bps | rx=1795b/2.0s=897.3bps
tx=2944b/3.0s=981.2bps | rx=2817b/3.0s=938.9bps
tx=3968b/4.0s=991.9bps | rx=3840b/4.0s=959.9bps
tx=4992b/5.0s=998.3bps | rx=4864b/5.0s=972.7bps
[[[ process is hung here ]]]

Once frozen, the USB-serial port is borked for host-to-device; any attempt to write will freeze the program. (Verified in a terminal program as well, or even just "cat" to the port.) Device-to-host data still flows, however.

I took some usbmon traces which I can share if they are interesting, they seem to show the device failing to respond to requests for data in that direction?? But I am not very versed in the nuts and bolts of USB communication.

POSSIBLY relevant: #57 (but the opposite direction).

How to reproduce ?

Run the above sketch on the device.

Run the above tester program on the connected host (setting --port if necessary).

Debug Log

(I'm not sure how to collect this log, but I can research it.)

Screenshots

(See text traces above.)

@egnor egnor added the Bug Something isn't working label Oct 1, 2021
@hathach
Copy link
Member

hathach commented Oct 1, 2021

thank you for opening the issue, however your sketch is too complicated, which can have its own issue e.g ble interfering with usb etc.. Can you

  1. Make an simplest sketch as possible with usb only e.g Serial echo or similar to narrow down the investigated code
  2. Please update the How to reproduce to be precise, what do you do to send lots of data. If you copy and paste some texts, please put the text here so that I could do it as well. Or if you send it via echo command please put the command here.
  3. ardi v2.4.1 (based on a snapshot of arduino-cli I believe) I only use Arduino IDE please make sure you test with the Arduino IDE to rule out any 3rd party issue

@egnor
Copy link
Author

egnor commented Oct 1, 2021

Okay! I wrote a mini serial-echo sketch, and a mini load-tester script, and have edited the first post to include the new repro case. Any help is welcome!! 馃檱

(Reproducing now with the arduino IDE to make sure there's not something weird going on with the ardi wrapper. Also I'm going to try this with a Mac host to see if there's some difference in behavior.)

@egnor
Copy link
Author

egnor commented Oct 1, 2021

Reproduced with stock Arduino IDE.

@egnor
Copy link
Author

egnor commented Oct 1, 2021

Reproduced (exact same behavior) with a Mac as the host.

@egnor
Copy link
Author

egnor commented Oct 1, 2021

Reproduced (exact same behavior) on Feather nRF52840 (not surprising, it's a very similar design as the Circuit Playground, just doesn't have all the extra playground peripherals).

(The Feather nRF52832 does NOT reproduce this behavior, which is also not surprising, because it uses a CP2104 USB/serial bridge instead of tinyusb.)

@hathach
Copy link
Member

hathach commented Oct 2, 2021

@egnor superb ! thank you for your update, this is exactly what needed to reproduce the issue. I will try to test it out and post update here when available

@hathach
Copy link
Member

hathach commented Nov 12, 2021

@egnor sorry for late response, I was busy with other works. I confirmed the issue occurred with your sketch and script. It is probably be the race condition or mutex locking issue. I will spend more time looking at this and post the update here.

PS: I also realize that the Serial is lacking read(buf, bufsize) API and added it as well

When reading multiple bytes at once, the hanged up is less likely to happen (but still occurred !!). So it is definitely a race condition

void loop() {
  if (Serial.available()) {
    uint8_t buf[64];
    uint8_t count = Serial.read(buf, 64);
    Serial.write(buf, count);
    digitalToggle(LED_BUILTIN);
  }
}

@hathach
Copy link
Member

hathach commented Nov 13, 2021

Update I put more effort looking debugging the nrf52. nrf52840 is not hanged, it appear to be running just fine:

  • internal fifo of cdc indicate there is no data
  • We could still put it into DFU mode with touch1200, which use control endpoint
  • However, my linux (ubuntu) doesn't seem to be able to communicate with the port anymore, CDC Data endpoint

further troubleshooting

@egnor
Copy link
Author

egnor commented Nov 22, 2021

That matches my understanding -- the nrf52840 is still running, but somehow USB-serial has wedged.

Glad to hear there's progress, let me know if I can help!

@hathach
Copy link
Member

hathach commented Nov 23, 2021

@egnor this takes more time than troubleshoot than I would expect. Please give it a try

I am glad it is fixed now. This has been 3rd time I got to fix the easydma race condition with nrf, previous one lack the intense test script like yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants