string parsing on arduino: switch char, strcmp, compiletime parser generators, state machines #12

Open
milahu opened this issue Oct 28, 2021 · 0 comments


milahu commented Oct 28, 2021

motivation: why string parsing?
in some cases, we want a custom text-based protocol,
which is easy for both humans and machines

as a human, i do not want to type

{"cmd":"some_command","args":["arg1","arg2"]}

instead, i want to say

some_command arg1 arg2

challenges on arduino:

  • low memory usage
  • reading from serial port should be non-blocking

sample code can be run on a normal computer (no arduino)

single character parsers

as seen in many discussions on "arduino remote control via serial port"

// parse-char.cpp
// g++ parse-char.cpp -o parse-char && ./parse-char

#include <stdio.h> // printf, fgetc, EOF

int main() {
  while (true) {
    // arduino: read char with Serial.read()
    // fgetc returns int, so EOF can be told apart from a valid char
    int rc = fgetc(stdin); // read char from stdin (terminal)
    switch (rc) {
      case EOF:
        // without this case, the loop would spin forever at end of input
        printf("end of input -> exit\n");
        return 0;
      case 'x':
        printf("received x -> exit\n");
        return 0;
      case '\n':
        printf("received \\n -> ignore\n");
        break;
      default:
        printf("received %c -> unknown command\n", rc);
    }
  }
}

as seen in

simple string parsers

use strcmp or strncmp to compare strings

as seen in

// parse-string.cpp
// gcc parse-string.cpp -o parse-string && ./parse-string

#include <stdio.h>
#include <string.h>

const char* expectedString = "hello";

int main() {

  // arduino: collect the bytes returned by Serial.read() into a buffer to get actualString
  const char* actualString = "hello";

  printf("actualString = %s\n", actualString);
  printf("expectedString = %s\n", expectedString);

  if (strcmp(actualString, expectedString) == 0) {
    printf("found expected string\n");
  }

}

parsing variable strings

commands are constant strings, but we also want to parse variables

sample input:

set some_key 1234

https://forum.arduino.cc/t/how-to-parse-multiple-variables-in-one-string/582456 - strtok, atoi, atof

https://forum.arduino.cc/t/serial-input-basics-updated/382007/3

void parseData() {      // split the data into its parts

    char * strtokIndx; // this is used by strtok() as an index

    strtokIndx = strtok(tempChars,",");      // get the first part - the string
    strcpy(messageFromPC, strtokIndx); // copy it to messageFromPC
 
    strtokIndx = strtok(NULL, ","); // this continues where the previous call left off
    integerFromPC = atoi(strtokIndx);     // convert this part to an integer

    strtokIndx = strtok(NULL, ",");
    floatFromPC = atof(strtokIndx);     // convert this part to a float

}
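
a minimal sketch of the same strtok approach, applied to the sample input "set some_key 1234". the struct SetCommand, the function parseSet and the 16-byte key limit are made up for illustration, they are not taken from the forum posts. strtol is used instead of atoi so that malformed numbers can be rejected.

```cpp
// parse-set.cpp (hypothetical file name)
#include <stdlib.h> // strtol
#include <string.h> // strtok, strcmp, strlen, strcpy

// hypothetical result type for a "set <key> <value>" command
struct SetCommand {
  char key[16];
  long value;
};

// parses a line like "set some_key 1234" in place.
// strtok() writes '\0' bytes into `line`, so `line` must be a writable buffer.
// returns true on success, false if the line is malformed.
bool parseSet(char* line, SetCommand* out) {
  const char* delims = " \r\n"; // also strips a trailing line ending

  char* cmd = strtok(line, delims);
  if (cmd == NULL || strcmp(cmd, "set") != 0) return false;

  char* key = strtok(NULL, delims);
  if (key == NULL || strlen(key) >= sizeof(out->key)) return false;

  char* val = strtok(NULL, delims);
  if (val == NULL) return false;

  char* end;
  long value = strtol(val, &end, 10);
  if (end == val || *end != '\0') return false; // not a clean integer

  strcpy(out->key, key); // safe: length was checked above
  out->value = value;
  return true;
}
```

usage: copy the received line into a char array, then call parseSet(buffer, &cmd). a string literal can not be passed directly, because strtok modifies its input.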

runtime-generated string parsers

the commands are declared at runtime, in the arduino setup() function

as seen in

concept:

void handleCommandHello(char** argv, int argc) {
  // process arguments ...
  // sample input: "hello world"
  // -> argc == 2
  // -> argv[0] == "hello"
  // -> argv[1] == "world"
}

// global variable
Parser* parser;

void setup() {
  parser = new Parser();
  parser->addCommand("hello", &handleCommandHello);
}

the lookup from command name to handler function can be realized in different ways:

  • strcmp, strncmp → simple string parsers
  • search trees
  • hashtables
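
a minimal sketch of the first variant (linear strcmp lookup), assuming a fixed-size table so that no heap allocation is needed. this is not the API of any existing library: the class layout, the method dispatch and the limits kMaxCommands and kMaxArgs are assumptions for illustration. only addCommand and the handler signature follow the concept above.

```cpp
#include <string.h> // strcmp, strtok

typedef void (*CommandHandler)(char** argv, int argc);

// fixed-capacity command table with linear strcmp lookup
class Parser {
 public:
  static const int kMaxCommands = 8; // assumed limit
  static const int kMaxArgs = 4;     // assumed limit, includes the command name

  // registers a handler. only the pointer is stored,
  // so `name` must stay valid as long as the parser is used.
  bool addCommand(const char* name, CommandHandler handler) {
    if (count_ >= kMaxCommands) return false;
    names_[count_] = name;
    handlers_[count_] = handler;
    count_++;
    return true;
  }

  // splits `line` in place on whitespace and calls the matching handler.
  // returns false for an empty line or an unknown command.
  bool dispatch(char* line) {
    char* argv[kMaxArgs];
    int argc = 0;
    for (char* tok = strtok(line, " \r\n"); tok != NULL && argc < kMaxArgs;
         tok = strtok(NULL, " \r\n")) {
      argv[argc++] = tok;
    }
    if (argc == 0) return false;
    for (int i = 0; i < count_; i++) {
      if (strcmp(argv[0], names_[i]) == 0) {
        handlers_[i](argv, argc);
        return true;
      }
    }
    return false;
  }

 private:
  const char* names_[kMaxCommands];
  CommandHandler handlers_[kMaxCommands];
  int count_ = 0;
};
```

with few commands, the linear search is usually good enough. search trees or hashtables only start to pay off when the command list grows.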

compiletime-generated string parsers

  • write custom grammar
  • generate lexer
  • generate parser

a "lexer" reads an input string and generates a list of tokens
a "parser" reads a list of tokens and generates a structure / syntax tree

with compiletime-generated string parsers,
commands and handlers are declared at compile time,
so the runtime can be optimized for lower memory usage

in most cases, this is a micro-optimization, aka "a waste of time" (let's do it anyway!)
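
a small sketch of the idea without a real parser generator: the command names are hashed at compile time with a constexpr function, and the lookup becomes a switch over integers. note that this is a different, simpler technique than generating a lexer and parser from a grammar. the names fnv1a, Command and lookupCommand are made up for illustration. the hash is FNV-1a, written recursively so it also compiles as C++11.

```cpp
#include <stdint.h>
#include <string.h> // strcmp

// FNV-1a hash, usable at compile time (case labels) and at runtime (input)
constexpr uint32_t fnv1a(const char* s, uint32_t h = 2166136261u) {
  return *s == '\0' ? h : fnv1a(s + 1, (h ^ (uint8_t)*s) * 16777619u);
}

enum Command { CMD_UNKNOWN, CMD_LOAD, CMD_SAVE, CMD_DEBUG };

// maps a command name to an enum value.
// the case labels are computed by the compiler. if two command names
// had the same hash, the duplicate case label would be a compile error.
// the strcmp guards against unknown input that happens to collide.
Command lookupCommand(const char* name) {
  switch (fnv1a(name)) {
    case fnv1a("load"):
      return strcmp(name, "load") == 0 ? CMD_LOAD : CMD_UNKNOWN;
    case fnv1a("save"):
      return strcmp(name, "save") == 0 ? CMD_SAVE : CMD_UNKNOWN;
    case fnv1a("debug"):
      return strcmp(name, "debug") == 0 ? CMD_DEBUG : CMD_UNKNOWN;
    default:
      return CMD_UNKNOWN;
  }
}
```

compared to the runtime-generated variant, there is no table of registered handlers in RAM. whether this saves anything measurable depends on the compiler and the board.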

as seen in

tree parsers

aka: lisp, scheme, ...

Greenspun's Tenth Rule of Programming:

any sufficiently complicated C or Fortran program
contains an ad hoc informally-specified bug-ridden slow implementation
of half of Common Lisp.

trees are encoded as lists of strings

note: in the context of lisp interpreters,
"embedded" usually means "a lisp interpreter, embedded into a non-lisp programming language"

sample input:

(addTimer PERIOD COUNT)
(debug)
(load)
(save)
(wifi (SSID PASSWORD))

related

ThingML plugins

ThingML allows us to write plugins to parse (and serialize) custom text-based protocols

https://heads-project.github.io/methodology/heads_methodology/thingml_plugins.html

Proprietary systems:

In many cases, a ThingML component will have to communicate
with external closed-source components,
whose implementation cannot be modified.

In this case, the only option is to adapt the ThingML end,
both in terms of encoding and transport.

Serialization plugins are in charge of generating
both the serialization and parsing of a set of ThingML messages.

Note that serialization and parsing are not necessarily perfect mirrors,
as ports can be asymmetrical.

ThingML provides a default binary and string-based (JSON) serialization and deserialization,
available and interoperable in C, JavaScript and Java.

document parser versus stream parser

  • document parsers: produce one large syntax-tree
  • stream parsers: produce many small events

finite state machines

FSMs are usually

  • hand-made for simple machines
  • generated for complex machines

FSMs can be used for string lexing and parsing,
but FSMs can also be used to implement full arduino programs (manage state, handle events)

G-code parser

tgolla/GCodeParser - a hand-written parser (not generated from grammar) → category: compiletime-generated string parsers

what is G-code?

A typical piece of G-code as sent to a RepRap machine might look like this:

N3 T0*57
N4 G92 E0*67
N5 G28*22
N6 G1 F1500.0*82
N7 G1 X2.0 Y2.0 F3000.0*85
N8 G1 X3.0 Y3.0*33
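
the number after the * is a checksum. a minimal sketch for verifying it, assuming the RepRap convention that the checksum is the XOR of all bytes before the * (this is not stated in the sample above, but the sample lines are consistent with it). the function name gcodeChecksumOk is made up, it is not part of tgolla/GCodeParser.

```cpp
#include <stdlib.h> // strtol
#include <string.h> // strchr

// returns true if the number after '*' equals
// the XOR of all bytes before the '*'.
// lines without a checksum are reported as not ok.
bool gcodeChecksumOk(const char* line) {
  const char* star = strchr(line, '*');
  if (star == NULL) return false; // no checksum present

  unsigned char sum = 0;
  for (const char* p = line; p < star; p++) {
    sum ^= (unsigned char)*p;
  }

  char* end;
  long expected = strtol(star + 1, &end, 10);
  if (end == star + 1) return false; // no digits after '*'
  return expected == sum;
}
```

usage: gcodeChecksumOk("N3 T0*57") accepts the first sample line, and a line with a flipped character or a wrong number is rejected.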

full access to arduino via serial port

useful in development, not in production
in production, we want to give only limited access to the microcontroller

https://github.com/monsonite/SIMPL - category: single character parsers

serial-port-json-server

expose the arduino's serial port on a web server, to allow access from a web browser
https://github.com/chilipeppr/serial-port-json-server

Locoduino/Commanders

string parser. obscure. what is the actual protocol?

https://github.com/Locoduino/Commanders/blob/master/examples/SerialCommander/SerialCommander.ino

binary protocols

firmata binary protocol

https://github.com/firmata/protocol
https://github.com/firmata/arduino

Firmata [protocol] is based on the midi message format in that
commands bytes are 8 bits and data bytes are 7 bits.

For example the midi Channel Pressure (Command: 0xD0) message is 2 bytes long,
in Firmata the Command 0xD0 is used to enable reporting for a digital port (collection of 8 pins).
Both the midi and Firmata versions are 2 bytes long
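
the 8 bit / 7 bit split means that the top bit marks the start of a message, and that larger values must be spread over several data bytes. a minimal sketch, assuming a 14-bit value sent as two data bytes with the least significant 7 bits first. the function names are made up, they are not taken from the firmata library.

```cpp
#include <stdint.h>

// a byte with the top bit set starts a new message,
// data bytes always have the top bit cleared
bool isCommandByte(uint8_t b) { return (b & 0x80) != 0; }

// splits a 14-bit value into two 7-bit data bytes, low bits first
void encode14(uint16_t value, uint8_t out[2]) {
  out[0] = value & 0x7F;
  out[1] = (value >> 7) & 0x7F;
}

// joins two 7-bit data bytes back into a 14-bit value
uint16_t decode14(const uint8_t in[2]) {
  return (uint16_t)((in[0] & 0x7F) | ((uint16_t)(in[1] & 0x7F) << 7));
}
```

the benefit of this layout is that a receiver can always resynchronize: any byte with the top bit set must be the start of a new message.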

string libraries

non-blocking serial communication

https://arduino.stackexchange.com/a/22516/80923 Nick Gammon - reading serial without blocking

https://www.forward.com.au/pfod/ArduinoProgramming/Serial_IO/index.html

ALL output methods, i.e. the Serial.print and Serial.write methods,
will stop the rest of your sketch from running once the Serial Tx buffer fills up.

The size of the Tx buffer varies between different Arduino boards.
Uno/Mega2560 has a 63 byte Tx buffer. The NanoBLE has none.

So if you want the rest of your sketch to keep running
while you output results or debugging messages,
you should not use any of the Serial print/write methods in your loop() code.

This tutorial will show you how to use the BufferedOutput class
to avoid blocking, add extra buffering.

automatic compiletime-generated string parsers

http://swig.org/ - SWIG is typically used to parse C/C++ interfaces and generate the 'glue code' required for the above target languages to call into the C/C++ code.

problem: swig cannot generate purely text-based interfaces (command-line style, aka "shell code")

related: string parsing on web servers

in the context of web servers, this problem is known as "routing":
the web client requests a "route" like /somedir/index.php?q=somequery
and the web server looks up the matching request handler

https://github.com/julienschmidt/httprouter

@milahu milahu changed the title string parsing on arduino (deserves an extra section) string parsing on arduino: switch char, strcmp, compiletime parser generators, state machines Oct 28, 2021