iotlab-testbed: serial_aggregator breaks the shell #3627

Closed
cgundogan opened this issue Aug 14, 2015 · 14 comments

@cgundogan
Member

When running an experiment on the iotlab testbed and using make iotlab-term or the socat method described in [1], the shell breaks on some nodes after a number of commands have been entered.

I can only reproduce this in experiments with a fairly large number of nodes (I tested with 40). I could not reproduce it in an experiment with 10 nodes.

Reproducible by doing the following:

  1. connect via make iotlab-term or [1]
  2. enter the commands ifconfig and fibroute alternately
  3. observe that on some nodes the command fibroute yields the output of ifconfig
  4. after observing this behaviour on a specific node, use that node's id to talk to it directly by prefixing the command with node_id;<command>. Every command sent to this node is then handled with a "delay": the previous command is executed first, and the current command will most likely be executed only when the next command is issued.
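
The input for steps 2 and 4 can be generated with a small helper. This is only a sketch: the function name alternating_commands and the node id m3-7 are hypothetical, and the only format taken from the report is the node_id;<command> prefix.

```python
from itertools import cycle, islice


def alternating_commands(count, node_id=None, commands=("ifconfig", "fibroute")):
    """Return `count` shell commands, cycling through `commands`.

    If `node_id` is given, each command is prefixed with "node_id;" so that
    it addresses a single node, as described in step 4.
    """
    prefix = f"{node_id};" if node_id else ""
    return [prefix + cmd for cmd in islice(cycle(commands), count)]


if __name__ == "__main__":
    # "m3-7" is a made-up node id used only for illustration.
    for line in alternating_commands(4, node_id="m3-7"):
        print(line)
```

The resulting lines can be pasted into the terminal or written to the TCP socket opened with the method in [1].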

I am not sure if this is a bug on RIOT's side (shell robustness) or a misbehaviour of the serial_aggregator. Therefore, I just use the label: bug.

[1] https://www.iot-lab.info/tutorials/nodes-serial-link-aggregation/#Give_access_as_a_tcp_socket

EDIT: all tests were conducted with ng_networking on master.

@cgundogan added the Type: bug label on Aug 14, 2015
@cgundogan
Member Author

@cladmi this may also be of interest to you

@jnohlgard
Member

My first hunch would be to check whether there is some buffering going on inside pyterm, or whatever is used on iot-lab, and/or the other tools, that causes the end of the command line to stay in the buffer and not be sent out on the serial port until more text arrives. It is less likely, though possible, that the RIOT shell would behave like this on random nodes with the same binary flashed.
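
The buffering hypothesis can be illustrated with a toy model. The sketch below is not how pyterm or serial_aggregator work; it assumes a hypothetical sender that always keeps back the trailing newline until more text arrives, and a shell that executes a command only once its newline has been received. The function name simulate is made up for this illustration.

```python
def simulate(commands):
    """Return (typed, executed) pairs for a sender that holds back newlines.

    `executed` is the command the shell runs in response to `typed`,
    or None if nothing was executed at that point.
    """
    held = ""         # terminator kept back by the hypothetical sender
    shell_input = ""  # text the node's shell has received but not yet run
    result = []
    for cmd in commands:
        data = held + cmd + "\n"
        data, held = data[:-1], "\n"  # the trailing newline stays in the buffer
        shell_input += data
        # The shell only runs commands that are terminated by a newline.
        *complete, shell_input = shell_input.split("\n")
        result.append((cmd, complete[-1] if complete else None))
    return result


if __name__ == "__main__":
    for typed, executed in simulate(["ifconfig", "fibroute", "ifconfig"]):
        print(f"typed {typed!r:12} executed {executed!r}")
```

In this model every command is executed one step late, so typing fibroute triggers the pending ifconfig, which matches the symptom described in the report.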

@cgundogan
Member Author

Further details: once I have provoked this misbehaviour on a node, I bypass the serial aggregator by connecting to the testbed via ssh and using nc <node_id> 20000. I get the same results as with the serial aggregator: commands are executed with a delay.

@gebart buffering might be the problem, or part of it, but unfortunately I do not know enough about the internals of the testbed.
@cladmi may be able to enlighten us?

@cgundogan
Member Author

@gebart I tried to execute the reboot command by issuing it several times until it was recognized by the shell. After a reboot, this node handles shell commands correctly again. So the problem definitely lies in the robustness of the RIOT shell, at least to a certain degree, because it only happens in the interplay between the serial aggregator and a large number of nodes in the experiment (>~30).

@OlegHahm
Member

Have you checked if the problem is not simply related to still using uart0?

@OlegHahm
Member

Try with #3402 or #3555.

@cgundogan
Member Author

I tried with #3402. I am not sure if it was due to the shell modification, but some nodes started to ignore my shell input after I had played around for some time on the iotlab with the serial aggregator. When I looked at one such node without the serial aggregator, I was endlessly receiving the prompt character >.

@OlegHahm
Member

Can you try with pyterm?

@cgundogan
Member Author

> Can you try with pyterm?

Could you elaborate on this?

@OlegHahm
Member

You wrote that you encountered these problems with serial_aggregator, and I wanted to check if this is reproducible with pyterm as well.

@cladmi
Contributor

cladmi commented Aug 17, 2015

One issue that can come from serial_aggregator is that it only handles complete lines. Until a line break arrives, the content is kept in a buffer. https://github.com/iot-lab/aggregation-tools/blob/master/iotlabaggregator/serial.py#L106
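
To illustrate what line-oriented handling means in general, here is a simplified sketch. It is not the code in iotlabaggregator/serial.py, and the class name LineBuffer is hypothetical; it only shows how text without a line break stays in the buffer.

```python
class LineBuffer:
    """Accumulate incoming text and release only complete lines."""

    def __init__(self):
        self._pending = ""  # text received after the last line break

    def feed(self, chunk):
        """Add a chunk and return the list of lines it completed."""
        self._pending += chunk
        # Everything after the last "\n" is incomplete and stays buffered.
        *complete, self._pending = self._pending.split("\n")
        return complete

    @property
    def pending(self):
        """Text that is still waiting for a line break."""
        return self._pending


if __name__ == "__main__":
    buf = LineBuffer()
    print(buf.feed("ifcon"))        # nothing is released yet
    print(buf.feed("fig\nfibrou"))  # first line is released
    print(repr(buf.pending))        # the rest is still buffered
```

Output such as a shell prompt that is not followed by a line break would remain in pending under this scheme until later output completes the line.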

Another issue I just discovered is that I should perhaps run the 'utf-8' decode on the data before splitting the lines. Not doing so may lead to 'splitlines' failing to split if it gets some random characters, for example after a crash or a reboot. In that case, however, decoding errors will need to be handled correctly.
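
One possible way to decode before splitting while tolerating garbage bytes is sketched below. It is an assumption about how this could be done, not the fix applied in serial_aggregator; the class name DecodingLineSplitter is hypothetical. It uses the standard library's incremental UTF-8 decoder with errors="replace" and, for simplicity, splits on "\n" rather than calling splitlines.

```python
import codecs


class DecodingLineSplitter:
    """Decode raw bytes to text first, then split the text into lines."""

    def __init__(self):
        # The incremental decoder keeps a multi-byte character that is split
        # across two reads; errors="replace" turns invalid bytes into U+FFFD
        # instead of raising UnicodeDecodeError.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._pending = ""

    def feed(self, data):
        """Add raw bytes and return the list of complete, decoded lines."""
        self._pending += self._decoder.decode(data)
        *complete, self._pending = self._pending.split("\n")
        return [line.rstrip("\r") for line in complete]


if __name__ == "__main__":
    splitter = DecodingLineSplitter()
    print(splitter.feed(b"caf\xc3"))         # incomplete character and line
    print(splitter.feed(b"\xa9\r\n"))        # completes the line
    print(splitter.feed(b"\xff\xfeboot\n"))  # garbage bytes are replaced
```

Replacing invalid bytes keeps the stream flowing after a crash or reboot, at the cost of showing replacement characters in the output.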

If this is the reason for the problems, I will look into it.

@OlegHahm
Member

@cgundogan, I updated https://github.com/OlegHahm/RIOT/tree/shell_uart0_newlib_distinction, which should fix the issue for now.

@OlegHahm modified the milestone: Release 2015.09 on Sep 3, 2015
@OlegHahm
Member

OlegHahm commented Sep 3, 2015

@cgundogan, have you encountered the problem recently?

@cgundogan
Member Author

Tested on the iotlab with 50 nodes. Works 👍
