Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/shell: refactor tokenizer code #13197

Merged
merged 6 commits into from
Jun 24, 2020

Conversation

HendrikVE
Copy link
Contributor

@HendrikVE HendrikVE commented Jan 24, 2020

Contribution description

With this PR I simply copied PR #12085 authored by @jcarrano. Since he left the team, I hereby offer to take care of rebasing and implement required changes.

The tokenizer (the code that breaks up the line given to the shell into strings to create argv) was quite a messy piece of code. This commit refactors it into a more traditional state-machine based parser.

This fixes the issues with quote handling exposed by the recently introduced test.

Testing procedure

Test vectors were added to the automated test. The old code could not handle this string:

> echo "t\e st" "\"" '\'' a\ b
shell: incorrect quoting
> echo "\""
shell: incorrect quoting

The new code handles it correctly.

Issues/PRs references

Copy of #12085

Dependencies:

@benpicco benpicco added Area: sys Area: System Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jan 25, 2020
@HendrikVE HendrikVE added the State: waiting for other PR State: The PR requires another PR to be merged first label Feb 7, 2020
@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch 3 times, most recently from d956584 to 3d08898 Compare February 10, 2020 12:24
@HendrikVE HendrikVE added Area: tests Area: tests and testing framework Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable labels Feb 10, 2020
@HendrikVE HendrikVE removed the State: waiting for other PR State: The PR requires another PR to be merged first label Mar 30, 2020
@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch from 3d08898 to bf6485c Compare March 30, 2020 14:33
@HendrikVE HendrikVE requested a review from miri64 as a code owner March 30, 2020 14:33
@HendrikVE HendrikVE removed the CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable label Mar 30, 2020
@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch 2 times, most recently from 5fec2d5 to a61d66a Compare March 30, 2020 16:24
@HendrikVE
Copy link
Contributor Author

Rebased and ready for review :)

HendrikVE and others added 2 commits May 18, 2020 14:02
Check that single and double quotes work, along with backslash escaping
and that malformed strings are rejected.

Right now the test is failing. The next commit will replace the tokenizer
with one that works correctly.

Co-authored-by: Juan Carrano <[email protected]>
The tokenizer (the code that breaks up the line given to the shell into
strings to create argv) was quite a messy piece of code. This commit
refactors it into a more traditional state-machine based parser.

This fixes the issues with quote handling exposed by the recently
introduced test.

Co-authored-by: Juan Carrano <[email protected]>
@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch from a61d66a to ee1e1ad Compare May 18, 2020 12:34
@HendrikVE
Copy link
Contributor Author

Rebased to master and Murdock builds were successful.

@HendrikVE
Copy link
Contributor Author

Could you please also take a look at this one, @fjmolinas ? :)

@fjmolinas fjmolinas requested review from fjmolinas and kaspar030 June 8, 2020 12:19
@fjmolinas fjmolinas added the CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before label Jun 9, 2020
@HendrikVE
Copy link
Contributor Author

Tabs are processed as before now. I replaced the state diagram by a text description which I hope is clear enough. I also did some other improvements in the code.

@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch from 9efd64e to ad7f6c4 Compare June 19, 2020 13:59
@fjmolinas
Copy link
Contributor

For the record, code size:

  • pr
    image

  • master
    image

@fjmolinas
Copy link
Contributor

I'll take a closer look at the latest changes on Monday!

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small nitpicks, otherwise this PR is OK IMO. I'll run the test on many boards and see if everything is ok.

/**
* Break input line into words, create argv and call the command handler.
*
* Words are broken up at spaces. A backslash escaped the character that comes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Words are broken up at spaces. A backslash escaped the character that comes
* Words are broken up at spaces. A backslash escapes the character that comes

('echo abcdef"ghijklmn', 'shell: incorrect quoting'),
('echo abcdef\'ghijklmn', 'shell: incorrect quoting'),

# test more complex output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Suggested change
# test more complex output
# test default commands

@fjmolinas
Copy link
Contributor

All good from my local tests https://ci.inria.fr/ci-riot-tribe/job/build-pipeline.jk/lastSuccessfulBuild/artifact/failuresummary.md/*view*/ (the failing boards had connection issues, I re-launched the test and it worked).

@fjmolinas
Copy link
Contributor

@HendrikVE please squash once my last remaining comments are addressed!

HendrikVE and others added 4 commits June 23, 2020 13:46
Factor out common code for quoted and unquoted tokens. This makes the code
slighly less clear, but it also eliminates repetition (which may improve
clarity).

Co-authored-by: Juan Carrano <[email protected]>
The code for traversing arrays of shell commands (used to print help messages
and to search for commmand handlers) was needlessly complex.

Co-authored-by: Juan Carrano <[email protected]>
Code now correctly handles quotes within PARSE_UNQUOTED
and tabs are now considered a BLANK just like a space.
Divide test cases in to groups and add test cases for:
- multiple spaces between arguments
- tabs between arguments
- leading/trailing spaces
- more simple variations for escaping
- multiple tests for correct quoting

A second occurence of the test case "('help', EXPECTED_HELP),"
was removed.
@HendrikVE HendrikVE force-pushed the shell-tokenizer-refactor branch from 6f12972 to cbf5cc5 Compare June 23, 2020 11:58
@HendrikVE
Copy link
Contributor Author

HendrikVE commented Jun 23, 2020

I changed the comments as suggested and squashed my commits. But this time, I'm not sure about how much they should be squashed. One possibility would be to have 2 commits sys/shell and test/shell, but then the size of the commit would be quite big. What would you suggest here?

@fjmolinas
Copy link
Contributor

I changed the comments as suggested and squashed my commits. But this time, I'm not sure about how much they should be squashed. One possibility would be to have 2 commits sys/shell and test/shell, but then the size of the commit would be quite big. What would you suggest here?

I prefer keeping the division as you have now, I'd rather split up the different changes. So LGTM!

@fjmolinas
Copy link
Contributor

Lets see if murdock is happy.

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, the code changes improve:

  • readability
  • escaped characters handling
  • quoted strings handling

There is an increase in the shell code, but the code itself is pretty complex, so IMO that change is expected. Maybe some people obsessed by code size want to take a second look (@kaspar030 @bergzand ), but IMO this is OK. ACK!

@fjmolinas
Copy link
Contributor

Ajj seems like there is a problem with a worker again...

@fjmolinas fjmolinas added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jun 24, 2020
@fjmolinas
Copy link
Contributor

Go!

@fjmolinas fjmolinas merged commit af80e86 into RIOT-OS:master Jun 24, 2020
@HendrikVE
Copy link
Contributor Author

Awesome, thank you @fjmolinas !

@miri64 miri64 added this to the Release 2020.07 milestone Jun 24, 2020
@HendrikVE HendrikVE deleted the shell-tokenizer-refactor branch June 26, 2020 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: sys Area: System Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 3-testing The PR was tested according to the maintainer guidelines Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants