-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(parser): handle negative numeric tokens #62
Conversation
@@ -175,6 +187,28 @@ const parseArgs = ( | |||
return result; | |||
}; | |||
|
|||
// Look ahead to next token in argv array: | |||
function peek(argv, pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: bakkot, @shadowspawn, @ljharb, a step towards a token based parser, as discussed in #52.
Benefit is it encapsulates the logic, and simplifies reading the codebase, e.g.,:
} else if (peek(argv, pos) !== TOKENS.FLAG) {
// Add 'i' to 'pos' such that short options are parsed in order | ||
// of definition: | ||
ArrayPrototypeSplice(argv, pos + (i - 1), 0, `-${short}`); | ||
} | ||
} | ||
|
||
const suffix = StringPrototypeSlice(arg, 2); // maybe -f=bar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix code is only for what I think is mistaken processing of =
, per previous comments.
(On a separate note, we use arg fairly loosely though this code. We could try for some more meaningful names, but I suggest try that is a pure refactor and not when we are changing code.)
Related: #60 |
function token(arg) { | ||
const chr = StringPrototypeCharAt(arg, 0); | ||
const nextChr = StringPrototypeCharAt(arg, 1); | ||
if (StringPrototypeMatch(chr, /^[0-9]$/)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (StringPrototypeMatch(chr, /^[0-9]$/)) { | |
if (RegExpPrototypeSymbolMatch(/^[0-9]$/, chr)) { |
const nextChr = StringPrototypeCharAt(arg, 1); | ||
if (StringPrototypeMatch(chr, /^[0-9]$/)) { | ||
return TOKENS.NUMERIC; | ||
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) { | |
} else if (chr === '-' && RegExpPrototypeSymbolMatch(/^[0-9]$/, nextChr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also use test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only testing for one digit? So -1foo
is returning numeric? It does not actually matter too much since to support negative numbers we are dropping support for digits, but I feel misleading to call it NUMERIC based on that test.
if (options.short && options.short[arg]) | ||
arg = options.short[arg]; // now long! | ||
// ToDo: later code tests for `=` in arg and wrong for shorts | ||
if (suffix.startsWith('=')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (suffix.startsWith('=')) | |
if (StringPrototypeStartsWith(suffix, '=')) |
if (StringPrototypeMatch(chr, /^[0-9]$/)) { | ||
return TOKENS.NUMERIC; | ||
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can drop
^
and$
since were only matching one character (Unless I'm overlooking something). - Should we use
\d
for increased unicode support, or does that add more complexity/edge cases? - Should we be using
.test()
since were not using the result of.match()
e.g./[0-9]/.test(chr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call, this should use RegExpPrototypeTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to double check, but I believe \d
is the same as [0-9]
so a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to double check, but I believe
\d
is the same as[0-9]
so a matter of taste.
Needed to double checked myself! Got my languages mixed up (: https://regex101.com/r/PhKpmF/1
I don't agree in principle with treating negative numbers as special tokens. In brief it clashes with using digits as short options, and adds an opinionated behaviour. Slightly longer version: #60 (comment) I'm personally 👎 But I expect casual users are more likely to encounter negative numbers than single digit options or other situations where it misfires. So we might decide to implement it anyway... |
@shadowspawn without special casing negative numbers, it's hard to implement an example like a basic calculator application:
Becomes: short options: If you do want to implement the behavior With regards to |
Short version: Yes, there is a trade-off. Long version
Yes, that would be a use case that would not be supported out of the box. Four approaches from a quick think:
(Tongue in cheek.) Or to rephrase, if you want to use digits as short options, do it yourself. Wait, why are we offering a parser if this is the easy approach? As an aside. I think this is a subtle example of not always being feasible to build on top of a parser which has applied a number of decisions. Imagine processing this with negative numbers treated as positionals, and then try and post-process digits as options:
It isn't as simple as looking for the digits in the positionals and taking the next positional, the parser ate some options out of the middle. |
I'll add that there is somewhat less namespace pressure on short options these days, because we can use long-only options for the rare cases. For a user of our library, would still be a loss if a digit was the nicest semantic choice, but there would at least be alternatives available. Note
|
what if we treat arguments wrapped in quotes as positionals, or values, and drop the quotes? Then an implementer could pre-process negative numbers by wrapping them in quotes. |
Hmmm, I don't like that as a work-around. Then quotes are special and get eaten by the parser. Note though, if the implementer wrapped the negative numbers in quotes the current parser would not treat them as options. It is "just" the implementer has to remove the quotes themselves after parsing. |
@@ -118,20 +128,23 @@ const parseArgs = ( | |||
if (arg.length > 2) { | |||
for (let i = 2; i < arg.length; i++) { | |||
const short = StringPrototypeCharAt(arg, i); | |||
// Case of `-o=foo`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not consider this a short option and its value in the way you are assuming. The =
is for long options not for short options. The four possible forms for specifying the expected value are:
-v VVV
-vVVV // if we support it
--value VVV
--value=VVV
(Or -v
on the end of a combined short group with following argument.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanding the = to an argument seems off to me, so I think think that two options are either throw a parsing error, or treat it the same as:
-v foo
What does POSIX do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learnt something rereading the Utility Conventions. =
is not a valid short option name, so we are into undefined behaviour. (I thought I saw something about "printable" somewhere, didn't find that just now.)
I have not checked getopt
to see how it deals with this.
Guideline 3:
Each option name should be a single alphanumeric character (thealnum
character classification) from the portable character set.
https://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap12.html#tag_12_01
} else if (pos + 1 < argv.length && | ||
!StringPrototypeStartsWith(argv[pos + 1], '-') | ||
) { | ||
} else if (peek(argv, pos) !== TOKENS.FLAG) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processing the last arg will now drop into this conditional, whereas previously it would drop through to the next test.
i.e. removed pos + 1 < argv.length
test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is processing an option with a following argument available as a value, if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds check was moved into peek, if we decide not to handle negative numbers I'll abandon this refactor any ways.
// value and then increment pos so that we don't re-evaluate that | ||
// arg, else set value as undefined ie. --foo b --bar c, after setting | ||
// b as the value for foo, evaluate --bar next and skip 'b' | ||
const val = options.withValue && | ||
ArrayPrototypeIncludes(options.withValue, arg) ? argv[++pos] : | ||
(ArrayPrototypeIncludes(options.withValue, arg) || | ||
ArrayPrototypeIncludes(options.multiples, arg)) ? argv[++pos] : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have you added a test for multiples
here? That does not look right for a multiple flag. The logic for multiples is in storeOptionValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of multiples mixing positional and '='s appeared to be buggy, I can add a test for this separately.
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) { | ||
return TOKENS.NUMERIC; | ||
} else if (chr === '-') { | ||
return TOKENS.FLAG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two other things we could identify using token()
are -
and --
. But the focus of this PR is negative numbers, so leave that for future refactor.
Treating negative numbers as positionals complicates the short option group processing, which could previously assume a blind expansion. Behaviour now broken:
(For interest, Commander suffered from similar weird outcomes from blind expansion. To get it right I switched to processing just one short option at a time and no fall-through, go around big loop again to process anything left. Commander has flags and required values and optional values, but not negative numbers. https://github.com/tj/commander.js/blob/02a124c7d58dbae2ef11f9284b2c68ad94f6dc8b/lib/command.js#L1378) |
(Sorry @bcoe , lots of comments after you were running hot trying out the yarns-parser tests!) |
I would be curious to hear other people's thoughts, re: negative numbers, I can attempt the pre-processing approach if we would prefer not to support them. Numerics in most cli apps I think are positive integers, delays, ports, etc. Are the common use cases for negative numerics I'm missing? |
Quoting props so the shell doesn’t interpret them - like globs - is standard. It seems perfectly fine to me that if i want to have a negative number as a value, my choices are to either use a long form with an = ( |
Quoting works for passing values that would otherwise be interpreted by shell, but does not work directly for protecting things that look like options. The shell removes the quotes before they are passed through.
|
For comparison, in Commander options that expect an argument take the next argument whether or not it starts with a dash. So There a trade-off, Commander gets an occasional question about why |
Another implementation for interest. Python's arparse goes the extra mile, and allows negative numbers as positional and option-arguments iff there are no digit options defined. This is quite sophisticated. (argparse does not have greedy options, arguments with a starting dash are otherwise options and not option-arguments.) https://docs.python.org/3/library/argparse.html#arguments-containing If we went this approach we would have to decide which mode zero-config uses. |
The more I look at the above examples and the referenced // node calc.js --add -300 200
const { values } = parseArgs({
options: {
add: {
type: 'number',
nargs: 2, // defaults to 1
}
}
})
// values: { add: [-300, 200] } |
I added I far prefer the explicit I'd rather hold off on adding a |
I said:
I want to expand on that after thinking about it again in greedy/choosy context. I still don't like the idea of detecting negative numbers as such... However, a different way of reasoning about this is that digits as options are rare, so we might choose not to recognise them. Then This allows negative numbers to appear as option values or as positionals, which is what Python's |
Handle a bunch of finicky edge-cases around negative positionals, and negative numbers as arguments.
From yargs' test suite:
Also address issue with parsing of
-o=foo
, for short opts.TODO: add tests.