Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: add util.parseArgs() #35015

Closed
wants to merge 14 commits into from
Closed

util: add util.parseArgs() #35015

wants to merge 14 commits into from

Conversation

boneskull
Copy link
Contributor

@boneskull boneskull commented Sep 1, 2020

Add a function, util.parseArgs(), which accepts an array of
arguments and returns a parsed object representation thereof.

Ref: nodejs/tooling#19

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

Motivation

While this has been discussed at length in the Node.js Tooling Group and its associated issue, let me provide a summary:

  • process.argv.slice(2) is rather awkward boilerplate, especially for those new to Node.js
  • Parsing options without the use of a userland module reliably requires, well, about as much code as this PR adds; the complexity quickly ramps up going from one argument to two
  • It's often useful to add a few command-line flags in an ad-hoc manner to, say, app.js
  • It's often useful to parse command-line flags in example code, which would otherwise demand installation of a userland module (thus complicating things)

process.parseArgs() makes handling command-line arguments "natively" much easier. Given that so many tools are being written in Node.js, it makes sense to center this experience.

Design Considerations

First, let me acknowledge that there are many ways to parse command-line arguments. There is no standard, cross-platform, agreed-upon convention. Command-line arguments look different in Windows vs. POSIX vs. GNU, and there's much variation across programs. And these are still conventions, not hard & fast requirements. We can easily be paralyzed attempting to choose what "styles" to support or not. It is certain that there will be someone who agrees that Node.js should have this feature, but should not do it in this way.

But to implement the feature, we have to do it in some way. This is why the way is the way it is:

I have researched the various features and behavior of many popular userland command-line parsing libraries, and have distilled it down to the most commonly supported features, while striving further to trim any features which are not strictly necessary to get the bulk of the work done. While these do not align to, say, POSIX conventions, they do align with end-user expectations of how a Node.js CLI should work. What follows is consideration of a few specific features.

The Requirement of = for Options Expecting a Value

For example, one may argue that --foo=bar should be the only way to use the value bar for the option foo; but users of CLI apps built on Node.js expect --foo bar to work just as well. There was not a single popular argument-parsing library that did not support this behavior. Thus, process.parseArgs() supports this behavior (it cannot be automatic without introducing ambiguity, but I will discuss that later).

Combination of Single-Character Flags

Another one is combining (or concatenating?) "short flags"--those using a single hyphen, like -v--where -vD would be equivalent to -v -D. While this is a POSIX convention, it is not universally supported by the popular command-line parsers. Since it is inherently sugar (and makes the implementation more complicated), we chose not to implement it.

Data Types

Like HTML attribute values (<tag attr="1">), command-line arguments are provided to programs as strings, regardless of the data type they imply. While most of the userland arg parsers support some notion of a "data type"-i.e., this argument value is a number, string, or boolean--it is not strictly necessary. It is up to the user to handle the coercion of these values.

Default Behavior: Boolean Flags

The default behavior is to treat anything that looks like an argument (that's mainly "arguments beginning with one or more dashes") as a boolean flag. The presence of one of these arguments implies true. From investigation of popular CLI apps, we found that most arguments are treated as boolean flags, so it makes sense for this to be the default behavior. This means that a developer who just wants to know whether something is "on" or "off" will not need to provide any options to process.parseArgs().

Handling Values

Some arguments do need values, (e.g., --require my-script.js), and in order to eliminate ambiguity, the API consumer must define which arguments expect a value. This is done via the expectsValue option to process.parseArgs(), which is the only option to process.parseArgs(). This is the only option process.parseArgs() accepts.

Possible alternatives:

  • Rename expectsValue to something else

Repeated Arguments

It's common to need to support multiple values for a single argument, e.g., --require a.js --require b.js. In this example, require needs to be listed in the expectsValue option. The result is an object containing a require property whose value is an array of strings; ['b.js', 'c.js']. In the example of --require c.js, the value of the require property is a string, 'c.js'.

When working with boolean flags (those not declared in expectsValue), it was trivial to support the case in which repeated arguments result in a count. One -v will result in an object where {v: true}, but -v -v will result in {v: 2}. Either way, the value will be truthy.

Possible alternatives:

  • Every argument expecting a value (as declared in expectsValue) will parse to Array of strings, even if there is only one string in the Array (e.g., --require c.js becomes {require: ['c.js']}. That makes the API more consistent at the expense of making the common case (no repetition) slightly more awkward.
  • Remove the "count" behavior. While this is widely supported by modules, I don't often see it used in the wild in Node.js CLI apps.

Positional Arguments

Arguments after -- or without a dash prefix are considered "positional". These are placed into the Array property _ of the returned object. This is a convention used by many other userland argparsers in Node.js. It is always present, even if empty. This also means that _ is reserved as a flag/option name (e.g., --_ will be ignored).

Possible alternatives:

  • Throw if _ is provided in expectsValue

Intended Audience

It is already possible to build great arg parsing modules on top of what Node.js provides; the prickly API is abstracted away by these modules. Thus, process.parseArgs() is not necessarily intended for library authors; it is intended for developers of simple CLI tools, ad-hoc scripts, deployed Node.js applications, and learning materials.

It is exceedingly difficult to provide an API which would both be friendly to these Node.js users while being extensible enough for libraries to build upon. We chose to prioritize these use cases because these are currently not well-served by Node.js' API.

Questions

  • In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, I am unsure if introducing this terminology to our documentation is a Good Idea. Perhaps it can be expressed without new terminology.

  • I sorted some files around my modification in in node.gyp, which looked like it wanted to be in order, but was not. It did not seem to affect the build, but I can revert these changes if need be.

  • Should it be process.parseArgv()? While it does parse process.argv by default, it does not necessarily need to be used with process.argv.

  • Do I need to do more input validation, throw more exceptions, or take other defensive measures?

Credits

While this is my implementation, the design is a product of work by myself, @bcoe, @ruyadorno, and @nodejs/tooling.

@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. process Issues and PRs related to the process subsystem. labels Sep 1, 2020
@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented Sep 1, 2020

Nice, I didn't know if this was ever going to actually happen or not.

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals").

There's an IEEE standard for this, which might help.

The utility in the example is named utility_name. It is followed by options, option-arguments, and operands. The arguments that consist of characters and single letters or digits, such as 'a', are known as "options" (or, historically, "flags"). Certain options are followed by an "option-argument", as shown with [ -c option_argument]. The arguments following the last options and option-arguments are named "operands".
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

You mention the possibility of renaming the expectsValue option to something else, so you might be interested in the excellent getopt API documentation, which calls this has_arg.

This field says whether the option takes an argument. It is an integer, and there are three legitimate values: no_argument, required_argument and optional_argument.
https://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html

Hope to have time for a more thorough review later, but I do agree that command-line argument parsing is a fundamental capability that should be readily available. What I've noticed is that there are basically three major styles:

  1. No frills (what you propose)
  2. Subcommands (i.e., the Git CLI git commit -m …)
  3. Docker-style (multiple subcommands & args)

@boneskull
Copy link
Contributor Author

@DerekNonGeneric Thanks, didn't know about the IEEE thing. I can adopt this terminology if we think it'd be more clear.

'object',
options);
}
if (typeof options.expectsValue === 'string') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for this overload? it seems somewhat nonsensical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's friendly to support a string if the value can be singular, instead of requiring an array. This sort of behavior is very common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that just seems confusing to me. can we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload seems to be a category error, it should be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -0 on having this overload: it doesn't really bother me, but since we should rename expectsValue to something else, and if we rename to something on plural (like optionsWithValues) it might make more sense/be more intuitive to have this as an Array.

Copy link
Contributor

@bcoe bcoe Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For yargs' chaining API, I like that I can provide options as either an array or string. If I only have a couple options, I'll usually write:

yargs
  .boolean('option-1')
  .boolean('option-2')

But, if I have many options that I'm configuring, I might write:

yargs.boolean(['option-1', 'option-2', 'option-3', 'option-4')

I don't have as strong of an opinion in this case, given that this API isn't chaining like yargs (you configure all the options in one go any ways).


I don't think either implementation would be a major usability issue, and am supportive of whatever we land on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't decided yet where I land on "accept string or not", but if we don't accept string it should throw when string is passed, otherwise we'll get situations where the string is treated as an array ("foo" getting treated as ['f', 'o', 'o'] leads to hard to debug issues).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -0 too. Either is fine by me, but I guess if it was up to me, I'd prefer to not permit a string initially, because adding it later is no problem but taking it away once it's out there is a big problem. That said, I doubt we'd have to/want to take it away, hence my -0 rather than -1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I am also a member of the -0 club.

const arg = argv[pos];
if (arg === undefined) {
return result;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be safer to remove the _ checks below and move them up here, maybe something like if (/^-{0,2}_$/.test(arg)) continue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't just do that, because --_ is ignored, --_=foo is ignored, and --_ foo is ignored (in the case of _ in expectsValue), so we need to parse our array further before just skipping ahead (or we introduce ambiguity). We could throw if _ is in expectsValue and make this somewhat easier.

object supporting the following property:
* `expectsValue` {Array<string>|string} (Optional) One or more argument
strings which _expect a value_ when present
* Returns: {Object} An object having properties corresponding to parsed Options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just return { options: { ... }, positionals: [...] }? it gets rid of the awkward _ handling and is imo much cleaner than magic properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to this, but it is based on userland conventions. Personally I would prefer the magic over nested properties, but that's just me (as a potential consumer of this API).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've seen the userland libraries. I'd like to think we can do better 😄

Copy link
Contributor

@mmarchini mmarchini Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm with Gus here, userland convention originated before deconstructors, so having a _ property could be considered better/more attractive than returning options: { ... }, positionals: [...] }. The explicit object return is, IMO, more intuitive and user-friendly:

const { options, positionals } = processArgs()

_ might be somewhat familiar to users used to yargs, commander etc, but it won't be intuitive for new users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that const { options, positionals } = processArgs() is a pretty elegant compromise, I never loved _ 😆

@mscdex
Copy link
Contributor

mscdex commented Sep 2, 2020

This seems like something that should live off of util instead.

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 that we can do way better than the magic _

lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
@boneskull
Copy link
Contributor Author

Would like someone else to weigh in on the _ convention for positionals, e.g., @bcoe

@boneskull
Copy link
Contributor Author

I'm interested in feedback on all "possible alternatives" and questions listed in the description. Consolidating it into a list here:

  1. Should we rename the expectsValue option? For example, optionArguments (to follow the IEEE guidelines)
  2. For args expecting a value (e.g., --foo=bar) should the parsed result always be an array of values? {foo: 'bar'} vs {foo: ['bar']} in this case
  3. Should we remove the "count" behavior for repeated flags? What should happen instead, in the case of --foo --foo --foo?
  4. Should we adopt IEEE terminology in the code and documentation?
  5. Should I revert the change to the order of files in node.gyp?
  6. Should the function be named process.parseArgv()?
  7. Should more exceptions be thrown somewhere?

In addition, I would like further input on these:

  1. Should this function live somewhere else besides process, e.g., util?
  2. Should we abandon the _ convention, and return a nested object instead (with e.g., options and positionals properties)?

When answering these, please consider the expected users of this API and the developer experience. What would be easiest to understand at first glance? Once familiar, will the API be ergonomic or tedious?

doc/api/process.md Outdated Show resolved Hide resolved
Copy link
Member

@devsnek devsnek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boneskull
Copy link
Contributor Author

@devsnek I will not be changing the API to force expectsValue to be an array (though I'm happy to rename it!), as I strongly believe providing a smoother developer experience is more important than somebody's idea of theoretical purity. It's my hope you will not block the PR for this reason.

@devsnek
Copy link
Member

devsnek commented Sep 2, 2020

I don't think confusing apis provide a smoother developer experience.

@boneskull
Copy link
Contributor Author

@devsnek There's nothing confusing about the API; as I mentioned, this is widely supported across userland.

@boneskull
Copy link
Contributor Author

Sounds like @nodejs/tsc may need to weigh in.

@mmarchini
Copy link
Contributor

Some comments on the issue summary first, will look at the code later.

Another one is combining (or concatenating?) "short flags"--those using a single hyphen, like -v--where -vD would be equivalent to -v -D. While this is a POSIX convention, it is not universally supported by the popular command-line parsers. Since it is inherently sugar (and makes the implementation more complicated), we chose not to implement it.

How much more complicated is it to implement this behavior, and why is it more complicated vs more complex?

Every argument expecting a value (as declared in expectsValue) will parse to Array of strings, even if there is only one string in the Array (e.g., --require c.js becomes {require: ['c.js']}. That makes the API more consistent at the expense of making the common case (no repetition) slightly more awkward.

String type when there's no repetition seems fine.

Remove the "count" behavior. While this is widely supported by modules, I don't often see it used in the wild in Node.js CLI apps.

I never seen this behavior on any CLI apps (Node.js or otherwise), IMO it can be removed

In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, I am unsure if introducing this terminology to our documentation is a Good Idea. Perhaps it can be expressed without new terminology.

Python uses the terminology "arguments" (https://docs.python.org/3/library/argparse.html), but we don't have to use the same terminology. We need to use a terminology, the one you proposed seems as good as any, as long as we are consistent across our documentation (including documentation about Node.js CLI options) about it we're good.

I sorted some files around my modification in in node.gyp, which looked like it wanted to be in order, but was not. It did not seem to affect the build, but I can revert these changes if need be.

We tend to avoid reordering things unless necessary because it makes git archaeology harder.

Copy link
Contributor

@mmarchini mmarchini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still reviewing, commenting on the docs first so I don't lose my comments)

Comment on lines 1692 to 1698
* `argv` {Array<string>|Object} (Optional) Array of argument strings; defaults
to [`process.argv.slice(2)`](process_argv). If an Object, the default is used,
and this parameter is considered to be the `options` parameter.
* `options` {Object} (Optional) The `options` parameter, if present, is an
object supporting the following property:
* `expectsValue` {Array<string>|string} (Optional) One or more argument
strings which _expect a value_ when present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how we usually write optional parameters ("If an..., it is ...")

https://nodejs.org/docs/latest-v12.x/api/crypto.html#crypto_cipher_update_data_inputencoding_outputencoding

I'm worried the over verbosity might make it more complicated to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed this somewhat, but I find the linked documentation too terse. 🤷

doc/api/process.md Outdated Show resolved Hide resolved
object supporting the following property:
* `expectsValue` {Array<string>|string} (Optional) One or more argument
strings which _expect a value_ when present
* Returns: {Object} An object having properties corresponding to parsed Options
Copy link
Contributor

@mmarchini mmarchini Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm with Gus here, userland convention originated before deconstructors, so having a _ property could be considered better/more attractive than returning options: { ... }, positionals: [...] }. The explicit object return is, IMO, more intuitive and user-friendly:

const { options, positionals } = processArgs()

_ might be somewhat familiar to users used to yargs, commander etc, but it won't be intuitive for new users.

doc/api/process.md Outdated Show resolved Hide resolved
doc/api/process.md Outdated Show resolved Hide resolved
doc/api/process.md Outdated Show resolved Hide resolved
doc/api/process.md Outdated Show resolved Hide resolved
following a `--` (e.g., `['--', 'script.js']`)
* Positionals appear in the Array property `_` of the returned object
* The `_` property will _always_ be present and an Array, even if empty
* If present in the `argv` Array, `--` is discarded and is omitted from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some user land lib has this behavior and I remember finding it very confusing. If users want to pass through everything after -- to a child process, there's no way of knowing which positionals were after --.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, though I'm unsure if it's really common enough to worry about here... again, if this API doesn't cut it, there are great userland libraries that will (though, I'm not sure if they do in this case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I'm unsure if it's really common enough to worry about here

It's more common than the count behavior :)

doc/api/process.md Outdated Show resolved Hide resolved
doc/api/process.md Outdated Show resolved Hide resolved
node.gyp Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
lib/internal/process/parse_args.js Outdated Show resolved Hide resolved
@mmarchini
Copy link
Contributor

IMO if a flag with = is provided by the user but the option is not present on expectsValue, parseArgs should throw. I want to hear other's opinion wrt this though.

@devsnek
Copy link
Member

devsnek commented Sep 3, 2020

I think the proper solution would be a system you have to pass in all the args you expect, whether they're flags or options, if it's an option whether it's required, whether certain positional arguments are required, etc.

@mmarchini
Copy link
Contributor

mmarchini commented Sep 3, 2020

@devsnek I kinda agree, this API is kind of a mix of low level and high level API: high level due to the magic bits, but low level because the user has to implement so many boilerplate things that are common on CLI tools (--help, options validation, type conversion, etc.). The way it is implemented today it would be possible to add a "strict" mode in the future though, or a more comprehensive parser builder.

The API clearly has a goal of simplicity and is not intended to fully replace existing parser modules, nor is intended to cater for more complex use cases without offloading the heavy work to the user, or at least that's my understanding.

@boneskull
Copy link
Contributor Author

Regarding the "count" feature: ssh uses it; see man page. But that's the only one I can think of offhand. It's not in wide use, but it's also an information channel that is lost if we do not support it.

Regarding returning diff't data structures (a string for singular option values and an array for multiples), I would rather @bcoe could speak to the relative wisdom of such an API, because that is yargs's default behavior, and it is the most popular userland arg parsing lib. Does it trip people up?

Regarding whether options should be explicitly defined... maybe, but, a better developer experience is to not throw on unknown options, and let the API consumer decide what to do with them. In a CLI app, throwing an exception should be avoided wherever possible--end-users typically don't need to see a stack trace, and whether the exception message would actually provide actionable context is a crapshoot. I think we'd rather let the API consumer decide how to express such an error (or silently accept unknown options).

But otherwise, I'm unclear on what explicit definition of boolean-flag-style options really buys an API consumer. If it doesn't throw (and IMO it should not), should this API silently ignore the unknown options? The API consumer would not know if unknown options were passed, unless we threw them into another bucket property, e.g. unknownFlags or provided another option to instruct parseArgs how to handle unknown options. Adding more ways to control its behavior is a very slippery slope (see yargs-parser; apologies Ben 😉).

Also: @nodejs/tooling in case you missed it

Comment on lines 61 to 66
if (typeof options !== 'object' || options === null) {
throw new ERR_INVALID_ARG_TYPE(
'options',
'object',
options);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (typeof options !== 'object' || options === null) {
throw new ERR_INVALID_ARG_TYPE(
'options',
'object',
options);
}
validateObject(options, 'options');

(from internal/validators)

if (arg === undefined) {
return result;
}
if (StringPrototypeStartsWith(arg, '-')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps

Suggested change
if (StringPrototypeStartsWith(arg, '-')) {
if (arg[0] === '-') {

? (and same for other single-char startsWith use-cases)

@tniessen
Copy link
Member

tniessen commented Sep 3, 2020

Many popular languages don't have argument parsers in their standard libraries, including Java, R, C, C++, C#, etc.

But a few standard libraries do have built-in option parsers.

  • For C/C++, there are getopt, getopt_long, and getopt_long_only in the POSIX/GNU standard library. Even though these are historic remnants and most C/C++ projects implement their own parsing logic or use external libraries, the newer getopt variants do provide a bit of flexibility. For example, they allow defining options before parsing.
  • Boost has probably almost reached the point of being a standard library for C++, and provides the powerful Boost.Program_options class.
  • Python has an older, deprecated option parser getopt that is very similar to the traditional getopt function in POSIX C/C++. This parser has since been replaced with the new optparse module, which is much more flexible and configurable.
  • Similarly, Go has flag, which is also quite flexible.

All popular languages provide option parsers as external libraries. Node.js probably has hundreds via npm, C# has System.CommandLine. Java has many, including Apache Commons CLI. R has optparse.

So it seems to me that any modern framework either brings no option parser with it, or a much more powerful parser than the one being proposed here. Also, I am afraid that adding this to Node.js might reduce awareness for better solutions in the ecosystem.

See also Sam's concerns in nodejs/tooling#19 (comment), nodejs/tooling#19 (comment), nodejs/tooling#19 (comment).

@mmarchini
Copy link
Contributor

Regarding whether options should be explicitly defined... maybe, but, a better developer experience is to not throw on unknown options, and let the API consumer decide what to do with them. In a CLI app, throwing an exception should be avoided wherever possible--end-users typically don't need to see a stack trace, and whether the exception message would actually provide actionable context is a crapshoot. I think we'd rather let the API consumer decide how to express such an error (or silently accept unknown options).

But otherwise, I'm unclear on what explicit definition of boolean-flag-style options really buys an API consumer. If it doesn't throw (and IMO it should not), should this API silently ignore the unknown options? The API consumer would not know if unknown options were passed, unless we threw them into another bucket property, e.g. unknownFlags or provided another option to instruct parseArgs how to handle unknown options. Adding more ways to control its behavior is a very slippery slope (see yargs-parser; apologies Ben wink).

With the approach on this PR the developer has to validate all options through if statements or similar and then manually write a help message for their end users, whereas with explicit declaration that throws the developer can catch the error and print the help message (because we can generate a help message with explicit declaration). It seems a lot more work for developers or the developer will take the simple road and it will be a worse experience for their final user.

@bcoe
Copy link
Contributor

bcoe commented Sep 3, 2020

So it seems to me that any modern framework either brings no option parser with it, or a much more powerful parser than the one being proposed here. Also, I am afraid that adding this to Node.js might reduce awareness for better solutions in the ecosystem.

I would like to chime in from the perspective of someone who is both supportive of this functionality, and leads development yargs (one of the most widely adopted argument parsers in the npm ecosystem).

Thinking from the perspective of my day job

For my job, I work at Google as a DPE. We behave as customer zero for Google Cloud Products:

  • helping build the client libraries that interact with APIs.
  • writing samples and documentation for our various products.
  • interacting with language communities (👋 hello).

It's my work around samples that has me advocating for a built-in command line argument parser in Node.js.

It allows you to provide an elegant snippet of code to folks that just works:

const {projectId, datasetName} = process.parseArgs()
const bigqueryClient = new BigQuery({
  projectId: args
});
const [dataset] = await bigqueryClient.createDataset(datasetName);
console.log(`Dataset ${dataset.id} created.`);

It creates confusion for users that they need to bring other dependencies to the table to use an exemplary snippet of code (beyond the code you're writing an example for).

Thinking from the perspective of a Node.js tooling author

Node.js is a powerful platform, and I would love to be able to do more with it without pulling in dependencies:

Node.js has great http built-ins, fs built-ins, encryption built-ins, etc., and yet when I write a new library I tend to pull in dependencies like: mkdirp, rimraf, or serve. Why? so I can write npm scripts like this:

{
  "scripts": {
    "precompile": "rimraf ./build",
    "compile": "... some build step"
  }
}

Python exposes a variety of useful built-ins as modules, and allows you to run them as command line applications:

python -m http.server 8000 --bind 127.0.0.1

What if I could do this in Node.js?

{
  "scripts": {
    "precompile": "node -m rmdir --recursive ./build",
    "compile": "... some build step"
  }
}

Having a built-in command line argument parser I'm convinced could help unlock a variety of interesting ideas like this.

@jkrems
Copy link
Contributor

jkrems commented Sep 3, 2020

In terms of opinions for repeated flags: I have seen too many bugs caused by repeated query params breaking apps that didn't bother to check for isArray. In practice this kind of "almost always a string but if you pass 2+ it becomes an array" leads to one of two things: super verbose defensive code that needs to type check every step of the way OR tools that throw errors because the file named "[a,b]" couldn't be found (or even worse "x.substring isn't a function").

So... I personally would prefer if support for repetition would be dropped (it's the much less often used kind afaict), made explicit via a 2nd option, or if it's always an array and never a string.

Copy link
Contributor

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few initial nits, but I feel like this is an awesome start.

doc/api/process.md Outdated Show resolved Hide resolved
object supporting the following property:
* `expectsValue` {Array<string>|string} (Optional) One or more argument
strings which _expect a value_ when present
* Returns: {Object} An object having properties corresponding to parsed Options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that const { options, positionals } = processArgs() is a pretty elegant compromise, I never loved _ 😆

doc/api/process.md Outdated Show resolved Hide resolved
@mmarchini
Copy link
Contributor

So... I personally would prefer if support for repetition would be dropped (it's the much less often used kind afaict), made explicit via a 2nd option, or if it's always an array and never a string.

I don't think it's uncommon enough for us to drop it (personally I've used it on quite a few CLIs I wrote). Always an array seems like a good option for me.

One feature I don't think this proposal supports, which I know is quite popular, is the concept of arrays

The current proposal supports arrays :D

@devsnek
Copy link
Member

devsnek commented Sep 18, 2020

Honestly I'm not sure we're ever going to approach "good" behaviour, much less perfect, if so much guessing and coercion has to be involved.

@gabrielschulhof
Copy link
Contributor

@boneskull I think we should leave interpretation to the user. The only thing we should absolutely ensure is that, if the option has a parameter, that it makes it into the result as a string so the user can then process it as they see fit.

@boneskull
Copy link
Contributor Author

boneskull commented Sep 18, 2020

@gabrielschulhof So essentially you're proposing:

  • if you do not use optionsWithValues, you can expect a boolean true, string, or undefined
  • if you use optionsWithValues, you can expect a string (empty string in the case of a missing value) or undefined

(Is that correct?)

@boneskull
Copy link
Contributor Author

It's important that @tniessen gives the green light to this strategy.

@boneskull boneskull added the notable-change PRs with changes that should be highlighted in changelogs. label Sep 18, 2020
@tniessen
Copy link
Member

please describe how it should work instead of waiting for me to make a change as @tniessen is suggesting, then coming up with something else.

I don't think I suggested any changes directly, at least none that you implemented. I did highlight problems, but I never said I had solutions to these problems (apart from leaving option parsing to the ecosystem).

You are looking for an API that should be as simple as possible, should work with basically zero configuration, and should never throw exceptions based on argv. From the very beginning, I was doubtful towards such an API in core, and I expressed that, like multiple others have in nodejs/tooling#19. My specific concerns are correctness and safety, first mentioned here and later here. Based on that concern, I requested changes here. My position did not change, and I never suggested any changes.

You expect a solution from me, but I don't have one. I simply don't know how to combine the properties I am concerned about (safety and correctness) with the properties you are focusing on (simplicity and not throwing).

This is nothing personal @boneskull, and I understand your frustration. Nobody has proposed any complete solutions, and at this point, I don't even know if a good solution exists :)

if you do not use optionsWithValues, you can expect a boolean true, string, or undefined

As you correctly mentioned in #35015 (comment), this is not safe either, but probably at least closer to being correct. However, it introduces return values with unpredictable types, and I don't understand why this behavior would be preferable over throwing an error. I know that you are not considering that an option (no pun intended), but I personally don't get why.

To me, it seems illogical to accept values for flags, which, according to the documentation, don't have values.

To use the proposed API safely, developers must check whether values were passed for flags (which don't accept values). If a value is permitted for a flag, why didn't the developer declare it as an option, which takes an optional value? If a value is passed for a flag, most applications should probably throw or display an error. In other words, it seems to me that not throwing an error in this API makes safe usage much more difficult for developers, compared to just handling an error. I simply don't see the logic behind this.

This is what I assume safe usage would look like:

const { options, positionals } = util.parseArgs({ optionsWithValues: ['foo'] });

// Since Node.js allows passing anything for flags, we need to check every single flag for a value.
// The value might be 'false' or something similar, which would lead to incorrect behavior if ignored.
// Node.js could absolutely do this itself, but it doesn't want to.
for (const name of ['bar', 'baz', 'qux', 'quux', 'quuz', 'corge', 'grault']) {
  if (options[name] !== undefined && options[name] !== true) {
    console.error(`I did not expect a value for the flag ${name}, but Node.js accepted one.`);
    console.error(`I cannot really help you, other than tell you not to specify a value for this flag.`);
    process.exit(1);
  }
}

// Okay, after manually checking whether Node.js accepted any values for
// flags that should not accept values, we can finally actually use the values.
const { foo, bar, baz, qux, quux, quuz, corge, grault } = options;

// Oh wait, we also need to make sure no unexpected positionals were passed...
// (Not included here)

I honestly don't understand how this is better than simply handling one exception. I know you don't want to consider this, but I am interested in the reason.

It's important that @tniessen gives the green light to this strategy.

If the PR is somewhat close to implementing somewhat correct behavior, and if the resulting unsafe behavior is properly documented, I'll dismiss my request for changes. That doesn't mean I'll approve the PR, but I also won't stand in the way.

There must be some way for the API to be used safely. With its current state, I don't see how that would be possible, and adding inherently unsafe APIs to Node.js does not seem logical to me.

As much as I am not a fan of the proposed solution, it seems to allow safe usage of the API, assuming developers are aware of the potential pitfalls caused by it, and implement their own error checking.

Of course, you can also leave it to the TSC to overrule my request for changes. Maybe others are fine with landing it as it is, despite the resulting safety and correctness issues.

@mmarchini
Copy link
Contributor

fwiw I also raised concerns about correctness and safety, but I think those comments got lost when the PR was updated at some point.

@boneskull
Copy link
Contributor Author

@mmarchini I haven't rebased... what are you referring to?

@mmarchini
Copy link
Contributor

Also as a point of order: given @tniessen objection and the fact that to reach a correct and safe implementation we might end up with material changes to the API, I don't think it's worth discussing Gus objection right now (as the code they're objecting might not exist on a future solution). If that piece of code remains once @tniessen objection is either resolved or dismissed, the TSC must reach a decision on Gus' objection in a timely manner.

@mmarchini
Copy link
Contributor

@boneskull I'll try to find my comments, but as I said they disappeared (at least in the code view), so it might take a while to find it.

@boneskull
Copy link
Contributor Author

Won't be pushing any changes until Tuesday. I plan on implementing this and updating the documentation accordingly.

@mmarchini
Copy link
Contributor

Ok, so here are my comments related to correctness and safety (although I didn't use those specific words, and I apologize if I wasn't as clear as @tniessen on my concerns):

(note I didn't make it an explicit objection because I do believe we can have a follow up API with more strict/structured format, but I definitely see why others might think a loose API is not ideal in the first place)

@boneskull
Copy link
Contributor Author

boneskull commented Sep 18, 2020

I don't believe balancing these concerns is impossible.

OKAY, what about this:

We want to know if an end-user gave us weird or unexpected input. A flag having a value, for instance. e.g., --verbose=yes is unexpected; --verbose was the correct usage.

Instead of just returning {options: {verbose: 'yes'}}, we return {options: {}, errors: [SomeError]}, where errors is one or more Problems.

This way, we can:

  1. keep the value of "flags" as boolean true (consistent types) and
  2. we lose no information and
  3. we do not throw.

This means that --verbose=false is an "error." The developer can choose how to interpret what was provided. SomeError could be an actual Error or not, but a code and contextual information would be helpful, so I'm leaning towards yes.

Likewise, if optionsWithValues was specified, a missing value would also be considered an "error", instead of returning an empty string.

// called via `node script.js --foo bar baz`
const argv = util.parseArgs();

if (argv.foo === true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argv.options.foo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@mmarchini mmarchini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking objection explicitly to make sure this don't fall off the cracks:

-- shouldn't be discarded as it makes it impossible for the developer to provide pass-through behavior for their scripts.

What is the behavior of the API today:

> util.parseArgs(['--foo', '--', '--bar'])
{ options: { 'foo': true }, { positionals: ['--bar'] } }

What I expect from the API:

> util.parseArgs(['--foo', '--', '--bar'])
{ options: { 'foo': true }, { positionals: ['--', '--bar'] } }

My objection is resolved with the above change in behavior.

@boneskull
Copy link
Contributor Author

@mmarchini I think that is reasonable.

@mmarchini
Copy link
Contributor

mmarchini commented Sep 19, 2020

Edited my comment just to clarify that my objection is resolved with the change above (just to avoid confusion). Also, I'm happy with other solutions for the passthrough problem, I only suggested that one because it's the simplest solution I could think of.

Other options I find reasonable:

  • preserving the -- being optional via parseArgs parameter
  • return passthrough items in a separate array ({options, positionals, passthrough})
  • return the index of the first passthrough option (same return structure as previous suggestion, but passthrough is a numeric attribute)

) => {
if (!ArrayIsArray(argv)) {
options = argv;
argv = ArrayPrototypeSlice(process.argv, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting to process.argv.slice(2) is fine but I think the implementation should somewhat also consider the eval usecase, right now if I try running node -p|-e I get a very unhelpful error:

$ node -p 'require('util').parseArgs()' foo bar
internal/validators.js:122
    throw new ERR_INVALID_ARG_TYPE(name, 'string', value);
    ^

TypeError [ERR_INVALID_ARG_TYPE]: The "id" argument must be of type string. Received an instance of Object
    at new NodeError (internal/errors.js:253:15)
    at validateString (internal/validators.js:122:11)
    at Module.require (internal/modules/cjs/loader.js:972:3)
    at require (internal/modules/cjs/helpers.js:88:18)
    at [eval]:1:1
    at Script.runInThisContext (vm.js:132:18)
    at Object.runInThisContext (vm.js:309:38)
    at internal/process/execution.js:77:19
    at [eval]-wrapper:6:22
    at evalScript (internal/process/execution.js:76:60) {
  code: 'ERR_INVALID_ARG_TYPE'
}

In the past I've done something like process.argv.slice(require.main ? 2 : 1) in order to support it (though there might be better ways to handle the check in core).

IMO parseArgs should either handle eval/script properly OR at least throw an useful error instead 😊

@boneskull boneskull closed this Sep 20, 2020
@kibertoad
Copy link
Contributor

@boneskull Why was this closed?

@bcoe
Copy link
Contributor

bcoe commented Dec 21, 2020

@kibertoad there are efforts to reopen the discussion around this feature in the new year.

@kibertoad
Copy link
Contributor

Awesome!
If any development support would be needed, more than happy to help.

@shadowspawn
Copy link
Member

Questions
In particular, I'm not 100% confident in the terminology I chose for the documentation ("Flags", "Options", "Positionals"). This does align with other documentation I've read on the subject of CLI arguments, [...]

For possible interest, I researched what terminology to use consistently in Commander docs last year. I settled on different terms, but did mention these as used elsewhere.

Short version, Commander terminology: The command line arguments are made up of options, option-arguments, commands, and command-arguments.

Long version: https://github.com/tj/commander.js/blob/master/docs/terminology.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version. tsc-agenda Issues and PRs to discuss during the meetings of the TSC. util Issues and PRs related to the built-in util module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.