Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation should mention how to accept filenames as arguments. #15

Open
vi opened this issue Mar 27, 2019 · 12 comments
Open

Documentation should mention how to accept filenames as arguments. #15

vi opened this issue Mar 27, 2019 · 12 comments

Comments

@vi
Copy link

vi commented Mar 27, 2019

A filename or path is not a String, but OsString. I don't see any mention of OS strings in crate-level documentation for gumdrop.

Also I see parse(try_from_str and parse(from_str, but I don't see the exhausive list of what I can put into parse(...). Is there from_os_str?

@murarth
Copy link
Owner

murarth commented Mar 27, 2019

gumdrop does not currently accept OsString values at all. The Parser structure accepts only values implementing AsRef<str>.

This is because the parser must check whether each string begins with the substring - or -- in order to report short and long options to the caller. OsStr does not currently provide a platform-independent method equivalent to str::starts_with. Nor does it allow iterating over the characters of a string (which is required for processing short option strings).

It may be possible to implement an internal gumdrop API that implements these operations atop the platform-specific Unix and Windows OsStr extensions, but I have not yet explored this possibility.

@vi
Copy link
Author

vi commented Mar 27, 2019

starts_with

Just view OsStr as byte slice, then use something like starts_with(b"--")? Should work, at least on Unix. I also found this (but haven't checked).

does not currently accept OsString values at all

It should be documented then, with specific mention about filenames. What happens if one specifies a non-UTF8 argument?

@murarth
Copy link
Owner

murarth commented Mar 27, 2019

It would not be possible to implement this feature only on Unix, as it requires changing the public API to accept argument values as T: AsRef<OsStr> instead of T: AsRef<str>. Making such a change on only one platform would be messy and not a very good idea. However, it may be possible to also implement this feature on Windows. I will investigate this soon.

As to your last question, the functions parse_args_or_exit and parse_args_default_or_exit call std::env::args to the get the list of arguments as String values. That function will panic if any arguments cannot be stored in a String. I have just pushed a commit that adds this documentation.

@vi
Copy link
Author

vi commented Mar 27, 2019

In any case the limitation should be mentioned in the docs for current version. For example, example code blocks may include #[options(short="o")] output_file: Option<String>, with a notice in the comment about non-Unicode pitfall.

Bugs in handling of non-unicode filenames are easy to miss.

@tarcieri
Copy link

What's the concrete issue here? At least on recent Rust versions, I'm able to parse arguments to e.g. PathBuf. Perhaps this is facilitated by some recently added trait impls?

@Arcterus
Copy link

Internally, gumdrop processes arguments using str rather than OsStr, meaning that non-Unicode filenames given as arguments can cause problems. When you're parsing the arguments to PathBuf or whatever, it's actually just converting &str/String to PathBuf rather than &OsStr/OsString, so it will still break with non-Unicode paths.

All of the above assumes nothing has changed in the past few months (I'm pretty sure nothing has).

@tarcieri
Copy link

Internally, gumdrop processes arguments using str rather than OsStr

...but as @murarth already pointed out, the arguments come in as String, not OsStr(ing).

Can you give concrete, minified examples (outside of gumdrop, using std alone) of:

  1. How one of these breakages might occur
  2. How to avoid the issue

?

@Arcterus
Copy link

You can avoid the issue by using OsString rather than String. You can use std::env::args_os() to get an iterator returning the arguments as OsStrings (or &OsStrs? I can’t remember which). This crate uses std::env::args() if it’s getting Strings (as mentioned by @murarth).

You can easily break something that uses String for argument parsing by creating a file using garbage non-Unicode characters in the path and trying to use the path as an argument to the program.

If you want a concrete example that doesn’t use gumdrop, just find any program using the getopts crate and give it a non-Unicode path.

@tarcieri
Copy link

You can avoid the issue by using OsString rather than String.

Wouldn’t that have an analogous problem in that it would break parsing of Unicode String arguments?

@Arcterus
Copy link

Do you actually need to know that the argument is Unicode most of the time? If it’s a path that you use to open a file, you likely don’t. If it’s a command-line option, you can just use the OsStrExt or whatever methods to check the string. If you really need Unicode for whatever reason for one string, you can either do a lossy conversion or a checked conversion to String (Cow<‘_, str> for the lossy conversion).

The benefit of OsString is that you have the ability to handle everything (although it may be more effort), whereas String will just not work in certain cases.

@Freaky
Copy link

Freaky commented Aug 1, 2019

clap-rs/clap#1524 might be of interest. I made an OsStr wrapper with string-like methods that tries to do-the-right-thing, and have had a go at replacing clap's existing OsStr handling with it.

@ssokolow
Copy link

ssokolow commented May 31, 2020

For the record, this is the number-one reason why I'm not even considering switching from structopt to gumdrop to cut down the size of the template I use for writing little CLI utilities.

I have mojibake'd paths on my drives that would panic something that assumes arguments are valid UTF-8.

In fact, they revealed that Serde's PathBuf serialization fails unexpectedly if you try to output not-valid-UTF8 paths to JSON. My workaround for that was to come up with a newtype implementing an escaping scheme. (POSIX paths may contain any sequence of bytes except binary null, and spec-compliant JSON parsers are required to support strings containing binary nulls, so, by using binary null as an escape character that means "reinterpret the next codepoint as a byte", the 99+% of paths that are String-able pass through the escaping process unchanged.)

(The same project also discovered that Serde panics on attempting to serialize the pre-epoch modification timestamps from an old floppy disk with mildly corrupted filesystem metadata.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants