Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage OsString much more deeply througout #112

Merged
merged 1 commit into from
May 12, 2017

Conversation

alexcrichton
Copy link
Contributor

This initially started out tackling #93 and ended up plumbing the OsStr type
basically everywhere! This knocked out a TODO or two along the way and should
remove some pesky unwraps and/or error handling around handling of arguments and
paths that may not be unicode.

Some notable other points are:

  • A few utilities were added to src/util.rs to assist with handling os strings
  • The Rust implementation doesn't deal with os strings in argument parsing at
    all, but otherwise internally stores os strings for maximal compatibility
    (more comments in the code).
  • Some unsafe transmutes were removed in util in favor of a more stable
    implementation.
  • The output_file function was renamed to output_pretty as it turns out it's
    not actually the actual output file but rather just a pretty description of it
    (for logging and such).

Closes #93

@gumpt
Copy link

gumpt commented Apr 28, 2017

This would be so nice! I ran into a lot of conversions when working on #104 between PathBuf <-> OsString <-> String.

Copy link
Contributor

@luser luser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few comments, nothing major. Feel free to fix as much as is reasonable and then land this.

env_vars: &[(OsString, OsString)],
pool: &CpuPool)
-> SFuture<HashResult<T>>
{
let me = *self;
let CCompilerHasher { parsed_args, executable, executable_digest, compiler } = me;
let result = compiler.preprocess(creator, &executable, &parsed_args, cwd, env_vars, pool);
let out_file = parsed_args.output_file().into_owned();
let out_pretty = parsed_args.output_pretty().into_owned();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering recently if we shouldn't make output_file return an Rc<String> or something like that, given how much we copy it around just for logging purposes...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah one thing I was considering in the logging PR was to add the output file to the "local logger's scope" and that way it'd automatically get prepended to all log messages. What would you think of that approach? (I can change that PR over there)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make things a lot nicer vs. all the cloning we do just to have useful log messages.

.collect::<String>();
hash_key(&executable_digest, &arguments, &env_vars, &preprocessor_result.stdout)
hash_key(&executable_digest,
&parsed_args.common_args,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're undoing the "remove object file" bit here. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right sorry I meant to ask you about this. I wasn't really sure if this was still necessary. The test here is against output_file which is what prompted me to rename the relevant method to output_pretty because the return value of output_file is just the file name, not the full path.

In that sense I'm not sure that this was 100% working before, and I otherwise couldn't figure out what it was intended to be doing. I thought the output file was already missing from the common_args?

pub fn hash_key(compiler_digest: &str, arguments: &str, env_vars: &[(OsString, OsString)],
pub fn hash_key(compiler_digest: &str,
arguments: &[OsString],
env_vars: &[(OsString, OsString)],
preprocessor_output: &[u8]) -> String
{
// If you change any of the inputs to the hash, you should change `CACHE_VERSION`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So HashToSha1 will not change hashes, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah I think in practice this is going to end up disturbing the hashes a bit. The motivation of HashToSha1 was to avoid transmuting OsStr to bytes (which while valid today isn't something we'd like to commit to in libstd).

Is changing the hash though a problem? I could back that out if it's best left for another time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the hash is fine, but if you're doing so please bump the cache version as well so we're explicit about it.


/// Returns an iterator over the results of this compilation.
///
/// Each item is a descriptive (and unique) name of the outpu paired with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling nit: 'output'. Thanks for adding a doc comment!

"-o" => output_arg = it.next(),
"-gsplit-dwarf" => {
match arg.to_str() {
Some("-c") => compilation = true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little clunky, having to put Some on every match arm. Could you do something like:

let arg = match arg.to_str() {
  None => continue,
  Some(s) => s,
};
match arg {
...

(I have no idea if that's actually valid Rust)

write!(f, "{}: {} ", objfile, parsed_args.input)?;
let mut f = File::create(cwd.join(depfile))?;

// TODO: these calls to `display` are lossy ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really worse than what we had before, where a source file name that wasn't valid Unicode would just fail somewhere along the way, but it's a little worrying that we could be writing bad data here and silently breaking things. In any event, you don't have to fix this right now. I was going to say you could use local_encoding like we're using to decode the MSVC output, but that only deals with turning strings into bytes and that doesn't actually help. I guess we could open an issue on local_encoding to support encoding &[u16] or some such to bytes, and then pass it input.to_wide()? It'd basically just need to call WideCharToMultiByte, and I'm sure it's already doing something very similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh looks like WideCharToMultiByte may actually be supported, I'll play around with calling that (never tried it before!)

{
// While we could go the extra mile here and handle non-utf8 `OsString`
// instances the rustc compiler certainly does not. With that knowledge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah!

assert_eq!(os_str_bytes(OsStr::new("你好")), "你好".as_bytes());
pub trait OsStrExt {
fn starts_with(&self, s: &str) -> bool;
fn split_prefix(&self, s: &str) -> Option<OsString>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep thinking that these sorts of things would be really nice in std.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! We on the libs team have often wanted some form of string-like API added to OsStr as well, we just never figured out the best way to do so... For now less efficient implementations :(

// Attempt to interpret this OsStr as utf-16. This is a pretty "poor
// man's" implementation, however, as it only handles a subset of
// unicode characters in `s`. Currently that's sufficient, though, as
// we're only calling `starts_with` with ascii string literals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever, but not so clever that it's hard to understand. :)

src/util.rs Outdated
}

fn finish(&self) -> u64 {
panic!("not supposed to be ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This panic string feels unfinished?

@luser
Copy link
Contributor

luser commented May 3, 2017

Oh! But you should fix whatever caused the CI bustage, obviously. Possibly just an issue being rebased over one of your other PRs?

error[E0425]: cannot find value `output` in this scope
   --> src\server.rs:597:54
    |
597 |                     error!("[{:?}] fatal error: {}", output, err);
    |                                                      ^^^^^^ not found in this scope
error[E0425]: cannot find value `output` in this scope
   --> src\server.rs:602:47
    |
602 |                         error!("[{:?}] \t{}", output, e);
    |                                               ^^^^^^ not found in this scope
error: aborting due to 2 previous errors

@alexcrichton alexcrichton force-pushed the osstring branch 4 times, most recently from e74738d to 037cd14 Compare May 3, 2017 22:49
@alexcrichton
Copy link
Contributor Author

Ok I think I've updated with all comments, but I'll hold off on merging until I hear back on removing the object file bit changing the hash value.

If you wanna take a peek at what I did to parse_argument in gcc/msvc you can also tell me if I'm crazy!

@alexcrichton alexcrichton force-pushed the osstring branch 2 times, most recently from ea2d2fa to ef203dc Compare May 11, 2017 00:06
@alexcrichton
Copy link
Contributor Author

@luser just to confirm, is removing the object file handling ok there?

@luser
Copy link
Contributor

luser commented May 12, 2017

just to confirm, is removing the object file handling ok there?

Given that the code as written was broken, that's fine. We can reopen the issue that made that change originally.

@luser
Copy link
Contributor

luser commented May 12, 2017

If you wanna take a peek at what I did to parse_argument in gcc/msvc you can also tell me if I'm crazy!

Those changes look reasonable. You've got CI bustage, maybe it's just from rebasing over 70b4426 ? That added some new tests that probably need the same fixup you did for other tests.

Feel free to land this once you fix that.

This initially started out tackling mozilla#93 and ended up plumbing the `OsStr` type
basically everywhere! This knocked out a TODO or two along the way and should
remove some pesky unwraps and/or error handling around handling of arguments and
paths that may not be unicode.

Some notable other points are:

* A few utilities were added to `src/util.rs` to assist with handling os strings
* The Rust implementation doesn't deal with os strings in argument parsing at
  all, but otherwise internally stores os strings for maximal compatibility
  (more comments in the code).
* Some unsafe transmutes were removed in util in favor of a more stable
  implementation.
* The `output_file` function was renamed to `output_pretty` as it turns out it's
  not actually the actual output file but rather just a pretty description of it
  (for logging and such).

Closes mozilla#93
@alexcrichton
Copy link
Contributor Author

Ok thanks for the info! I've opened an issue at #121

@alexcrichton alexcrichton merged commit 982d5dc into mozilla:master May 12, 2017
@alexcrichton alexcrichton deleted the osstring branch May 12, 2017 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants