
std::fs::canonicalize returns UNC paths on Windows, and a lot of software doesn't support UNC paths #42869

Open
radix opened this issue Jun 23, 2017 · 51 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@radix
Contributor

radix commented Jun 23, 2017

Hi, I hope this is the right forum/format to register this problem, let me know if it's not.

Today I tried to use std::fs::canonicalize to make a path absolute so that I could execute it with std::process::Command. canonicalize returns so-called "UNC paths", which look like this: \\?\C:\foo\bar\... (sometimes the ? can be a hostname).

It turns out you can't pass a UNC path as the current directory when starting a process (i.e., Command::new(...).current_dir(unc_path)). In fact, a lot of other apps will blow up if you pass them a UNC path: for example, Microsoft's own cl.exe compiler doesn't support it: rust-lang/cc-rs#169

It feels to me that maybe returning UNC paths from canonicalize is the wrong choice, given that they don't work in so many places. It'd probably be better to return a simple "absolute path", which begins with the drive letter, instead of returning a UNC path, and instead provide a separate function specifically for generating UNC paths for people who need them.

Maybe if this is too much of an incompatible change, a new function for creating absolute paths should be added to std? I'd bet, however, that changing canonicalize itself would make more software suddenly start working than suddenly break.

@Mark-Simulacrum Mark-Simulacrum added O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 23, 2017
@retep998
Member

retep998 commented Jun 23, 2017

canonicalize simply asks the kernel to canonicalize the path, and the kernel happens to return the canonical path as a root local device path. Root local device paths can accurately represent some paths that normal absolute paths cannot (such as components named ".." or ".", or components containing "/"). They are also the only way to call many system functions with paths longer than MAX_PATH (unless you are on a recent version of Windows 10 with a certain registry key enabled). As a result, having libstd automatically strip the prefix would definitely break some situations. However, I'm definitely in favor of more support in libstd for converting between these kinds of paths, so that the user can easily turn a root local device path into an absolute path. I'd also love to have an fs::normalize that merely normalizes a possibly relative path into an absolute path without hitting the filesystem on Windows.

radix added a commit to arpeggiorpg/arpeggiorpg that referenced this issue Jun 23, 2017
@retep998
Member

Regarding your commit that referenced this issue: normalization is not the same as merely joining the path onto the current directory, because drive-relative paths are relative to the current directory of the given drive. For example, given a drive-relative path of C:foo and an env::current_dir() of D:\bar, normalizing C:foo requires the current directory for C:, and could produce something radically different such as C:\i\dont\even\foo.
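To make the drive-relative case concrete, here is a minimal string-based sketch. It assumes a hypothetical `drive_cwds` table standing in for the per-drive current directories that Windows tracks (it is not a std API), and it leaves rooted (`\foo`), UNC, and `\\?\` paths out of scope.

```rust
use std::collections::HashMap;

/// Resolves a Windows path string against the process current directory
/// (`cwd`) and a hypothetical table of per-drive current directories.
/// Simplified: rooted (`\foo`), UNC and `\\?\` paths are out of scope.
fn resolve(path: &str, cwd: &str, drive_cwds: &HashMap<char, String>) -> Option<String> {
    if path.starts_with('\\') {
        return None; // out of scope for this sketch
    }
    let b = path.as_bytes();
    if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
        let drive = (b[0] as char).to_ascii_uppercase();
        let rest = &path[2..];
        if rest.starts_with('\\') {
            return Some(path.to_string()); // drive-absolute, e.g. `C:\foo`
        }
        // Drive-relative, e.g. `C:foo`: resolved against the current
        // directory of drive C:, which need not be related to `cwd`.
        let base = drive_cwds.get(&drive)?;
        if rest.is_empty() {
            return Some(base.clone());
        }
        return Some(format!("{}\\{}", base.trim_end_matches('\\'), rest));
    }
    // Plain relative path: joined onto the process current directory.
    Some(format!("{}\\{}", cwd.trim_end_matches('\\'), path))
}

fn main() {
    let cwd = r"D:\bar";
    let drive_cwds = HashMap::from([
        ('C', r"C:\i\dont\even".to_string()),
        ('D', cwd.to_string()),
    ]);
    // Naively joining onto cwd would give `D:\bar\C:foo`; the answer is on C:.
    println!("{:?}", resolve("C:foo", cwd, &drive_cwds));
    println!("{:?}", resolve("foo", cwd, &drive_cwds));
}
```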

@radix
Contributor Author

radix commented Jun 24, 2017

thanks, @retep998 :) it's just a hacked-together build tool that probably will eventually be replaced with something else, and I didn't intend to notify this ticket about my commit. but I guess it goes to show that a good way to get an absolute path in std would be really helpful.

@nagisa
Member

nagisa commented Jun 25, 2017

Command::current_dir should be fixed. I doubt we will change canonicalize.

Note: the I-wrong tag applies only to Command::current_dir, not to the canonicalize behaviour.

@nagisa nagisa added the I-wrong label Jun 25, 2017
@retep998
Member

Quick testing on Windows 10.0.15063 indicates that both SetCurrentDirectoryW and CreateProcessW are okay with a current directory starting with \\?\. They are not okay with a current directory that exceeds MAX_PATH regardless of \\?\. CreateProcessW is okay with the path to the process itself starting with \\?\ regardless of whether the first parameter is used. CreateProcessW is only okay with the path to the process exceeding MAX_PATH if it starts with \\?\ and is specified as the first parameter which Rust does not currently use. I tested std::process::Command::current_dir and it works as expected, accepting paths starting with \\?\ but rejecting any paths exceeding MAX_PATH.

@kornelski
Contributor

kornelski commented Sep 5, 2017

Technically, AFAIK it is safe to strip the prefix in common simple cases (an absolute path with a drive letter, no reserved names, shorter than MAX_PATH), and leave it otherwise.

So I think there's no need to compromise on correctness as far as stdlib goes. The trade-off is between failing early and exposing other software that doesn't support UNC paths vs maximizing interoperability with non-UNC software.

In an ideal world, I would prefer the "fail early" approach, so that limitations are quickly found and removed. However, Windows/DOS path handling has exceptionally long and messy history and decades of Microsoft bending over backwards to let old software not upgrade its path handling. If Microsoft can't push developers towards UNC, and fails to enforce this even in their own products, I have no hope of Rust shifting the Windows ecosystem to UNC. It will rather just frustrate Rust users and make Rust seem less reliable on Windows.

So in this case I suggest trying to maximize interoperability instead, and canonicalize to regular paths whenever possible (using UNC only for paths that can't be handled otherwise).

Also, careful stripping of the prefix done in stdlib will be much safer than other crates stripping it unconditionally (because realistically, whenever someone runs into this problem, they'll just strip it unconditionally).
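The prefix-stripping rules kornelski lists (drive letter, no reserved names, shorter than MAX_PATH) can be sketched as a conservative check on the string form of a path. This is a hypothetical `simplify` helper, not dunce's actual code: it only handles the `\\?\C:\...` form, assumes the classic 260-unit `MAX_PATH` limit, and treats a component as reserved if the part before its first dot is a device name. Whenever it is unsure, it keeps the prefix, which is always the safe answer.

```rust
/// Legacy Win32 limit, in UTF-16 code units, including the trailing NUL.
const MAX_PATH: usize = 260;

/// True if Win32 would treat `component` as a device name. Only the part
/// before the first dot counts, so `nul.tar.gz` is treated like `NUL`.
fn is_reserved(component: &str) -> bool {
    let stem = component.split('.').next().unwrap_or("");
    let stem = stem.trim_end_matches(' ').to_ascii_uppercase();
    if ["CON", "PRN", "AUX", "NUL"].contains(&stem.as_str()) {
        return true;
    }
    // COM0-9 / LPT0-9, plus the superscript forms COM¹²³ / LPT¹²³.
    match stem.strip_prefix("COM").or_else(|| stem.strip_prefix("LPT")) {
        Some(n) => {
            let mut chars = n.chars();
            matches!((chars.next(), chars.next()),
                (Some(c), None) if c.is_ascii_digit() || "¹²³".contains(c))
        }
        None => false,
    }
}

/// Returns `path` without its `\\?\` prefix when the shorter form should
/// mean the same thing to legacy Win32 APIs; otherwise returns it unchanged.
fn simplify(path: &str) -> &str {
    let Some(rest) = path.strip_prefix(r"\\?\") else {
        return path; // not a verbatim path, nothing to strip
    };
    let b = rest.as_bytes();
    // Only the drive-absolute form `X:\...` is handled.
    if b.len() < 3 || !b[0].is_ascii_alphabetic() || b[1] != b':' || b[2] != b'\\' {
        return path;
    }
    // The stripped path plus its NUL terminator must fit in MAX_PATH.
    if rest.encode_utf16().count() >= MAX_PATH {
        return path;
    }
    let tail = &rest[3..];
    if tail.is_empty() {
        return rest; // just the drive root, e.g. `C:\`
    }
    for component in tail.split('\\') {
        // Win32 would reinterpret any of these, so keep the prefix.
        // `ends_with('.')` also covers `.` and `..` components.
        let risky = component.is_empty()
            || component.ends_with('.')
            || component.ends_with(' ')
            || component.contains(|c: char| "/<>:\"|?*".contains(c) || c.is_control())
            || is_reserved(component);
        if risky {
            return path;
        }
    }
    rest
}

fn main() {
    for p in [r"\\?\C:\Users\me\project", r"\\?\C:\Users\me\nul.tar.gz", r"\\?\UNC\server\share\dir"] {
        println!("{p} -> {}", simplify(p));
    }
}
```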

@ofek

ofek commented Oct 9, 2017

@kornelski I completely agree. The current behavior is unexpected in my opinion.

@danielpclark

danielpclark commented Oct 10, 2017

I hope this is helpful…

According to Microsoft:

Note: File I/O functions in the Windows API convert / to \ as part of converting the name to an NT-style name, except when using the \\?\ prefix as detailed in the following sections.

Source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

And the Ruby language uses forward slashes for File paths and that works on Windows.
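In practice this means `/` cannot be relied on as a separator once a path carries the `\\?\` prefix, for example when appending to the output of canonicalize. A minimal sketch on plain strings (`join_windows` is a hypothetical helper, not a std API):

```rust
/// Appends `rel` (which may use `/` separators) to a Windows `base` path.
/// Win32 converts `/` to `\` for ordinary paths, but not for `\\?\` paths,
/// so in that case the separators are rewritten here. `.` and `..` inside
/// `rel` are not resolved under `\\?\` either; this sketch ignores them.
fn join_windows(base: &str, rel: &str) -> String {
    let rel = if base.starts_with(r"\\?\") {
        rel.replace('/', "\\")
    } else {
        rel.to_string() // Win32 will convert the `/` itself
    };
    format!("{}\\{}", base.trim_end_matches('\\'), rel)
}

fn main() {
    println!("{}", join_windows(r"\\?\C:\proj", "target/debug"));
    println!("{}", join_windows(r"C:\proj", "target/debug"));
}
```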

@kornelski
Contributor

kornelski commented Nov 22, 2017

I've looked at this problem in detail. There are a few rules which need to be checked to safely strip the UNC prefix. It can be implemented as a simple state machine.

I've implemented that using public APIs, but because OsStr is opaque it's not nearly as nice as stdlib's implementation could have been:

https://lib.rs/dunce

So I'm still hoping canonicalize would do it automatically, because if it's done only for legacy-compatible paths there's no downside: all paths work for UNC-aware programs, and all paths that can work for legacy programs work too.

@mykmelez

mykmelez commented May 9, 2018

Another example of this issue that I encountered in alexcrichton/cargo-vendor#71:

url::Url::to_file_path() returns a non-UNC path (even if the URL was initialized with a UNC path). And std::path::Path::starts_with() doesn't normalize its arguments to UNC paths. So calling to_file_path() on a file: URL and then comparing it to the output of canonicalize() via starts_with() always returns false, even if the two paths represent the same resource:

```rust
// Requires the `url` crate as a dependency; run on Windows.
use std::path::Path;
use url::Url;

fn main() {
    // Path::canonicalize() returns a UNC (`\\?\`) path.
    let unc_path_buf = Path::new(r"C:\Windows\System").canonicalize().expect("path");
    let unc_path = unc_path_buf.as_path();

    // Meanwhile, Url::to_file_path() returns a non-UNC path,
    // even when the URL was created from a UNC path.
    let file_url = Url::from_file_path(unc_path).expect("url");
    let abs_path_buf = file_url.to_file_path().expect("path");
    let abs_path = abs_path_buf.as_path();

    // unc_path and abs_path refer to the same resource,
    // and each "starts with" itself.
    assert!(unc_path.starts_with(unc_path));
    assert!(abs_path.starts_with(abs_path));

    // But they don't "start with" each other, so these assertions fail.
    assert!(unc_path.starts_with(abs_path));
    assert!(abs_path.starts_with(unc_path));
}
```

Arguably, to_file_path() should return a UNC path, at least when initialized with one. And perhaps starts_with() should normalize its arguments (or perhaps clarify that it compares paths, not the resources to which they refer, and thus does no normalization). Also, the mitigation for consumers of this API is straightforward: canonicalize() all paths you compare if you do so to any of them. So maybe the current behavior is reasonable.

Nevertheless, it does feel like something of a footgun, so it's worth at least documenting how it differs from that of some other APIs on Windows.
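The mitigation described above (canonicalize every path taking part in the comparison) could look roughly like this std-only sketch. `is_inside` is a hypothetical name, both paths must exist, and the replies that follow explain why this is still not a complete answer on Windows.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `child` lies inside `dir`, after canonicalizing both so
/// that they use the same representation (on Windows, both get `\\?\`).
fn is_inside(child: &Path, dir: &Path) -> io::Result<bool> {
    let child = fs::canonicalize(child)?;
    let dir = fs::canonicalize(dir)?;
    Ok(child.starts_with(&dir))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("is_inside_demo");
    fs::create_dir_all(dir.join("sub"))?;
    // Mixing a raw path with a canonical one is what breaks `starts_with`;
    // canonicalizing both sides avoids that particular mismatch.
    println!("{}", is_inside(&dir.join("sub"), &dir)?);
    println!("{}", is_inside(&dir, &dir.join("sub"))?);
    Ok(())
}
```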

@retep998
Member

retep998 commented May 9, 2018

> comparing it to the output of canonicalize() via starts_with() always returns false, even if the two paths represent the same resource:

Comparing canonical paths is a footgun in general because it is the wrong thing to do! Things like hard links and so on mean that such comparisons will never be entirely accurate. Please don't abuse canonicalization for this use case.

If you want to tell whether two paths point to the same file, compare their file IDs! That's what same-file does and it works great!
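A std-only sketch of the file-ID comparison on Unix, using the device and inode numbers from `std::os::unix::fs::MetadataExt`. The `same-file` crate does the equivalent on Windows with the volume serial number and file index; that part is not reproduced here, so the non-Unix branch of this hypothetical `same_file` function just returns an `Unsupported` error.

```rust
use std::io;
use std::path::Path;

/// Returns true if both paths refer to the same underlying file.
#[cfg(unix)]
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let (ma, mb) = (std::fs::metadata(a)?, std::fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

#[cfg(not(unix))]
fn same_file(_a: &Path, _b: &Path) -> io::Result<bool> {
    // Placeholder: on Windows, use the `same-file` crate instead.
    Err(io::Error::new(io::ErrorKind::Unsupported, "not implemented in this sketch"))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("same_file_demo");
    std::fs::create_dir_all(&dir)?;
    let original = dir.join("original.txt");
    let link = dir.join("hardlink.txt");
    std::fs::write(&original, "hello")?;
    let _ = std::fs::remove_file(&link); // `hard_link` fails if `link` exists
    std::fs::hard_link(&original, &link)?;
    // Two different paths, one file: comparing paths says no, IDs say yes.
    println!("{}", same_file(&original, &link)?);
    Ok(())
}
```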

@kornelski
Contributor

But starts_with is not for an is-this-the-same-file comparison; it's for an is-this-file-in-a-directory check. There are no hard links involved (and AFAIK, apart from a private implementation detail of macOS Time Machine, no OS supports directory hard links).

@retep998
Member

retep998 commented May 9, 2018

There are more ways than just \\?\C:\ and C:\ to represent the same path, so unfortunately any sort of file in directory check is a hard problem. For example, paths can refer to drives via names other than drive letters.
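A rough sketch of how many prefix forms are in play. The variant names mirror `std::path::Prefix` (which the standard library only produces when running on Windows); the parsing itself is a simplified, hypothetical string classifier that ignores `/` separators, and the volume GUID in the examples is a placeholder.

```rust
/// Prefix kinds, named after the variants of `std::path::Prefix`.
#[derive(Debug, PartialEq)]
enum PrefixKind {
    VerbatimDisk(char), // \\?\C:\
    VerbatimUNC,        // \\?\UNC\server\share\
    Verbatim,           // \\?\Volume{...}\ and other \\?\ forms
    DeviceNS,           // \\.\C:\ , \\.\PIPE\...
    UNC,                // \\server\share\
    Disk(char),         // C:\ or C:foo
    NoPrefix,           // relative, or rooted like \foo
}

fn drive_letter(s: &str) -> Option<char> {
    let b = s.as_bytes();
    (b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':')
        .then(|| (b[0] as char).to_ascii_uppercase())
}

fn classify(path: &str) -> PrefixKind {
    if let Some(rest) = path.strip_prefix(r"\\?\") {
        // `get` avoids panicking if byte 4 is not a char boundary.
        if rest.get(..4).map_or(false, |p| p.eq_ignore_ascii_case(r"UNC\")) {
            return PrefixKind::VerbatimUNC;
        }
        return match drive_letter(rest) {
            Some(d) => PrefixKind::VerbatimDisk(d),
            None => PrefixKind::Verbatim,
        };
    }
    if path.starts_with(r"\\.\") {
        return PrefixKind::DeviceNS;
    }
    if path.starts_with(r"\\") {
        return PrefixKind::UNC;
    }
    match drive_letter(path) {
        Some(d) => PrefixKind::Disk(d),
        None => PrefixKind::NoPrefix,
    }
}

fn main() {
    for p in [
        r"\\?\C:\Users",
        r"C:\Users",
        r"\\.\C:\Users",
        r"\\?\Volume{00000000-0000-0000-0000-000000000000}\Users",
        r"\\?\UNC\server\share\x",
        r"\\server\share\x",
    ] {
        println!("{p:<60} {:?}", classify(p));
    }
}
```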

@nagisa
Member

nagisa commented May 9, 2018 via email

max-heller added a commit to max-heller/mdbook-pandoc that referenced this issue Apr 6, 2024
…nonicalize()` (#84)

On Windows, `std::fs::canonicalize()` [produces a weird type of path
that cannot be used with
`Command::current_dir()`](rust-lang/rust#42869).
This switches to using the
[`normpath`](https://docs.rs/normpath/latest/normpath/trait.PathExt.html#tymethod.normalize)
crate, which should avoid these issues.

@FeldrinH

FeldrinH commented Jan 8, 2025

Clearly the current behavior of std::fs::canonicalize is problematic, as evidenced by the comments and issues above. It seems to me that dunce does more or less what people expect std::fs::canonicalize to do on Windows.

The logic used by dunce seems preferable to what std::fs::canonicalize currently does in every case that I can think of. Dunce seems to have no outstanding issues, is widely used and trusted, and should be compatible with all supported versions of Windows, so my simple proposal is this: why not upstream the logic used by dunce into std::fs::canonicalize? Is there any concrete objection to this?

@retep998
Member

retep998 commented Jan 8, 2025

I do think we should provide an equivalent to dunce in the standard library. Whether std::fs::canonicalize should use it by default, I'm not entirely sure.

That said, I still insist that most people do not need std::fs::canonicalize in the first place, and would be much better off using std::path::absolute or same-file depending on their use case.
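The practical difference between the two std functions, as a small sketch (`std::path::absolute` has been stable since Rust 1.79): `absolute` only combines the path with the current directory, so it succeeds for a path that does not exist yet, whereas `canonicalize` asks the filesystem and fails. `try_both` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::{self, Path, PathBuf};

/// Returns the results of `path::absolute` and `fs::canonicalize` side by side.
fn try_both(p: &Path) -> (io::Result<PathBuf>, io::Result<PathBuf>) {
    (path::absolute(p), fs::canonicalize(p))
}

fn main() {
    // A path that (presumably) does not exist under the current directory.
    let (abs, canon) = try_both(Path::new("does/not/exist/yet.txt"));
    // `absolute` never touches the filesystem...
    println!("absolute:     {abs:?}");
    // ...while `canonicalize` resolves the path on disk, so it errors here.
    println!("canonicalize: {canon:?}");
}
```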

@FeldrinH

FeldrinH commented Jan 8, 2025

> I do think we should provide an equivalent to dunce in the standard library. Whether std::fs::canonicalize should use it by default, I'm not entirely sure.

I think having std::fs::canonicalize and dunce::canonicalize as separate functions in the standard library would just cause confusion for no extra value. What use case would prefer the current std::fs::canonicalize behavior over dunce::canonicalize? I can't think of a single one.

> That said, I still insist that most people do not need std::fs::canonicalize in the first place, and would be much better off using std::path::absolute or same-file depending on their use case.

Maybe, but I think there are still valid and important use cases where std::fs::canonicalize/dunce::canonicalize is the right tool for the job. For example if I want to create some kind of lookup table where a file path is the key.
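A minimal sketch of that lookup-table use case, keyed by canonical path so that different spellings of the same existing file land on one entry. `PathTable` is a hypothetical type. On Windows its keys would carry the `\\?\` prefix, which is harmless as long as every key goes through the same function; hard links would still produce separate keys.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A map from files to values, keyed by canonical path.
struct PathTable<V> {
    entries: HashMap<PathBuf, V>,
}

impl<V> PathTable<V> {
    fn new() -> Self {
        PathTable { entries: HashMap::new() }
    }

    /// Every key goes through `canonicalize`, so `dir/./a.txt` and
    /// `dir/sub/../a.txt` map to the same entry. The file must exist.
    fn insert(&mut self, path: &Path, value: V) -> io::Result<Option<V>> {
        Ok(self.entries.insert(fs::canonicalize(path)?, value))
    }

    fn get(&self, path: &Path) -> io::Result<Option<&V>> {
        Ok(self.entries.get(&fs::canonicalize(path)?))
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("path_table_demo");
    fs::create_dir_all(dir.join("sub"))?;
    fs::write(dir.join("a.txt"), "a")?;

    let mut table = PathTable::new();
    table.insert(&dir.join("a.txt"), 1)?;
    // A different spelling of the same file finds the same entry.
    println!("{:?}", table.get(&dir.join("sub").join("..").join("a.txt"))?);
    Ok(())
}
```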

@ChrisDenton
Member

ChrisDenton commented Jan 8, 2025

The problem is that people may be intentionally relying on canonicalize to produce \\?\ style paths to bypass path length restrictions in the Windows API (or to convert to NT-style paths). So changing this now is a breaking change.

Solutions:

  • As retep998 says, std::path::absolute is better for many purposes
  • For when resolving all symlinks is needed, canonicalize is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any \\?\ style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using canonicalize.

@teohhanhui

teohhanhui commented Jan 8, 2025

> For when resolving all symlinks is needed, canonicalize is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any \\?\ style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using canonicalize.

That's just not going to work:

https://gitlab.com/kornelski/dunce/-/issues/3#note_1096103063

@kornelski
Contributor

The length limit is not a problem. The UNC prefix can be kept for paths that exceed MAX_PATH or other limits.

The dunce crate has been doing this successfully for 7 years, and has millions of downloads per month.

@FeldrinH

FeldrinH commented Jan 8, 2025

> The problem is that people may be intentionally relying on canonicalize to produce \\?\ style paths to bypass path length restrictions in the Windows API (or to convert to NT-style paths). So changing this now is a breaking change.

As kornelski pointed out, the path length limit is not a problem, because dunce will automatically preserve the UNC prefix if it is needed because of the path length limit.

People who rely on std::fs::canonicalize to convert to NT-style paths are in my opinion misusing std::fs::canonicalize for a purpose that it is not intended for. While the documentation does state that it will convert to extended length path syntax, I have always interpreted that as an implementation detail rather than a strong guarantee.

> For when resolving all symlinks is needed, canonicalize is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any \\?\ style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using canonicalize.

Having this as a separate method could be useful, but I still strongly believe that std::fs::canonicalize should do this conversion automatically (when it is safe to do so, as dunce does). If it needs to be done manually, then people who want to use std::fs::canonicalize in cross-platform code will have to add the extra function call to strip UNC prefixes on Windows, and will have to know and remember to do that every time (an easy thing to miss if you primarily test on Linux or Mac).

I think it is impossible to overstate that support for UNC paths on Windows is really spotty. Many programs will just refuse to work with UNC paths. You want to avoid UNC paths if at all possible.

@ChrisDenton
Member

> I think it is impossible to overstate that support for UNC paths on Windows is really spotty. Many programs will just refuse to work with UNC paths. You want to avoid them if at all possible.

I don't disagree with that. Hence why readlink, for example, does attempt the conversion.

But changing the documented behaviour of canonicalize is a breaking change and we have no way to gauge its impact. I've been (unintentionally) responsible for breaking other people's code before. I don't like it.

@nathaniel-daniel

nathaniel-daniel commented Jan 8, 2025

> As kornelski pointed out, the path length limit is not a problem, because dunce will automatically preserve the UNC prefix if it is needed because of the path length limit.

It is a breaking change. Users can push new path components to the path after canonicalization and then use that new path. I've done this a few times in some of my projects. At least the tar crate also seems to do this.

As a side note, I'm also a bit concerned about dunce's ability to implement the specified path parsing. The documentation has changed in the past:
Old: https://web.archive.org/web/20220920223716/https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
Current: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
LPT¹, LPT², and LPT³, which were previously not reserved, became reserved. The docs also clarified what they mean by "followed immediately by an extension". This clarification shows that dunce doesn't implement reserved file handling correctly:

```rust
fn main() {
    let cwd = std::env::current_dir().unwrap();
    let cwd = cwd.canonicalize().unwrap();
    dbg!(&cwd);

    let test_file = cwd.join("nul.tar.gz");

    std::fs::write(&test_file, "Hello World!").unwrap();

    let canonical_test = test_file.canonicalize().unwrap();
    dbg!(&canonical_test);
    dbg!(std::fs::read_to_string(canonical_test).unwrap());

    let dunce_test = dunce::canonicalize(test_file).unwrap();
    dbg!(&dunce_test);
    dbg!(std::fs::read_to_string(dunce_test).unwrap());
}
```

```text
[src/main.rs:4:5] &cwd = "\\\\?\\C:\\Users\\natha\\Desktop\\html\\dunce-test"
[src/main.rs:11:5] &canonical_test = "\\\\?\\C:\\Users\\natha\\Desktop\\html\\dunce-test\\nul.tar.gz"
[src/main.rs:12:5] std::fs::read_to_string(canonical_test).unwrap() = "Hello World!"
[src/main.rs:15:5] &dunce_test = "C:\\Users\\natha\\Desktop\\html\\dunce-test\\nul.tar.gz"
[src/main.rs:16:5] std::fs::read_to_string(dunce_test).unwrap() = ""
```

So, using dunce can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.

Also, just my 2 cents, but I think making canonicalize return a different path type depending on whether it can safely de-UNC the path would also be confusing. UNC and DOS paths can have different semantics for path components. Also, on some computers where a project or data directory is deeply nested, a canonicalize call on it could return a UNC path, while on someone else's computer it could return a normal path. This could make pushing components onto that path and creating files work on one person's computer but fail on another's. While I think this potential confusion is fine for a crate, I'd expect a stdlib API to be more predictable. I would also prefer that attempting to de-UNC a path be a manual operation. I may be a bit biased, though: I only rarely run into issues with UNC paths, and I just reach for dunce when I do.

As a side note, Windows 11 seems to not have reserved file names, but I can't find the documentation for this behavior.

@teohhanhui

teohhanhui commented Jan 8, 2025

> So, using dunce can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.

The other alternative crate normpath that has been suggested before in this thread calls GetFullPathNameW directly:

https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L143

@nathaniel-daniel

> So, using dunce can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.
>
> The other alternative crate normpath that has been suggested before in this thread calls GetFullPathNameW directly:
>
> https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L143

Unfortunately, this does not seem to be a suitable replacement in general for a canonicalization operation: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew
"This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."

I also haven't tested, but is this not the same as std::path::absolute?
"On Windows, for verbatim paths, this will simply return the path as given. For other paths, this is currently equivalent to calling GetFullPathNameW."

@teohhanhui

teohhanhui commented Jan 8, 2025

> "This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."

Yes, but...

https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L149

I guess what most people actually want is sort of a combination of what canonicalize does for POSIX + what absolute does for Windows? (Or in other words, "canonicalize but no UNC paths on Windows please".) Hence the popularity of the crates...
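A rough std-only approximation of that "canonicalize but no UNC paths" idea: make the path absolute with `std::path::absolute`, then require that it exists. `absolute_existing` is a hypothetical helper, not normpath's API, and it deliberately does not resolve symbolic links.

```rust
use std::io;
use std::path::{self, Path, PathBuf};

/// Makes `p` absolute without resolving it on disk, then requires that
/// something exists there. Symlinks and `..` are left as they are.
fn absolute_existing(p: &Path) -> io::Result<PathBuf> {
    let abs = path::absolute(p)?;
    if abs.try_exists()? {
        Ok(abs)
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("{} does not exist", abs.display()),
        ))
    }
}

fn main() {
    println!("{:?}", absolute_existing(Path::new(".")));
    println!("{:?}", absolute_existing(Path::new("no/such/path/here")));
}
```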

@nathaniel-daniel

> "This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."
>
> Yes, but...
>
> https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L149
>
> I guess what most people actually want is sort of a combination of what canonicalize does for POSIX + what absolute does for Windows? (Or in other words, "canonicalize but no UNC paths on Windows please".) Hence the popularity of the crates...

Canonicalization is a lot more than confirming the file exists. You also need to resolve links, which I don't think GetFullPathNameW will do. I think most people probably want std::path::absolute, but I don't really want to generalize.
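The link-resolution difference is easy to observe with a std-only sketch on a Unix system: `std::path::absolute` leaves a symlink in the path, while `canonicalize` replaces it with its target. `absolute_and_canonical` is a hypothetical helper, and the demo in `main` only runs on Unix because it uses `std::os::unix::fs::symlink`.

```rust
use std::fs;
use std::io;
use std::path::{self, Path, PathBuf};

/// Returns (`absolute(p)`, `canonicalize(p)`) so the two can be compared.
fn absolute_and_canonical(p: &Path) -> io::Result<(PathBuf, PathBuf)> {
    Ok((path::absolute(p)?, fs::canonicalize(p)?))
}

#[cfg(unix)]
fn main() -> io::Result<()> {
    use std::os::unix::fs::symlink;
    let base = std::env::temp_dir().join("symlink_demo");
    let real = base.join("real");
    let link = base.join("link");
    fs::create_dir_all(&real)?;
    let _ = fs::remove_file(&link); // `symlink` fails if `link` already exists
    symlink(&real, &link)?;
    let (abs, canon) = absolute_and_canonical(&link)?;
    println!("absolute:     {}", abs.display()); // still ends in `link`
    println!("canonicalize: {}", canon.display()); // ends in `real`
    Ok(())
}

#[cfg(not(unix))]
fn main() {}
```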
