Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: allow file: URL strings as paths #20944

Closed
wants to merge 1 commit into from

Conversation

guybedford
Copy link
Contributor

This extends support for WhatWG URL objects in fs APIs to also include file: URL strings beginning with file://.

The URL standard actually suggests using strings over the object where possible in https://url.spec.whatwg.org/#url-apis-elsewhere, so it would be good to be able to enable such a best-practice.

This will also enable use cases like fs.readFile(import.meta.url) for modules, where the URL is provided as a string.

While it is possible for files or directories to exist with names like 'file:' or 'file:/', by restricting only to the 'file://' we can reliably distinguish file URLs from unique file paths.

Performance is largely unaffected as there is a fast opt-out of this code path through the string prefix check.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels May 24, 2018
@guybedford guybedford requested a review from jasnell May 24, 2018 17:44
@addaleax addaleax added semver-major PRs that contain breaking changes and should be released in the next major version. and removed dont-land-on-v4.x labels May 24, 2018
if (!path.startsWith('file://'))
return path;
path = new URL(path);
} else if (path == null || !path[searchParams] ||
!path[searchParams][searchParams]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer lines up with the rest of the conditional above this line.

@mscdex
Copy link
Contributor

mscdex commented May 24, 2018

Can we see benchmark results to verify the non-URL cases are unaffected?

@addaleax
Copy link
Member

addaleax commented May 24, 2018

We should try to mitigate the issue described in #10739 (comment) sufficiently (how?), and if we decide to land this, call this out very very explicitly as a breaking change with security implications in our changelog.

/cc @nodejs/security

@mscdex
Copy link
Contributor

mscdex commented May 24, 2018

We should either mitigate the issue described in #10739 (comment) sufficiently (how?), and if we decide to land this, call this out very very explicitly as a breaking change with security implications in our changelog.

Good point, I'm not so sure this is a good idea...

@Fishrock123
Copy link
Contributor

Ummm... is this really a good idea?

@guybedford
Copy link
Contributor Author

@addaleax one of the things I was actually hoping this could pave the way for as well would be new Worker(import.meta.url) as well.

From a security perspective, fs.readFile(base + userInput) is unaffected due to prefixing (there is no injection vector for hostnames as they don't support backtracking).

Such sanitization would likely be designed to whitelist check a path prefix like /valid/base, as well as things like duplicate slashes and backtracking prevention. But if it was doing path prefix whitelisting, that would also apply to file URLs certainly. I'm not sure it's possible to call a sanitization function safe at this level if it's not already doing whitelisting, as it's not really possible to blacklist the whole file system!

@guybedford
Copy link
Contributor Author

Here's another security case:

function readUserFile (name) {
  if (name.match(/\/\\/))
    throw new Error('Must be a valid name');
  // name is taken to be relative to cwd here
  fs.readFile(name);
}

But because we require the file:// URL to include the slash, it still fails the validation fine and is handled by the sanitizer, so there still isn't a security concern.

@addaleax
Copy link
Member

I wouldn’t call myself an expert, but I’d expect that something like fs.readFile(foo) only reads files from the current directory if path.normalize(foo) doesn’t start with ../ or /? I could totally imagine that such validators exist in the wild.

I’m not saying it’s likely, but my impression is that it’s absolutely something which we need to warn users about in some way.

@jasnell
Copy link
Member

jasnell commented May 24, 2018

@Fishrock123

... is this really a good idea?

Extremely fair question. Scoping on file:// is likely the safest least-likely-to-be-problematic approach but this would semver-major and potentially breaking no matter how it is done. All of the concerns raised here are precisely why this change hadn't already been made previously.

On the sanitization point... with a file:// URL string, it is only possible to perform proper sanitization after parsing the URL, which performs the necessary normalization on the path component.

Also, keep in mind that file:// URLs must be absolute paths. There is no such thing as a relative file URL, so that must also be taken into consideration. fs.readFile(new URL('file://whatever')) is definitely very different than fs.readFile('whatever').

@guybedford
Copy link
Contributor Author

I wouldn’t call myself an expert, but I’d expect that something like fs.readFile(foo) only reads files from the current directory if path.normalize(foo) doesn’t start with ../ or /? I could totally imagine that such validators exist in the wild.

That still leaves fs.readFile('asdf/asdf/../../../../path').

If the sanitizer allows slashes but not dot segments, then yes that could be an issue though.

But since drive names in windows use : that that injection case also needs to be tested too for a safe sanitizer.

So if the sanitizer did .matches/[a-z]\:|[\/\\]..[\/\\]/ that might actually be an example of a safe sanitizer that would fail here.

@jasnell
Copy link
Member

jasnell commented May 24, 2018

fwiw, @guybedford and I chatted about this yesterday for a bit and I definitely pointed out that there are challenges with enabling this. I'm not yet convinced we have a clear enough picture of the potential downsides on this.

@addaleax
Copy link
Member

I wouldn’t call myself an expert, but I’d expect that something like fs.readFile(foo) only reads files from the current directory if path.normalize(foo) doesn’t start with ../ or /? I could totally imagine that such validators exist in the wild.

That still leaves fs.readFile('asdf/asdf/../../../../path').

> path.normalize('asdf/asdf/../../../../path')
'../../path'
> _.startsWith('../') || _.startsWith('/')
true

i.e. this validator would reject your example (and I believe would be a correct validator for fs methods up until now) but accept file:///etc/passwd.

@guybedford
Copy link
Contributor Author

guybedford commented May 24, 2018

To flesh out the case with drive letters as well - yes /^(\.\./|/|[a-z]\:)/.test(path.normalize('file:///etc/passwd')) would have been a safe check that would then fail here, although /^(\.\./|/|[a-z]+\:)/.test(path.normalize('file:///etc/passwd')) would still be ok.

Unless anyone has any more to add here, that does seem like the final nail here unfortunately actually - and we likely will want something like import.meta.filename and import.meta.dirname in Node for these use cases in modules rather.

@addaleax would you be open to new Worker accepting a file URL string in this same way? I could certainly work towards a PR in that direction instead.

@TimothyGu
Copy link
Member

See #17658. Without a solution to #17658 (comment), I am -1 on this PR.

@jasnell
Copy link
Member

jasnell commented May 24, 2018

@addaleax ...

... but accept file:///etc/passwd

Yep, and that's precisely why validation would have to be applied after parsing the file URL and would be required to look at only the path component. For anyone wanting to apply validation here before the fs.readFile() is called, they would have to do the URL parse first which would defeat the purpose of allowing URL strings the way this PR does.

@addaleax
Copy link
Member

@guybedford I’m not saying I don’t like this idea – I really do. I might be more easily on board if we were to do this for fs.promises, because that still has the “experimental” label printed on it and is an intentionally divergent API…

@guybedford
Copy link
Contributor Author

I might be more easily on board if we were to do this for fs.promises, because that still has the “experimental” label printed on it and is an intentionally divergent API…

@addaleax that's a very interesting suggestion!

@addaleax
Copy link
Member

Yeah, but the point @TimothyGu brought up again remains valid – we’re taking valid file names and doing something else with them if they have a specific format, it’s kind of awkward and a bit scary. (Like, it seems like the kind of thing that could come back and bite us in 5 years like the Buffer constructor…)

The best solution might be adding new methods that do this, tbh.

@guybedford
Copy link
Contributor Author

@TimothyGu I addressed this point in the description of the PR - by checking file:// with exactly two slashes we isolate from the subset of normalized file paths for rare /file:/directory cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-major PRs that contain breaking changes and should be released in the next major version. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants