Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-UTF-8 env vars #16961

Closed
daurnimator opened this issue Nov 12, 2017 · 18 comments
Closed

Non-UTF-8 env vars #16961

daurnimator opened this issue Nov 12, 2017 · 18 comments
Labels
feature request Issues that request new features to be added to Node.js. process Issues and PRs related to the process subsystem.

Comments

@daurnimator
Copy link

daurnimator commented Nov 12, 2017

  • Version: v8.8.1
  • Platform: Linux daurn-z170 4.13.7-1-ARCH deps: update openssl to 1.0.1j #1 SMP PREEMPT Sat Oct 14 20:13:26 CEST 2017 x86_64 GNU/Linux
  • Subsystem: process

Node.js doesn't seem to have a way to read environment variables that aren't valid unicode. For both keys and values. e.g.

$ env -i $'F\xa5B=BAR' node -e 'console.log(process.env)' 
{ 'F�B': undefined }
$ env -i $'FOO=B\xa5R' node -e 'console.log(process.env.FOO.codePointAt(1))'
65533
@daurnimator
Copy link
Author

Issue seems to stem from assumption that env vars are utf8. e.g. In the getter and during enumeration

@bnoordhuis
Copy link
Member

UTF-8 is assumed in a lot of other places, environment variables aren't the exception.

The behavior of process.env can't change because of backwards compatibility (i.e., we can't change the encoding to latin-1) and this seems too minor to add a new API for.

Do you have a real-world example where this is an issue and that can't be solved another way?

@bnoordhuis bnoordhuis added the process Issues and PRs related to the process subsystem. label Nov 12, 2017
@daurnimator
Copy link
Author

UTF-8 is assumed in a lot of other places, environment variables aren't the exception.

Most places in e.g. fs also accept a Buffer/Uint8Array.

The behavior of process.env can't change because of backwards compatibility

I think the current behaviour is broken in a way no one expects. One solution might be that unrepresentable keys/values are returned as a Buffer instead of a string with unicode replacement characters?

Do you have a real-world example where this is an issue and that can't be solved another way?

This was found while working on fengari, a port of lua (including lua standard libraries) to javascript. A test found that some env vars were being read in differently in fengari to normal lua. The code path goes past here: https://github.com/fengari-lua/fengari/blob/0a8bc0b30644c514d822fb32a184318f7353a285/src/loslib.js#L180

Different use case:

Passing encryption keys (e.g. a private EC key) into a container via env var. They'd work fine 80% of the time, but once the secret (binary) key contains the wrong sequence, then you get hard to debug failures.

@addaleax addaleax changed the title Non-unicode env vars Non-UTF-8 env vars Nov 12, 2017
@addaleax
Copy link
Member

I think the current behaviour is broken in a way no one expects.

I don’t think this is true; It’s a single data point, but I have never seen environment variables that aren’t conceptually character strings.

One solution might be that unrepresentable keys/values are returned as a Buffer instead of a string with unicode replacement characters?

Sometimes returning a string and sometimes returning a Buffer from the getter is a non-starter, I think.

Passing encryption keys (e.g. a private EC key) into a container via env var. They'd work fine 80% of the time, but once the secret (binary) key contains the wrong sequence, then you get hard to debug failures.

80 % of the time sounds like a very optimistic estimate – A nice property of UTF-8 is that invalid UTF-8 will usually be recognizable as such. Even if you read as little as 8 bytes that are ~ uniformly random (like you’d expect from a private key) you get a 99 % chance of the result not being valid UTF-8.

Also, this is a great use case for base64.

I’m not -1 on Node supporting this, but if it does, that should happen through a separate API.

@vkurchatkin
Copy link
Contributor

I guess we could add API for that, something like process.env.get(key[, encoding]), where key can be a Buffer and encoding corresponds to encoding of the result, which is a Buffer by default.

@addaleax addaleax added the feature request Issues that request new features to be added to Node.js. label Nov 13, 2017
@bnoordhuis
Copy link
Member

process.env.get is not an option, IMO - that would break existing code that tries to read a property called 'get'.

It also makes the environment accessors slow and awkward since they have to check for these magic properties every time.

What might be acceptable is os.getenv() and os.setenv() methods but then again, it seems too minor to merit new APIs.

The binary data example is flawed and will never work; envvars are C strings, nul bytes terminate them.

@daurnimator
Copy link
Author

What might be acceptable is os.getenv() and os.setenv() methods

Also need to be able to iterate env vars.

What about just pushing environ as a Buffer? And it's up to users to parse it themselves?

@TimothyGu
Copy link
Member

TimothyGu commented Nov 16, 2017

I'd be okay with a set of functions that work with Buffers.

This currently also leads to an issue with roundtripping right now, which is bad:

$ env -i $'FOR=abc' node -p 'process.env[Object.keys(process.env)[0]]'
abc
$ env -i $'F\xa5R=abc' node -p 'process.env[Object.keys(process.env)[0]]'
undefined

@bnoordhuis
Copy link
Member

What about just pushing environ as a Buffer? And it's up to users to parse it themselves?

I've toyed with that idea. Constructing an ArrayBuffer mapping environ and moving parsing to JS land is definitely an option, that would let us remove about 150 lines of finicky C++ code.

Some special-casing is needed for Windows because its environment is UCS-2 but otherwise it's pretty straightforward.

@daurnimator
Copy link
Author

I've toyed with that idea. Constructing an ArrayBuffer mapping environ and moving parsing to JS land is definitely an option, that would let us remove about 150 lines of finicky C++ code.

Is anyone working on this? Should I start work on it? (I'll have to set up a node dev environment...)

@bnoordhuis
Copy link
Member

No one's working on it, but on second thought, you also need a way to sync changes from the ArrayBuffer back to the global environ.

It's probably easier to keep using getenv() / GetEnvironmentVariableW() / etc. but expose (for example) a process.envb that's a class extends Map that only accepts Buffer and Uint8Array instances as keys and values and internally calls out to the platform-native APIs.

@daurnimator
Copy link
Author

class extends Map that only accepts Buffer and Uint8Array instances as keys and values and internally calls out to the platform-native APIs.

But Maps consider different Uint8Array/Buffer keys to be different keys.
I think this is a false-equivalence that is assured to confuse....

@bnoordhuis
Copy link
Member

That's a fair point.

@refack
Copy link
Contributor

refack commented Nov 11, 2018

I've put this into https://github.com/nodejs/node/projects/13 backlog

@refack refack closed this as completed Nov 11, 2018
@daurnimator
Copy link
Author

Put into https://github.com/nodejs/node/projects/13 backlog

Is there some more context on this I could read?
Closing issues in a backlog seems like an odd process.

@refack
Copy link
Contributor

refack commented Nov 11, 2018

Hello @daurnimator,
It's actually a new thing I'm suggesting. As I see it we have two issues this can solve: (1) We have too many open issues (2) Among them there are feature requests like this, that have no objections, but also isn't picked up by anyone.

The idea behind the list in https://github.com/nodejs/node/projects/13 is to surface these, both the Collaborators and to members of the community who are looking to contribute. The "close" is just to separate these from bugs.

If GitHub would give us a third item type, that would have been optimal. Until then we are trying to find the best way to use the available metadata options to have visibility into what's going on.

@iamahuman
Copy link

iamahuman commented Jan 14, 2019

Yet another approach could be to follow what Python 3 does with its surrogateescape method, such as: "\xff" -> "\udcff". This has the advantage of having to neither introduce a new API nor break compatibility with existing codes. Currently node does substitute them with \uFFFD (REPLACEMENT CHARACTER), which indeed has information loss issue compared to the aforementioned method. Of course Windows uses Unicode natively for them, but then we have the option of detecting what OS the application is running on.

@daurnimator
Copy link
Author

@refack it seems that the project is no longer being used: should this be reopened now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. process Issues and PRs related to the process subsystem.
Projects
None yet
Development

No branches or pull requests

7 participants