-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test-stdio-closed failing on some AIX environments #10234
Comments
It would be great to track down exactly why it sometimes works one way on AIX and sometimes works another way. But seeing as both behaviors are spec-compliant (I think), I'd be OK with replacing: assert.strictEqual(exitCode, common.isAix ? 126 : 42); ...with something like: if (common.isAix)
assert([42, 126].includes(exitCode));
else
assert.strictEqual(exitCode, 42); I agree, though, that tracking down the source of the behavior difference would be a superior solution... |
Sounds reasonable, and it means we'll spot if the same issue shows up on another OS since the acceptance of 126 is still in an AIX-specific clause. I've checked it and it seems ok so I'll get that submitted, especially since I don't think Gireesh will be around for a while who was involved in the previous discussion on this topic. |
Fixes: nodejs#10234 Allow either of the two possible return codes on AIX to be considered successful on test-stdio-closed.js
is supposed to start node with fd=1 (stdout) closed and not open. (upon which node will reopen it to point to /dev/null) Renaming the catalog file to something else produces the expected outcome (i.e. fd=1 is closed). |
So, on AIX, Instead of this: const proc = spawn('/bin/sh', ['-c', cmd], { stdio: 'inherit' }); ...maybe something like this?: // Insert comment here explaining the bug so that the next line is not a mystery.
const shell = common.isAIX ? '/bin/bash' : '/bin/sh';
const proc = spawn(shell, ['-c', cmd], { stdio: 'inherit' }); |
Another possibility: If we identify the version of ksh that has this bug and when it is fixed, check the version of ksh before running the test and skip if we are using a buggy version? |
Unfortunately, bash fails the same as ksh. However "csh" does the job. Another possible fix (more comprehensive i think) is to do the following If stdout is open and it is read-only (which is the case on aix/ksh/bash), then we assume that the user originally intended to close stdout. So, we
jBarz@25d758d |
I can submit a PR for the above suggestion if it seems valid to you guys. |
Might be a good guess, based on:
I'm wondering if the shell did close fd=1 but something has then opened the catalog file before node checks if fd=1 is closed. The contents of |
Actually thinking a bit more we only close stdout and stderr for the node process -- The shell's stdout and stderr should be being piped back to the parent (the test) so we could probably capture it in the test (maybe assert the output is empty?). |
@jBarz wrote:
/cc @bnoordhuis |
Sounds worth a shot @jBarz - I'd be happy with my PR or your one if you can confirm it will work :-) |
That's not a good idea, having fd 1 or 1 being read-only is valid. The code under test isn't intended to guaranteed that fd 1 or 2 is writeable or that fd 0 is readable, its just there to guarantee that the fd is uses, so no other resource is opened on that descriptor. The ksh bug wherein one of its message catalog files ends up being opened on fd 1 is actually a good example of the bug node is avoiding that ksh is not, wherein a child process (node in this case) accidentally receives a readable descriptor to a file it was not intended to have access to. The solutions here are, IMO:
|
@sam-github The latter sounds like the better fix. |
I was incorrect about csh. Using "1>&- 2>&-" on csh simply redirects output to a file called "-" and the test passes because stdout and stderr are not closed :-). There doesn't seem to be a way to close fds with csh. |
@sxa555 Gave me access to an AIX system that exhibits the behavior where the child process was exiting with 126. We've run This is part of the output on the system that returns 126:
We can see that:
In the other case (return 42) none of the kload calls appear in the output, and 2. does not appear either so fd=1 remains unused after it is closed and node reopens as `/dev/null'. So we went back to the premise that changing the When we look in So we may have a workaround (changing the LANG environment variable). What we don't know is why only some AIX systems are exhibiting this behavior. The systems where the child process was returning 42 (in line with the other platforms) by default at no point (according to truss) perform any kload calls nor attempt to load message catalogs no matter what LANG was set to and we confirmed that message catalogs were present in cc @nodejs/platform-aix |
By workaround, I meant we change the test back so that it expects return 42 everywhere (so no special casing AIX) and run on AIX with the LANG environment variable changed. |
For the record, removing the bos.msg.en_US.rte fileset from the machine would also result in consistent behaviour across platforms, since if it cannot find the execerr.cat file then it can't open it, the behaviour and expected RC a;ways appears to end up the same as other platforms (i.e. 42) ... Probably not the best option, but it is another possibility (that package had no direct dependencies on my test systems so it removed cleanly) Everything comes back to 42 ;-) |
Another option would be to set NLSPATH in the environment to something invalid so it can't find the message catalogues |
I think the test case needs to be fixed as @sam-github suggested in option 2 of #10234 (comment) |
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. What this test needs to only ensure is that they are used at startup. This fixes nodejs#10234
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: nodejs#10339 Fixes: nodejs#10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: nodejs#10339 Fixes: nodejs#10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: nodejs#10339 Fixes: nodejs#10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: nodejs#10339 Fixes: nodejs#10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: #10339 Fixes: #10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: #10339 Fixes: #10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: #10339 Fixes: #10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Don't do a write on stdout/stderr because that checks for their writability. But fd=1 could legitimately be opened with read-only access by the user. All this test needs to ensure is that they are used at startup. PR-URL: #10339 Fixes: #10234 Reviewed-By: Sam Roberts <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]>
Reference: #8375 (FYI @gireeshpunathil @Trott @mhdawson)
Happening on current versions of node 4 and 6 (4.7.0/6.9.2) and on AIX 6.1TL09
The fix that was applied in e65a2d7 does not appear to make the test pass universally on all AIX systems. I have a mix of some that it works for, and others which are still returning "42" which is consistent with the other platforms, so is failing this assertion:
assert.strictEqual(exitCode, common.isAix ? 126 : 42);
On the system that it works on, if I set the LANG variable to anything other the en_US I get the "42" result (not certain yet why that is making a difference) although on one of my "failing" machines which is giving 42 setting it to en_US doesn't resolve it, despite it being in the output of locale -a. We could, potentially, be seeing a timing issue where going down the path of a non-default locale is changing things, but I'm guessing so far at this point.
The AIX level on the "failing" box (the one defaulting to en_GB which is always giving me 42) is patched to a slightly later level AIX level - .6100-09-08-1642 vs 6100-09-07-1614. On my 6100-07 system (defaults to en_US) it's always returning 42 as well.
I get the feeling we may need to be able to trap both cases for this test to pass reliably (or change the exit(126) to exit(42), or understand why we're getting the differences that caused the initial patch to be required.
The text was updated successfully, but these errors were encountered: