test: fix flaky test-net-connect-local-error #12964
One small request, other than that, let's see what the CI says
`${err.localAddress} !== ${common.localhostIPv4} in ${err}`
);
getUnassignedPort(common.mustCall((unassignedPort) => {
assert(unassignedPort);
Use assert.ok, or even assert.strictEqual(typeof unassignedPort, 'number')
@refack Fixed! Thanks! I've also moved the getUnassignedPort call closer to where the value is actually used.
server.listen({port: 0}, common.mustCall(() => {
// When the server is closed this port will no longer be assigned
const unassignedPort = server.address().port;
server.close(common.mustCall(() => {
TBH I don't see a completely safe way of calling net.connect() to a free port. I would probably just move the original test (before the port changes) to sequential.
/cc @nodejs/testing @thefourtheye
@santigimeno thanks for the feedback
IMHO port + 1 will always be flaky even in sequential.
@nodejs/testing It seems like we need a way to find a deterministically erroring port for a few other tests as well... Re: #12996
FWIW For Windows (and according to RFC 793) the closed socket will enter TIME_WAIT state for 2*MSL and will not SYN,ACK and will not be reused by the OS.
Funny story: I've been looking at this issue's brother #12951.
These tests are run in parallel: first this one, then the other.
In the other one there's a server that's supposed to receive 6 connections, instead it received 7.
I wonder where that 7th request comes from 🤣
IMHO port + 1 will always be flaky even in sequential.
I'm not sure I follow. Can you elaborate?
If you mean that there can be another test using common.PORT + 1 (or common.PORT for that matter), I agree, but I think we'll be fine as long as there's no test in sequential listening on or binding to port 0.
IMHO port + 1 will always be flaky even in sequential.
I think that's wrong, unless you're just arguing that some other process can always use that port. But I don't think we generally concern ourselves with that. I'm with @santigimeno: Moving it to sequential seems like the simpler and better option.
Thirded @santigimeno's suggestion, moving to sequential.
common.localhostIPv4,
`${err.localAddress} !== ${common.localhostIPv4} in ${err}`
);
server.close();
I just thought about it, you don't need getUnassignedPort. There is no need for a server to be alive for testing that a connection to an empty port will fail.
Do all the testing in the server.close() callback. Use the closed server's assigned port like you did in getUnassignedPort.
@refack You're right. I've used 8080 like I did here since no connection is actually made and this issue suggests that in the future there will be a linting rule against using common.PORT in parallel tests.
I've changed it now, thanks!
const port = server.address().port;
server.close(common.mustCall(() => {
const client = net.connect({
port: 8080,
I think we should test the other way around as well: {port: port, localPort: 8080} (with a new client)
Actually, we also need to assert the other properties of err, like:
assert.strictEqual(err.syscall, 'connect');
assert.strictEqual(err.code, 'ECONNREFUSED');
assert.strictEqual(err.message, `connect ECONNREFUSED ${err.address}:${err.port} - Local (${err.localAddress}:${err.localPort})`);
IMHO we have a good solution for the flakiness of the test.
Stress on macOS: https://ci.nodejs.org/job/node-stress-single-test/1217/
@refack I could work on the requested changes in this PR either today or tomorrow. It will need a new CI. Let me know how to proceed. If you need to land this fast, I can make the changes in another PR.
It's the weekend, there's no rush. If you have the time and energy, add the assertions and reversed connection to this PR.
@refack Made the changes! Thanks :)
I'm very uncomfortable with all of the PRs lately that add non-trivial amounts of code and complexity for tests where moving to sequential is the better solution. The marginal cost of having the test in sequential is negligible (maybe 150 ms on a few platforms?). Our slowest CI platforms don't benefit at all from having tests in parallel. (They run them sequentially anyway.) I'd much rather have simple, short, straightforward, easy-to-understand, easy-to-maintain tests. The time taken to do the whole reserve-a-port-then-close-the-server dance probably largely negates any benefit from having the test in parallel anyway.
@Trott I have to agree with you (even though I was at first supporting those kinds of complex changes)
@Trott PTAL
New CI: https://ci.nodejs.org/job/node-test-commit/9867/
@@ -3,25 +3,46 @@ const common = require('../common');
const assert = require('assert');
const net = require('net');

const fixedPort = 8080;
Why are we hard-coding 8080 and not using common.PORT or common.PORT + 1 or whatever? Just to avoid moving to sequential?
This needs to change.
@Trott I thought to use it because of your comment on this: #12639
Also as I understood from the common.PORT in parallel tests issue, it was planned to use a linting rule against using common.PORT in parallel tests.
@sebastianplesciuc I think they convinced me to move the test to /sequential/; there it's OK.
I thought to use it because of your comment on this: #12639
@sebastianplesciuc Not sure which comment you mean.
Also as I understood from the common.PORT in parallel tests issue, it was planned to use a linting rule against using common.PORT in parallel tests.
If/when that happens, eslint-disable comments can be used for any remaining valid common.PORT uses in parallel. Changing them now to accommodate a rule that may never come to pass is probably putting the cart before the horse.
Regardless, none of that applies if the test is moved to sequential. :-D
Lastly: I hope none of this is too frustrating for you. I appreciate all the work you're doing and I know it's not fun to get contradictory suggestions from people.
Oh, hooray, we're all kinda sorta on the same page (or getting there) after all. :-D
@sebastianplesciuc Oh, I think I see the test/comment you are referring to. In that case, other (intentional and for testing purposes) errors in the code prevent that port from ever being in use. If I understand what's going on in this test (and I may not!), that port (the one that is now 8080) does in fact get used. A connection is attempted there and ECONNREFUSED is expected, meaning nothing is listening on that port. So if something else is using that port, bad things happen. Again, I may be misunderstanding the test, but that's the way it seems to me. (Massively divided attention right now, apologies if I'm hurting more than I'm helping by participating.)
@Trott I understood why this should move to sequential. I'm not defending this, I understand why this is the case and I agree with your review. I just wanted to explain why I thought to use 8080 there.
Frankly, I'm not really sure what happens on every platform in a server's close callback. I just thought you guys might know and determine if this is an acceptable solution. I'm satisfied with the outcome and also I've learned some things along the way.
So, thanks for that :)
I think we should revert the changes in this file that were included in 94eed0f and move this file to
I'm not sure what the nature is of the flakiness that's being seen, but that seems very likely to resolve it. (The first bullet point is the more important one in this regard. If another test running in parallel uses port 0 somewhere shortly after this test does, it is exceedingly likely to get
Yeah I found which one #12951 there a server receives 7 requests when the test clearly only issues 6 🤣 But this test needs some fixin' since it's flaky even sequentially (with the @sebastianplesciuc we need to rethink this test, it fails on windows :(, and there's the hard coded
@refack I'll take a look at the code before the bind to 0 commit and try to make a PR with the move to sequential if that's ok. Should we close this PR?
@Trott this fragment
+const server = net.createServer();
+server.listen(0);
+const port = server.address().port;
 const client = net.connect({
-  port: common.PORT + 1,
+  port: port + 1,
@sebastianplesciuc scratch my last comment, if you revert 94eed0f and move to P.S. revert 94eed0f just on
Moving the test to
Agreed, not the whole commit, just the change to this file from that commit.
LGTM if CI is green
Fixed test-net-connect-local-error by moving the test from parallel to sequential. PR-URL: nodejs#12964 Fixes: nodejs#12950 Reviewed-By: Refael Ackermann <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Landed in cf30d5e
Post land CI: https://ci.nodejs.org/job/node-test-commit/9910/
Relanded in 0c2edd2 (forgot the missing LF)
Post reland CI: https://ci.nodejs.org/job/node-test-commit/9913/
Not a fan of running CI after landing. You could just push your fix to their branch and run CI against the PR with your fixes in place.
After landing I run against
@Trott To be clear, I think what @refack does is make sure CI ran before landing, then run CI again after landing to make sure no last minute other conflicting PR might have caused an issue. If this is the case, it's an example of extra rigour around the release process, and I'm quite impressed that anyone makes the effort.
Ah! I see now. Yes, that's awesome. 👍 Thanks for the clarification.
Thinking a bit more on this, I would ask that you (and everyone) please please please at least run Our docs ask that people run JS linting is comparatively fast and would catch most of the "oops, I shouldn't have pushed to master" things that seem to come up from time to time, including this one.
@Trott I apologize for not doing this. But I didn't expect the PR to land until after I've fixed the
This is on me. It was a known lint failure I said I'd fix before landing #12964 (comment). I broke my own rule and landed this after 10PM 🤦‍♂️ Re: nodejs/build#705 IMHO we should strive to move all the automatable (read: boring, repetitive, and human-error prone) tasks to the CI.
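P.S. a git hook that lints only git changed files:
#!/c/node/node
// Pre-commit hook: run JSHint on staged .js files only.
var cmd = require('child_process');
// List staged files that were added, copied, or modified, keeping only .js files.
cmd.exec('git diff --cached --name-only --diff-filter=ACM | grep "\\.js$"', function (err, stdout) {
  // grep exits non-zero when nothing matches; empty output means nothing to lint.
  if (stdout.length === 0) return;
  var args = stdout.split('\n');
  args.unshift('');
  args.pop();  // drop the empty entry left by the trailing newline
  var cli = require('jshint/src/cli.js');
  cli.getBufferSize = function () { return 0; };
  cli.interpret(args);
});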
This wouldn't catch things that are already committed, right?
@sebastianplesciuc I was addressing people with commit bits on the repo. You didn't do anything wrong. (For that matter, @refack's mistake was minor, lots of folks have done it, and he was eager to fix CI.) Everything's good. We can always improve though. Automation and git pre-commit hooks are both great things to apply here.
Fixed test-net-connect-local-error by moving the test from parallel to sequential.
Reverted to commit https://github.com/nodejs/node/blob/eeae3bd07145a770209e4899a9d40f67109d3d01/test/parallel/test-net-connect-local-error.js. Added a few more assertions.
Fixes: #12950
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
test