Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown ENOTCONN on TLS.Socket._final #26315

Closed
tcolgate opened this issue Feb 26, 2019 · 25 comments
Closed

shutdown ENOTCONN on TLS.Socket._final #26315

tcolgate opened this issue Feb 26, 2019 · 25 comments
Labels
tls Issues and PRs related to the tls subsystem.

Comments

@tcolgate
Copy link

  • Version: v10.15.1
  • Platform: node:10.15.1 docker image on 64bit linux

When using v10.15.1 we are seeing processes exiting with:

Error: shutdown ENOTCONN
  File "net.js", line 364, col 25, in TLSSocket.Socket._final
  File "_stream_writable.js", line 612, col 10, in callFinal
  File "internal/process/next_tick.js", line 63, col 19, in process._tickCallback

This appears in services connecting to Amazon RDS instances via TLS. We are using
https://registry.npmjs.org/mysql/-/mysql-2.15.0.tgz
and https://registry.npmjs.org/knex/-/knex-0.14.4.tgz for connection pooling.

We do not see these errors on earlier node image (I have tested v10.15.0, v10.14.2 and v10.0.0).

Could this be impacted by #24290 and related commits?
The errors could possibly be bettered handled higher up (by knex or the mysql client), but this does seem to be a significant change in behaviour.

@tcolgate
Copy link
Author

I've run a git bisect between v1.15.0 and v1.15.1. c84b420 seems to be the cause of the problem.

@tcolgate
Copy link
Author

@addaleax any thoughts on this?

@addaleax
Copy link
Member

addaleax commented Feb 26, 2019

@tcolgate I don’t suppose you have an easy reproduction available? Does this also occur for you with v11.x (v11.4.0 and above)?

@tcolgate
Copy link
Author

tcolgate commented Feb 26, 2019

I don't have a cut down test at the moment. But I suspect the problem is with idle TLS connections in the knex pool. It's also awkward to test as it shows up after "about 5 minutes" in a lightly loaded app. I'll try and reduce the problem down a bit though.
Unfortunately the app wont install with node 11, npm install fails. I can try again if I get a cut down test working.
Sorry, I'm not really a node developer :/

@addaleax addaleax added the tls Issues and PRs related to the tls subsystem. label Feb 26, 2019
@syrnick
Copy link

syrnick commented Mar 8, 2019

We're running into the same issue. I don't have a fully runnable repro that I can share, but it reproduces when the connection is closed while the query is still in flight. It looks roughly like this:

const connection = mysql.createConnection(...);
await connection.connect();
connection.query('SOME QUERY', []); // Important: missing await
await util.setTimeoutAsync(100 * Math.random()); // wait a bit; not sure if this is necessary.
connection.end();

There is just [email protected], no pooling.

@addaleax
Copy link
Member

addaleax commented Mar 9, 2019

@tcolgate @syrnick Would either of you be able to try a patch like this?

diff --git a/lib/net.js b/lib/net.js
index b649c6779a94..71feba0cfd10 100644
--- a/lib/net.js
+++ b/lib/net.js
@@ -36,7 +36,8 @@ const {
 const assert = require('internal/assert');
 const {
   UV_EADDRINUSE,
-  UV_EINVAL
+  UV_EINVAL,
+  UV_ENOTCONN
 } = internalBinding('uv');
 
 const { Buffer } = require('buffer');
@@ -359,7 +360,7 @@ Socket.prototype._final = function(cb) {
   req.callback = cb;
   var err = this._handle.shutdown(req);
 
-  if (err === 1)  // synchronous finish
+  if (err === 1 || err === UV_ENOTCONN)  // synchronous finish
     return afterShutdown.call(req, 0);
   else if (err !== 0)
     return this.destroy(errnoException(err, 'shutdown'));

That seems like a cleaner solution to me, but either way, I’m a bit wary of adding or re-adding code that we do not test for and that should not be triggered in the first place…

@syrnick
Copy link

syrnick commented Mar 11, 2019

I'll try some time this week. It seems like it's either

  • mysql library is not using the sockets correctly or
  • mysql library is using the sockets in an acceptable way, but node lacks test coverage for that particular logical path

@bjornstar
Copy link

We're also seeing this using [email protected] in node v10.15.3 and not on node v10.15.0

We cannot update to node v12 at the moment so it would be great if this bug can be looked into.

@toddbluhm
Copy link

We are experiencing this issue as well when using knex and the latest version of node v10.15.3. Would really appreciate anytime that could be put in on this issue. For now we have reverted to Node v10.15.0 as well which has resolved the issue, but unfortunately aws lambda's node v10 is stuck on v10.15.3 due to it always using the latest node v10. We have reverted our lambdas back to Node v8.10 for now.

@manishmotiani
Copy link

manishmotiani commented Jun 20, 2019

We are also facing this issue using knex and node 10.16. Any update on this?

Adding log trace:
[email protected]
[email protected]

require( 'knex' )({
  client: client,
  connection: connection,
  pool: {
    afterCreate: function( conn, done ) {
      conn.query("select 1",function(err,response){
        done(err,conn);
      })
      conn.on( 'error', function( err ) {
        console.trace();  // Line 28
      });
    }
  }
});

at Connection.<anonymous> (/app/knex.js:28:14)
    at Connection.emit (events.js:203:15)
    at Connection.EventEmitter.emit (domain.js:448:20)
    at Connection._handleProtocolError (/usr/src/app/listener/node_modules/mysql/lib/Connection.js:433:8)
    at Protocol.emit (events.js:198:13)
    at Protocol.EventEmitter.emit (domain.js:448:20)
    at Protocol._delegateError (/usr/src/app/listener/node_modules/mysql/lib/protocol/Protocol.js:392:10)
    at Protocol.handleNetworkError (/usr/src/app/listener/node_modules/mysql/lib/protocol/Protocol.js:365:10)
    at Connection._handleNetworkError (/usr/src/app/listener/node_modules/mysql/lib/Connection.js:428:18)
    at TLSSocket.<anonymous> (/usr/src/app/listener/node_modules/mysql/lib/Connection.js:313:20)
    at TLSSocket.emit (events.js:198:13)
    at TLSSocket.EventEmitter.emit (domain.js:448:20)
    at TLSSocket._tlsError (_tls_wrap.js:612:8)
    at TLSSocket.emit (events.js:198:13)
    at TLSSocket.EventEmitter.emit (domain.js:448:20)
    at emitErrorNT (internal/streams/destroy.js:91:8)

@phillhocking
Copy link

phillhocking commented Jul 14, 2019

I am also experiencing this issue on ghost 2.22.0 and node 10.16.0 with Amazon RDS:

events.js:174
throw er; // Unhandled 'error' event
Error: shutdown ENOTCONN
at TLSSocket.Socket._final (net.js:363:25)
at callFinal (_stream_writable.js:617:10)
at process._tickCallback (internal/process/next_tick.js:63:19)
Emitted 'error' event at:
at Connection._handleProtocolError (/var/app/current/node_modules/mysql/lib/Connection.js:425:8)
at Protocol.emit (events.js:198:13)
at Protocol._delegateError (/var/app/current/node_modules/mysql/lib/protocol/Protocol.js:390:10)
at Protocol.handleNetworkError (/var/app/current/node_modules/mysql/lib/protocol/Protocol.js:363:10)
at Connection._handleNetworkError (/var/app/current/node_modules/mysql/lib/Connection.js:420:18)
at TLSSocket. (/var/app/current/node_modules/mysql/lib/Connection.js:305:20)
at TLSSocket.emit (events.js:198:13)
at TLSSocket._tlsError (_tls_wrap.js:612:8)
at TLSSocket.emit (events.js:198:13)
at emitErrorNT (internal/streams/destroy.js:91:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
at process._tickCallback (internal/process/next_tick.js:63:19)

@anthonynovatsis
Copy link

anthonynovatsis commented Jul 21, 2019

Also seeing this issue (with AWS Lambda 10.x, RDS and [email protected]):

db error { Error: shutdown ENOTCONN
at TLSSocket.Socket._final (net.js:361:25)
at callFinal (_stream_writable.js:612:10)
at process._tickCallback (internal/process/next_tick.js:63:19)
errno: 'ENOTCONN',
code: 'ENOTCONN',
syscall: 'shutdown',
fatal: true }

@intervalia
Copy link

This error is killing me. I was forces to return to Node version 10.15.0 just to keep things working.

I would really like to move to 12.9.1, but I am not stuck and can not take advantage of improvements and new features until this is fixed.

We tried with 12.9.1 and the bug is still there.

@addaleax
Copy link
Member

Just to reiterate, anybody who can either a) provide a reproduction or b) confirm that the patch in #26315 (comment) solves the issue is likely to get this resolved really, really fast.

@richardfrost
Copy link

I ran into this issue as well. I went ahead and applied the patch mentioned in the comment, and built node from master (+ the patch). As expected, it looks like the patch resolved the error. If I switch back to node 12.x, the issue returns immediately. Let me know if you want me to throw a project together to attempt reproduction.

Some notes:

  • MariaDB 10.2.25, mysql 2.17.1 package.
  • The issue is not present in node 10.15.0, but is in 10.15.1+.
  • The issue is only present while connecting to mysql with a secure connection.
  • No connection pooling.
  • This issue may have some relevant info.

@intervalia
Copy link

Now that Richard has verified the fix, can we get this in? There have already been two releases since Richard verified. Can we please get it in the next release?

@skinnerdev
Copy link

We have also been experiencing this issue in our app. Downgrading to 10.15.0 resolved the issue. We confirmed the bug exists in all newer versions. Any update on getting this fixed in latest release?

addaleax added a commit to addaleax/node that referenced this issue Oct 10, 2019
While it is not entirely clear why this condition is being
triggered, it does resolve a reported bug.

Fixes: nodejs#26315
@addaleax
Copy link
Member

I’ve opened #29912 with the patch above.

Fwiw, anybody can open PRs here, you don’t need to wait for somebody else to do it, especially when it has been confirmed to resolve an issue.

@anthonynovatsis
Copy link

@addaleax Thanks for raising that PR. Quite a few people were having issues with AWS Lambda and node10.x. So what is the process for getting this into the next v10.x release (ideally the upcoming v10.17.x)? Happy to raise a PR against another branch if required.

addaleax added a commit to addaleax/node that referenced this issue Oct 14, 2019
While it is not entirely clear why this condition is being
triggered, it does resolve a reported bug.

Fixes: nodejs#26315

PR-URL: nodejs#29912
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
@addaleax
Copy link
Member

@anthonynovatsis It should get picked up automatically; our rules generally say that something must have lived in a current release for 2 weeks before it is backported to LTS, but I’ve commented on #29875 so that maybe we can skip that part for the patch, and opened a backport PR @ #29968.

@anthonynovatsis
Copy link

Thanks for that @addaleax. Much appreciated..!

BethGriggs pushed a commit that referenced this issue Oct 16, 2019
While it is not entirely clear why this condition is being
triggered, it does resolve a reported bug.

Fixes: #26315

Backport-PR-URL: #29968
PR-URL: #29912
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
@intervalia
Copy link

intervalia commented Oct 24, 2019

How can we know what versions sill have this fix?? Is there an overall bug # to build # lookup table?

@kylealwyn
Copy link

@intervalia You can check the releases tab to verify which commits are included in which releases. This particular patch was landed in v10.17.0 and in the latest v13 releases. Thank you @addaleax for pushing this through!

@syrnick
Copy link

syrnick commented Oct 25, 2019

What's the process for getting it into v12.x LTS branch? I checked that (https://github.com/nodejs/node/blob/v12.x/lib/net.js is unpatched)

addaleax added a commit that referenced this issue Oct 25, 2019
While it is not entirely clear why this condition is being
triggered, it does resolve a reported bug.

Fixes: #26315

PR-URL: #29912
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
@addaleax
Copy link
Member

@syrnick Yeah, thanks for pointing that out … I think this is unfortunate timing because it was merged so close to the Node.js 12 LTS release. I’ve added it to the v12.x-staging branch, meaning it will go into the next v12.x release; that should be unprolematic given that this has already been accepted for v10.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.