
Having Error: connect ECONNREFUSED #217

Closed
kwongkz opened this issue Nov 19, 2019 · 14 comments

@kwongkz

kwongkz commented Nov 19, 2019

Does anyone have an idea why this happens? I'm using serverless Lambda with X-Ray tracing turned on.

WARN Error: connect ECONNREFUSED 169.254.79.2:2000
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1107:14)

@willarmiros
Contributor

Hi @kwongkz,
This doesn't sound like an error associated with X-Ray. Please provide a code snippet to reproduce this error as well as logs or another indication that X-Ray is causing this so I can further assist you.

@kwongkz
Author

kwongkz commented Nov 23, 2019

Hi @willarmiros,

I think I've found the issue and the solution.

I referenced issue #143 to get the idea.

The solution for me was adding the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED=1; you can refer to this Lambda optimization tip.

I hope this can be put in the documentation so it's clearer for other people.

Thanks for the assist.

@kwongkz kwongkz closed this as completed Nov 23, 2019
@petermorlion

I'm having the same issue.

Searching the internet for that IP address leads me to believe it's the IP address of the X-Ray daemon (see here). It's a link-local address (it's in the 169.254.0.0/16 range), so it should resolve somewhere inside the AWS network, though I'm no network specialist.

I've tried adding the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable and setting it to 1, but I'm still getting the issue. I will investigate further and update if I find anything.

@awssandra awssandra reopened this Dec 6, 2019
@awssandra
Contributor

Hi petermorlion,

Is this affecting you on Lambda as well? The Daemon should be automatically configured in Lambda, no need to include it or set it up.

@petermorlion

Hi awssandra,

Yes, this is on AWS Lambda. I have it with several Lambdas, all of which use the AWS X-Ray Express package. Strangely, I don't seem to have the issue when using only the core package, nor when using AWS X-Ray Express with NestJS (both in Lambdas), though those Lambdas are executed less often.
I'll see if I can write a minimal Lambda and run a load test on it.

@davidcheal

I am also seeing this error in X-Ray traces:
[screenshot: xray-error]
I am using the express package on Node 12.x.

@petermorlion

petermorlion commented Dec 9, 2019

I've been able to reproduce this on Node 10.x with this piece of code:

const AWSXRay = require('aws-xray-sdk-core');
const xrayExpress = require('aws-xray-sdk-express');
const express = require('express');
const serverlessHttp = require('serverless-http');

module.exports.handler = async function(event, context) {
    const app = getApp();
    const slsHttp = serverlessHttp(app);
    const result = await slsHttp(event, context);
    return result;
};

function getApp() {
    const app = express();

    app.use(xrayExpress.openSegment('PMO-xray-error-test'));

    app.get('/', function (req, res) {
        res.send('Hello World');
    });

    app.use(xrayExpress.closeSegment());

    return app;
}

I just invoked the API Gateway several times from the AWS Console, so this is no heavy load test, i.e. no concurrent requests. As you can see, some invocations have no issue, but others do:
[screenshot: trace list showing intermittent errors]

After that, other requests work fine again. So there's no real pattern I can deduce.

Things I've tried but that didn't make a difference:

  • changing to Node 8.10
  • adding the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable
  • setting the httpOptions.agent of the AWS config with keepAlive to true and maxSockets to 50

@awssandra
Contributor

Sorry for the delayed response!

I'm thinking there's a disconnect between the custom Lambda code and the Express middleware. Each have their own expected workflow of the daemon and SDK behavior. We'll take a deep dive into this.

@willarmiros willarmiros added the bug label Jan 8, 2020
@willarmiros
Contributor

Hi @petermorlion,
I am investigating this issue with the Lambda team. Please sit tight for any updates!

@petermorlion

@willarmiros I don't mean to put pressure on you, but I'm curious if there is any progress on this?

@willarmiros
Contributor

willarmiros commented Mar 9, 2020

Hi @petermorlion,

After some further inspection, it appears the root cause is in our service connector here. It comes from a poller that runs in the background to retrieve sampling rules from X-Ray's service back end roughly every 5 minutes (speaking of patterns: you should see the error about every 5 minutes if you're consistently making requests uninterrupted by cold starts). These requests attempt to communicate directly with the daemon, which is not possible in Lambda environments.
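The polling behavior described above can be sketched like this (the function names and structure are illustrative, not the aws-xray-sdk's actual internals):

```javascript
// Hypothetical sketch of a background sampling-rule poller. A timer fires
// roughly every 5 minutes; each fetch opens a TCP connection to the daemon,
// which in Lambda is refused and surfaces as the ECONNREFUSED warning.
const DEFAULT_INTERVAL_MS = 5 * 60 * 1000; // roughly every 5 minutes

function startRulePoller(fetchRules, intervalMs = DEFAULT_INTERVAL_MS) {
  fetchRules(); // initial fetch (the TCP request that fails in Lambda)
  const timer = setInterval(fetchRules, intervalMs);
  timer.unref(); // don't keep the process alive just to poll
  return timer;
}
```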

I changed the way we make these requests in #255 to no longer be lazy, and that actually appears to have made the errors appear instantly upon invocation. I'm going to make a PR to disable these requests for now in Lambda environments, since we don't support sampling configuration in Lambda yet.

@avin-kavish

avin-kavish commented Mar 10, 2020

These requests are attempting to communicate directly with the daemon, which is not possible in Lambda environments.

@willarmiros Why isn't this possible? Is the X-Ray service on a private network?

Also, when you disable them, how will you be checking for it? This code is essentially express code and express code is not aware that it is being run in a lambda environment.

In the meantime, will I be able to stop the API call with AWSXRay.middleware.disableCentralizedSampling()?

@willarmiros
Contributor

Hi @avin-kavish,
Sorry, that wasn't entirely correct. It is possible to communicate with the daemon in Lambda environments, but only over UDP connections (see segment_emitter). The problematic requests that we're making use TCP under the hood. I believe that the Lambda service has some tight iptables configurations prohibiting these.

I will check for Lambda environments using the LAMBDA_TASK_ROOT environment variable, which is how we make the check elsewhere in the SDK. The disableCentralizedSampling() call should prevent these errors; that's a great callout. However, to minimize the burden on other customers, I'll still disable sampling by default in Lambda until we get better support for it.

@willarmiros
Contributor

This fix was released in v3.0.0-alpha.2.
