Fargate/ECS healthcheck #1124

Closed
pauldoherty-optifly opened this issue Apr 5, 2022 · 11 comments

@pauldoherty-optifly

Describe the question
Hi all, I have an issue getting the healthcheck to function with Fargate. I followed the instructions and installed the sidecar but cannot get the sidecar healthcheck to be healthy. This means that my service keeps getting killed because ECS thinks the aws-otel-collector sidecar is unhealthy.

Steps to reproduce if your question is related to an action
Service is provisioned with CDK. The sidecar health check is specified as follows:

healthCheck: {
  command: ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"],
  timeout: Duration.seconds(10),
  startPeriod: Duration.seconds(10),
},

What did you expect to see?
The sidecar would be found to be healthy

Additional context
Looking at the Dockerfile here, it looks like aws-otel-collector is built from scratch and so will not have curl, or even a shell for that matter. How are health checks expected to be configured?

Thanks

@bryan-aguilar bryan-aguilar added ADOT collector ADOT Collector related issues extension For any extension related items labels Apr 5, 2022
@bryan-aguilar
Contributor

Could you please provide your Collector Config that you used when setting up the ADOT Collector?

@pauldoherty-optifly
Author

pauldoherty-optifly commented Apr 5, 2022

Hi,

Thanks for getting back to me. I just used the standard insights config. E.g.

taskDefinition.addContainer("otelContainer", {
  image: ContainerImage.fromRegistry("public.ecr.aws/aws-observability/aws-otel-collector:latest"),
  command: ["--config=/etc/ecs/container-insights/otel-task-metrics-config.yaml"],
  essential: false,
  portMappings: [...],
  healthCheck: {
    command: ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"],
    timeout: Duration.seconds(10),
    startPeriod: Duration.seconds(10),
  },
});

@bryan-aguilar bryan-aguilar self-assigned this Apr 5, 2022
@bryan-aguilar
Contributor

Currently I don't have any Collector CDK documentation to point you toward, so this may require some experimenting.

I can set up a similar environment and see what I can discover on my side. Is there any other CDK environment information that could be useful when I build out my own CDK deployment?

@bryan-aguilar
Contributor

What version of CDK are you using?

@pauldoherty-optifly
Author

The latest v2.17

I really don't think CDK has anything to do with it though. Fundamentally, I am unsure how you are supposed to run the healthcheck on the Fargate/ECS sidecar. Given that the healthcheck runs inside the sidecar, and the otel image doesn't have a shell or curl, how can ECS ever consider it healthy?

The only option I believe I have for the healthcheck definition is to use the shell
e.g. command: ["CMD-SHELL", ...

Here's a fairly minimal example which should illustrate it,
https://github.com/pauldoherty-optifly/fargateOtelExample

@pauldoherty-optifly
Author

I could obviously take the aws-otel-collector container image, add to it, and publish it myself, but the documentation makes no reference to having to do that.

@bryan-aguilar
Contributor

Hi @pauldoherty-optifly,

I am going to bring this to the team and see if we can provide an official recommendation. I will reach back out here when I have more information.

@pauldoherty-optifly
Author

Thanks 👍

@bryan-aguilar
Contributor

Hi @pauldoherty-optifly ,

We do see the issue here. We are currently working on a solution and have added it to the backlog milestone. I will leave this issue open and make sure it is referenced when a PR with a fix is created.

@PaurushGarg
Member

We have now added the healthcheck component in the new ADOT Collector release, v0.23.0.

@PaurushGarg
Member

Closing this issue, as the PR for it is merged and is part of collector v0.23.0.
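
A minimal sketch of the distinction at the heart of this thread, for anyone comparing the two health check forms. ECS health check commands start with either `CMD-SHELL` (the string is run through the container's shell) or `CMD` (the binary is executed directly), so only the second form can work in an image built from scratch. The thread does not say where the healthcheck component added in v0.23.0 lives inside the image; the `/healthcheck` path below is an assumption and should be checked against the release notes. The `ContainerHealthCheck` interface and `needsShell` helper are hypothetical names used only for illustration, not CDK types.

```typescript
// Hypothetical stand-in for a container health check definition.
// This is NOT the CDK HealthCheck type; durations are plain seconds here.
interface ContainerHealthCheck {
  command: string[];
  timeoutSeconds: number;
  startPeriodSeconds: number;
}

// "CMD-SHELL" hands the command string to the container's shell.
// "CMD" executes the listed binary directly, with no shell involved.
function needsShell(check: ContainerHealthCheck): boolean {
  return check.command[0] === "CMD-SHELL";
}

// The check from the original report: requires both a shell and curl,
// neither of which exists in an image built from scratch.
const shellCheck: ContainerHealthCheck = {
  command: ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"],
  timeoutSeconds: 10,
  startPeriodSeconds: 10,
};

// Exec-form check that calls a binary shipped in the image.
// ASSUMPTION: "/healthcheck" is an illustrative path, not confirmed by this thread.
const execCheck: ContainerHealthCheck = {
  command: ["CMD", "/healthcheck"],
  timeoutSeconds: 10,
  startPeriodSeconds: 10,
};

console.log(needsShell(shellCheck), needsShell(execCheck));
```

In a CDK task definition, the same `command` array would go into the `healthCheck` block shown earlier in the thread, with `Duration.seconds(...)` in place of the plain numbers.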
