Atlantis terraform live logs are not showing #2542

Open
omotoso78 opened this issue Sep 27, 2022 · 15 comments
Labels
bug Something isn't working Stale

Comments

@omotoso78

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I can run atlantis plan and atlantis apply, and both work fine. However, I cannot see the live terraform logs while a plan or apply is running.

The link provided in the "details" opens a blank screen.

Reproduction Steps

Atlantis v0.19.7, installed locally with a git enterprise user. No repo.yaml or atlantis.yaml is used. The pull request is submitted from a branch and works fine, but the log is not visible.

Logs

Environment details

Additional Context

@omotoso78 omotoso78 added the bug Something isn't working label Sep 27, 2022
@prastamaha

I'm also experiencing the same thing while using Atlantis image v0.19.8 with Terragrunt customization.

My repos.yaml configuration is as follows:

repos:
- id: "/.*/"
  workflow: terragrunt
  apply_requirements: [approved,mergeable]
workflows:
  terragrunt:
    plan:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt plan -out=$PLANFILE
      - run: terragrunt show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
      - env:
          name: TERRAGRUNT_TFPATH
          command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
      - run: terragrunt apply $PLANFILE

I tried enabling logLevel: "debug" and found the following error:

{"level":"debug","ts":"2022-09-29T17:37:30.849Z","caller":"server/middleware.go:70","msg":"GET /jobs/7836d3a3-f208-4c4d-ac9e-37d1a06116a7/ws – respond HTTP 500","json":{}}
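
For anyone who wants to check their own debug output for the same symptom, here is a minimal sketch that scans JSON log lines for failed requests to the /jobs/<id>/ws endpoint. It assumes the log format matches the line above (a JSON object with "ts" and "msg" fields); the function name failed_ws_requests is made up for illustration and is not part of Atlantis.

```python
import json
import re

# Matches middleware messages shaped like the one in the debug log above:
# "GET /jobs/<id>/ws ... respond HTTP <status>". The shape is an assumption
# taken from that single log line.
WS_MSG = re.compile(r"^GET (/jobs/[^/\s]+/ws)\b.*HTTP (\d{3})$")


def failed_ws_requests(lines):
    """Yield (timestamp, path, status) for /jobs/<id>/ws requests with status >= 400."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        match = WS_MSG.match(entry.get("msg", ""))
        if match and int(match.group(2)) >= 400:
            yield entry.get("ts"), match.group(1), int(match.group(2))
```

Feeding it the Atlantis server output (for example, line by line from a saved log file) should list every websocket request that came back with an error status.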

@nitrocode
Member

nitrocode commented Oct 19, 2022

Try atlantis plan --verbose

It would help to know how you folks have deployed the app. I use the latest version and can see the logs within my eks cluster.

@andy-paine-numan
Contributor

Have you got websockets enabled on any networking infrastructure that your Atlantis installation sits behind? For example, I had to add an annotation to my K8s Contour Ingress to allow websocket streaming to work (the /ws on the end of the URL is for websockets)
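
One way to test whether the infrastructure in front of Atlantis forwards websocket upgrades is to send the handshake by hand and inspect the reply. The sketch below only builds the request and validates the response, following the handshake defined in RFC 6455; the helper names are made up for illustration, and any authentication your setup requires is not handled here.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key():
    """Return a random base64-encoded 16-byte Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key):
    """Compute the Sec-WebSocket-Accept value a server must return for key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host, path, key):
    """Build the raw HTTP/1.1 upgrade request bytes for a websocket handshake."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def upgrade_succeeded(raw_response, key):
    """Return True if the response is a 101 with the matching accept header."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for header in header_lines:
        name, sep, value = header.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(key)
```

To use it, open a TCP (or TLS) socket to the Atlantis host, send the bytes from build_upgrade_request for a /jobs/<job-id>/ws path, and pass the reply to upgrade_succeeded. Anything other than a 101 response suggests that either a proxy in between is not forwarding the upgrade or the server itself is rejecting it.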

@pantelis-karamolegkos
Contributor

pantelis-karamolegkos commented May 5, 2023

I am facing a similar issue, the difference being that the logs appear all at once when the apply / plan process is complete, i.e. they are not actually "streamed". Atlantis is running on a VM, so I don't know what kind of websocket-related configuration can be done.

@nitrocode
Member

@miguelaferreira

While I can see the live logs, it often happens that I need to refresh the page a few times before the web-socket connection succeeds and the logs start streaming.

Screenshot of browser console when web-socket connection fails

image

@nitrocode
Member

nitrocode commented Jun 5, 2023

@marceloboeira it seems like this may be affected by how atlantis is deployed. I'm curious whether there is a misconfiguration in the deployment, a limitation in the cloud deployment used, or something that can be mitigated by additional logic in atlantis. Or maybe a combination.

If we can do anything in the atlantis server, then please feel free to propose a pr if you find a way to reproduce and resolve the issue.

Maybe it's as simple as doing a retry in the frontend to connect to the websocket?
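
To make the retry idea concrete, here is a rough sketch of retrying a connection with capped exponential backoff. It is written in Python purely for illustration; the actual frontend is JavaScript, and the connect callable and both function names are hypothetical, not Atlantis code.

```python
import time


def backoff_delays(attempts, base=0.5, cap=10.0):
    """Return capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def connect_with_retry(connect, attempts=5, base=0.5, cap=10.0, sleep=time.sleep):
    """Call connect() until it succeeds or attempts run out.

    connect is a hypothetical callable that opens the websocket and raises
    OSError when the handshake fails.
    """
    delays = backoff_delays(attempts, base, cap)
    last_error = None
    for i, delay in enumerate(delays):
        try:
            return connect()
        except OSError as err:
            last_error = err
            if i < attempts - 1:
                sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The same loop would translate fairly directly to the browser side: on a failed connection, wait for the next delay and open a new WebSocket instead of asking the user to refresh the page.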

@cloudn8ve

  • helm chart version (4.14.0)
  • app version of 0.24.4

I've deployed atlantis into an EKS Cluster. I am also running into this issue with the websocket and getting 500 error codes when running in debugging mode.

@marcosdiez
Contributor

I had this problem before. It was a permission issue.
The good thing is that on Atlantis >= v0.27.0, this is explicitly logged, so you can double check that on atlantis stdout.

The trivial solution (just to test) is to make atlantis a repo owner.

Also, this new version of atlantis shows every terraform log on its HTTP website. It's not as comfortable as clicking on github, but it does the trick.

@dimisjim
Contributor

I am experiencing this as well in v0.27.1

Sometimes it works when you click on the link / job with the output, other times it works only after a refresh, and sometimes it shows only part of the output on each refresh.

@marceloboeira
Contributor

@dimisjim what do you see if you open that page with developer tools? In theory, that's most of the time because of the load balancer and the WebSocket connection...

@dimisjim
Contributor

dimisjim commented Feb 19, 2024

@marceloboeira

These are there always:

image

This shows when {some} / {sometimes all} of the content loads up:

xterm-4.9.0.js:24 Canvas2D: Multiple readback operations using getImageData are faster with the willReadFrequently attribute set to true. See: https://html.spec.whatwg.org/multipage/canvas.html#concept-canvas-will-read-frequently

This shows up when no content loads up:

93e23b46-4b37-429f-8959-a2d03b39d3db:66 WebSocket connection to 'wss://<ATLANTIS_URL>/jobs/93e23b46-4b37-429f-8959-a2d03b39d3db/ws' failed: 

We are using a GCP load balancer.

@marceloboeira
Contributor

I think it might be that you need to tweak your LB to properly forward the WebSocket connection to the Atlantis instance.

That was the case when I used Atlantis with NGINX / ALB, I had to make a few changes to allow sticky sessions, some specific config for NGINX to keep alive and Upgrade/Connection headers — WebSockets on NGINX.

You might have to figure out the equivalents for GCP — https://cloud.google.com/load-balancing/docs/https#websocket_support

It seems to be enabled by default, but you might want to review the timeouts and similar settings.

Overall, atlantis could use a much simpler polling-based log-stream, it would be easier to make it compatible everywhere, WS for this purpose is overkill.
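
As a rough sketch of what a polling-based log stream could look like on the client side (purely hypothetical, Atlantis does not expose such an endpoint as far as this thread shows): the client remembers how many lines it has already seen and keeps asking for whatever came after that offset. The fetch_chunk callable stands in for a plain HTTP request and is an assumption.

```python
import time


def follow_job_output(fetch_chunk, interval=1.0, sleep=time.sleep):
    """Yield log lines by polling until the job reports it is done.

    fetch_chunk(offset) is a hypothetical callable returning (lines, done):
    the lines produced since offset, and whether the job has finished.
    """
    offset = 0
    while True:
        lines, done = fetch_chunk(offset)
        for line in lines:
            yield line
        offset += len(lines)  # next poll starts after the lines already seen
        if done:
            return
        sleep(interval)
```

Since every poll is an ordinary request/response, this would pass through proxies and load balancers without any Upgrade handling, at the cost of some latency and extra requests.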

@dimisjim
Contributor

dimisjim commented Feb 19, 2024

@marceloboeira

Hmm based on the doc you linked:

The load balancer does not need any configuration to proxy WebSocket connections.

and Upgrade/Connection headers are also supported:

When the load balancer recognizes a WebSocket Upgrade request from an HTTP(S) client followed by a successful Upgrade response from the backend instance, the load balancer proxies bidirectional traffic for the duration of the current connection. If the backend instance does not return a successful Upgrade response, the load balancer closes the connection.

so it should be working out of the box, at least from the GCP Load balancing side. Maybe also the setup I am using based on: https://github.com/bschaatsbergen/terraform-gce-atlantis makes a difference in this regard? Can't tell.

The session affinity is set to none in GCP load balancing by default (I thought to modify this as per doc, this is the one we can manipulate anyway). Setting it to ClientIP and "Maglev" routing policy didn't make a difference 🤔

Thanks for the hints anyhow!

@starkers

starkers commented Apr 10, 2024

Same problem with my atlantis

We're using oauth2 proxy and haproxy ingress for the k8s ingress. What I found was that disabling oauth2 proxy security for /jobs magically solved this..

There are no logs generated by atlantis that I can see, but without checking the code this leads me to think it's behaving differently based on headers.

The additional headers I can see in use (when authentication is applied) to /jobs (prefix) are:

Next up I'll try to disable sending these (or some of these) headers to atlantis and see if it works.

It's also possible that the oauth2 filtering layer at the ingress doesn't see the headers it expects from the client. Not sure, honestly.

image

Would be really great if atlantis just didn't insist on wss:// which are notoriously painful on k8s.
Re: #2026

@dosubot dosubot bot added the Stale label Oct 7, 2024