[Incident] US cluster is unstable (some users are unable to log in, some unable to start workspaces) #3499

Closed
extrakun opened this issue Mar 19, 2021 · 21 comments
Labels
type: incident Gitpod.io service is unstable

Comments

@extrakun

Bug description

I am getting a 504 error when I attempt to log in to Gitpod via GitHub.

Steps to reproduce

  1. Go to https://www.gitpod.io
  2. Click Login
  3. Select GitHub

Expected behavior

Able to log in via GitHub

Example repository

No response

@jankeromnes
Contributor

Hi @extrakun, thanks for your report!

We deployed a security patch to gitpod.io around the same time as your report, so this might have been a temporary issue caused by the deployment.

Is it still happening now?

@extrakun
Author

Yes, it is still happening. I tried clearing the cache and using incognito mode, and it still does not work.

@jankeromnes jankeromnes added the type: incident Gitpod.io service is unstable label Mar 19, 2021
@jankeromnes
Contributor

Thanks for confirming it's still happening.

We're now investigating a potential incident in the US cluster. Will update here as soon as we know more.

@jankeromnes jankeromnes changed the title Unable to log into Gitpod [Incident] US cluster is unstable (some users are unable to log in, some unable to start workspaces) Mar 19, 2021
@iamparasbansal

It also happened to me, which brought me here.

@jankeromnes
Contributor

jankeromnes commented Mar 19, 2021

Incident posted: https://status.gitpod.io/incidents/547e8145

We've identified a few problems in the US cluster and are working to mitigate them.

EDIT: We've now restarted core services in the US cluster. (If it remains unstable after the restart, we're looking to route all US cluster traffic to EU.)

@kunxin-chor

Hi, I am stuck on acquiring nodes, and some of my students are having problems with detecting ports.

@smacintyre

My whole team is down; some can't log in, and those who can are all stuck acquiring nodes. It would be nice if there were an Asia cluster rather than routing those of us in Asia to the US. I assume this deploy was aimed at nighttime in the US, but it's core business hours for us.

Still, I love gitpod. HugOps!

@jankeromnes
Contributor

Thanks @smacintyre, I agree. We're planning to open a new Asia cluster later this year.

Also, we were able to resolve the US cluster incident, and the service is operational again. Please report here if anything still doesn't work well on your side.

@davidar

davidar commented Mar 20, 2021

I'm still unable to start any workspaces (it hangs on "Starting..." for a long time until getting a "Connection got disposed" error), and https://gitpod.io/workspaces/ produces similar errors.

@Bubbler-4

Yeah, apparently the incident is happening again. I could log in via GitHub and view my workspaces list (though it takes some time to load), but the container I started is stuck at "Starting...".

While I was typing this, the container on the workspaces page gave me the message "Last run a few seconds ago: QueryFailedError: ER_LOCK_WAIT_TIMEOUT: Lock wait timeout exceeded; try restarting transaction".

@apolopena

apolopena commented Mar 21, 2021

Yeah, this issue is still happening as of 5:00pm PST 3/20/2021. I am on the US cluster.
There was some time earlier in the afternoon when the issue was gone, but it is back.
I have cleared my cache and there was no noticeable effect. Workspaces still won't build or display.

https://gitpod.io won't load.

Neither will http://gitpod.io/workspaces

@ytadesse

Still happening for me as well. The workspace finally loads but has issues connecting to the GitHub repo, stating “Request fetch failed with message: The repository does not seem to exist anymore. You may not have access, or it may have been deleted or renamed.” even though it’s a private repo that only I own/operate (and which obviously still exists).

@Bubbler-4

Bubbler-4 commented Mar 22, 2021

This issue should be reopened. From the view of an end user, the US cluster incident is not fully resolved and is still happening (periodically breaking down, making workspaces unusable).

The problems encountered on my side:

  • Friday: Unable to authenticate via GitHub, unable to see the list of workspaces
  • Saturday: Able to log in and see the list of workspaces, unable to start workspaces with various error messages
  • Sunday: Able to start workspaces, but very long startup time and longer build time (I saw Gitpod is constantly indexing the directories, which made my build at least 10x slower)
  • Monday (right now): First reported as unable to start workspaces; corrected to: able to start workspaces, but with a very long startup time (10min+, it was stuck at "Booting... (0/3)" for a pretty long time)

(Days in GMT+9)

Reported problems so far: #3515 #3516 #3518 #3522 #3523 #3527 (also probably related to #3520 )


EDIT: It looks like the severity of the problems varies wildly across users. At least I can see workspaces, start workspaces without getting a "message bus" problem, and I can use the terminal just fine. Maybe it's not a whole US cluster problem. I'm on ws-us03 btw.

@smacintyre

@jankeromnes Also seeing issues. Not good for the start of Monday (TZ: Asia/Bangkok). I'm able to log in and start workspaces fine; however, the terminal never starts. It just sits blank.

@amafjarkasi

This issue is still active in the USA - please look into it...


@apolopena

This issue should be reopened. From the view of an end user, the US cluster incident is not fully resolved and is still happening (periodically breaking down, making workspaces unusable).

The problems encountered on my side:

  • Friday: Unable to authenticate via GitHub, unable to see the list of workspaces
  • Saturday: Able to log in and see the list of workspaces, unable to start workspaces with various error messages
  • Sunday: Able to start workspaces, but very long startup time and longer build time (I saw Gitpod is constantly indexing the directories, which made my build at least 10x slower)
  • Monday (right now): First reported as unable to start workspaces; corrected to: able to start workspaces, but with a very long startup time (10min+, it was stuck at "Booting... (0/3)" for a pretty long time)

(Days in GMT+9)

Reported problems so far: #3515 #3516 #3518 #3522 #3523 #3527 (also probably related to #3520 )

EDIT: It looks like the severity of the problems varies wildly across users. At least I can see workspaces, start workspaces without getting a "message bus" problem, and I can use the terminal just fine. Maybe it's not a whole US cluster problem. I'm on ws-us03 btw.

I can confirm that I am on the same cluster and have experienced the exact same symptoms on the exact same days as @Bubbler-4.

@amafjarkasi

Yeah, my workspaces are opening, but ESLint isn't working.

@smacintyre

smacintyre commented Mar 22, 2021

@jankeromnes I love Gitpod, but I'm very disappointed by the lack of response. It's been over 6 hours and there is still no response from Gitpod here, on Twitter, on the forum, or at status.gitpod.io. Where is the ops team? I hate to play the whole "paying customer" card here, but this is really starting to shake my confidence in using Gitpod as my team's go-to IDE. You guys have so much potential. Once this is resolved, a blog post explaining the issue, the lack of response, and how your team will improve its response in the future is in order. We all know that things break and go wrong, but the silence is the issue. It makes me wonder if you support your offering outside of European business hours. There should have been at least an acknowledgement by now.

As with others, it seems I'm on ws-us003. While things are "working" more or less, running commands in the shell is painfully slow. Standard commands that usually execute in a couple of seconds are taking up to 10 minutes to run. top is reporting massive load averages but my container is doing nothing.

@Cdca12

Cdca12 commented Mar 22, 2021

Same problem here, I can't open my repos.


Edit: Now it's working!

@smacintyre

Now I'm getting the "Red Screen of Death" too. 😢

@svenefftinge
Member

As Jan explained here, we are truly sorry for all this unreliability and the lack of communication. We are actively expanding our SRE team in order to have someone on call 24/7 and to be more professional when things go wrong.
