Status endpoint should confirm database connection is working #1214
Comments
Did the incident stop only one instance of Via (i.e. one autoscaling VM) from connecting? Or all of them? If Via couldn't connect to its DB and requests that do DB queries (such as those related to the YouTube feature) would have been erroring (with Anyway, the status endpoint can certainly do a trivial DB query to test the DB connection, see h and LMS's status endpoints for an example. See How do our
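To make the "trivial DB query" idea concrete, here is a minimal, framework-agnostic sketch. It is not the code used in h or LMS: the `status` function, its return shape, and the `connect` callable are all hypothetical names for illustration, and `sqlite3` stands in for whatever database driver Via actually uses.

```python
import sqlite3
from contextlib import closing


def status(connect):
    """Return (http_status, body) after running a trivial query.

    `connect` is a hypothetical zero-argument callable that returns a
    DB-API connection. It is an assumption made for this sketch.
    """
    try:
        with closing(connect()) as conn:
            with closing(conn.cursor()) as cursor:
                # The cheapest possible round trip to the database.
                cursor.execute("SELECT 1")
                cursor.fetchone()
    except Exception as exc:
        # Any failure to connect or query means the DB is unusable,
        # so report an error instead of a healthy status.
        reason = f"database check failed: {exc.__class__.__name__}"
        return 500, {"status": "error", "reason": reason}
    return 200, {"status": "okay"}


if __name__ == "__main__":
    # An in-memory SQLite database stands in for the real one.
    print(status(lambda: sqlite3.connect(":memory:")))
```

With a check along these lines, a monitor polling the status endpoint would see a non-200 response as soon as the instance loses access to its database, rather than only when user requests start failing.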
I don't know the full answer to that, but the way Via was failing was that instead of failing immediately with a 500, it would hang for 30s and then time out. So I suspect this means that the Python app's attempt to connect to the DB was just hanging, rather than being rejected. Or maybe something was attempting a retry? The infrastructure issue was that the Via instance got removed from the security group that allowed it to access the DB.
Hmm, that seems like an undesirable behaviour when the DB is down. We might want to reduce that timeout. Although ultimately the important thing is that the DB should be reliably up.
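One way to bound how long a hanging connection attempt can block is sketched below. This is an illustration only, not Via's code: `run_with_deadline` is a hypothetical helper that runs any check in a daemon thread and gives up after a deadline. If the database is PostgreSQL (an assumption; the thread doesn't say), setting libpq's `connect_timeout` connection parameter is probably the more direct fix, since it makes the connection attempt itself fail early.

```python
import threading
import time


def run_with_deadline(check, timeout_seconds):
    """Run check() in a daemon thread; return its result or raise TimeoutError.

    `check` is any zero-argument callable, for example a function that
    opens a DB connection and runs a trivial query.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = check()
        except Exception as exc:
            outcome["error"] = exc

    # A daemon thread will not keep the process alive if the check hangs.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_seconds)

    if worker.is_alive():
        # The check is still running: treat it as a failure. Note that the
        # thread itself is abandoned, not cancelled.
        raise TimeoutError(f"check did not finish within {timeout_seconds}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    try:
        # time.sleep stands in for a connection attempt that hangs.
        run_with_deadline(lambda: time.sleep(0.5), timeout_seconds=0.05)
    except TimeoutError as exc:
        print(f"failed fast: {exc}")
```

The trade-off is that the abandoned thread keeps running in the background until the underlying call returns, so a driver-level timeout is preferable where one is available.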
Hmm, I don't think so.
We had an outage of the YouTube integration today because an infrastructure change stopped an instance of Via from connecting to its DB. We weren't alerted to the issue immediately because the status endpoint doesn't check the ability to connect to the DB, like the corresponding endpoints in h and lms do.