Intermittent Downtime in Rails Application Running on AWS App Runner #261

Open
svyatmuzyka opened this issue Sep 17, 2024 · 1 comment

Comments

@svyatmuzyka

Description:
I'm experiencing intermittent downtime with my Rails application hosted on AWS App Runner. A third-party service, Crisp, monitors the availability of my site and sends notifications whenever it is unreachable. I receive these downtime alerts from Crisp a couple of times a day.

Steps to reproduce:

  • Deploy a Rails application on AWS App Runner.
  • Set up a monitoring service (e.g., Crisp) to check site availability.
  • Observe the intermittent downtime notifications.

Expected behavior:
The application should remain available consistently, especially since there are no signs of high CPU or memory usage.

Observed behavior:
Even though the application is running under low load (as indicated by AWS App Runner metrics), Crisp reports that my site is unavailable for short periods. During these periods:

  • Metrics show no CPU or memory exhaustion.
  • However, AWS CloudWatch logs show gaps of up to 4 minutes between the end of one log stream and the start of the next, and these gaps line up with the reported downtime.
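
To check how often these gaps occur, the start and end times of each log stream can be compared programmatically. The following is a minimal sketch, not a definitive tool: `find_gaps` is a hypothetical helper, the timestamps are made up for illustration, and it assumes each stream is represented as a (first event, last event) pair. With CloudWatch Logs, such pairs could plausibly be built from the `firstEventTimestamp` and `lastEventTimestamp` fields returned by `DescribeLogStreams`, though `lastEventTimestamp` is only eventually consistent.

```python
from datetime import datetime, timedelta


def find_gaps(streams, threshold=timedelta(minutes=1)):
    """Return (gap_start, gap_end, duration) for every period with no log coverage.

    `streams` is an iterable of (first_event, last_event) datetime pairs,
    one pair per log stream. Overlapping streams are handled by tracking
    the latest end time seen so far.
    """
    gaps = []
    latest_end = None
    for start, end in sorted(streams):
        if latest_end is not None and start - latest_end >= threshold:
            gaps.append((latest_end, start, start - latest_end))
        latest_end = end if latest_end is None else max(latest_end, end)
    return gaps


if __name__ == "__main__":
    # Illustrative timestamps only; not taken from the real logs.
    base = datetime(2024, 9, 17, 22, 0)
    example_streams = [
        (base, base + timedelta(minutes=10)),
        (base + timedelta(minutes=14), base + timedelta(minutes=30)),
        (base + timedelta(minutes=30, seconds=5), base + timedelta(minutes=40)),
    ]
    for gap_start, gap_end, duration in find_gaps(example_streams):
        print(f"no logs from {gap_start} to {gap_end} ({duration})")
```

Comparing the reported gaps against the timestamps of the Crisp alerts would show whether every alert corresponds to an instance replacement.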

Screenshot 2024-09-17 at 22 39 46
Screenshot 2024-09-17 at 22 41 04

Questions:

  1. What could be causing these 4-minute gaps in the logs, which coincide with the periods when my site is unreachable?
  2. Why might the Rails application be stopping so frequently, even when resource usage seems normal?

Logs and metrics:
I have attached relevant logs above and metric graphs below. Let me know if you need more details.

Screenshot 2024-09-17 at 22 44 08

@EliteXCoder1

I think I am having a similar issue with my Django/Python app. Everything was working fine until I deployed an update three days ago. Now, roughly once or twice an hour (though not consistently), a new instance gets deployed, which never happened before. My app is down for a couple of minutes until the new instance is available. My metrics do not show any CPU or memory spike before it happens, and I trust them because my app doesn't have much traffic at this point. CloudWatch generates new instance logs every time this happens, so I know when a new instance gets created. I run the dev version in the same environment as a different service, and it's not happening to that service. Whenever this issue happens, I get the following in the browser until the new instance is up:

upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111
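
For what it's worth, the `111` in that message matches the Linux errno for `ECONNREFUSED`. That would suggest the proxy in front of the service found nothing listening on the instance's port, rather than the request timing out. A small probe can tell the two cases apart when it is run against a port you control. This is only a sketch: `probe_tcp` is a hypothetical helper, and the host and port in the usage example are placeholders, not values from this thread.

```python
import errno
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try one TCP connection and return 'open', 'refused', 'timeout', or 'error:<name>'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except ConnectionRefusedError:
            # On Linux this corresponds to errno 111 (ECONNREFUSED).
            return "refused"
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
        return "open"


if __name__ == "__main__":
    # Placeholder target; replace with the host and port the app listens on.
    print(probe_tcp("127.0.0.1", 8080))
```

A "refused" result during an outage would be consistent with the app process having exited or not yet started on the new instance. A "timeout" would instead point toward networking or an unresponsive process.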
