Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly #15632

Open
4 of 11 tasks
sli68 opened this issue Nov 13, 2024 · 3 comments
Open
4 of 11 tasks

Comments

@sli68
Copy link

sli68 commented Nov 13, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

I user AWX 24.3.1. and external Postgres (postgres (PostgreSQL) 15.2)
Every 5 minutes, the awx-web pod log have the following error:
2024-11-13 15:20:41,270 ERROR [-] awx.main.cache_clear Encountered unhandled error in cache clear main loop
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_cache_clear.py", line 24, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Traceback (most recent call last):
File "/usr/bin/awx-manage", line 8, in
sys.exit(manage())
^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/init.py", line 177, in manage
execute_from_command_line(sys.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/init.py", line 442, in execute_from_command_line
utility.execute()
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/init.py", line 436, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 458, in execute
output = self.handle(*args, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_cache_clear.py", line 24, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2024-11-13 15:20:41,888 WARN exited: awx-cache-clear (exit status 1; not expected)
2024-11-13 15:20:41,888 WARN exited: awx-cache-clear (exit status 1; not expected)

AWX version

24.3.1

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

deploy awx with external Postgres

Expected results

no error log

Actual results

2024-11-13 15:20:41,270 ERROR [-] awx.main.cache_clear Encountered unhandled error in cache clear main loop
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_cache_clear.py", line 24, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Traceback (most recent call last):
File "/usr/bin/awx-manage", line 8, in
sys.exit(manage())
^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/init.py", line 177, in manage
execute_from_command_line(sys.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/init.py", line 442, in execute_from_command_line
utility.execute()
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/init.py", line 436, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 458, in execute
output = self.handle(*args, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_cache_clear.py", line 24, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/init.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2024-11-13 15:20:41,888 WARN exited: awx-cache-clear (exit status 1; not expected)
2024-11-13 15:20:41,888 WARN exited: awx-cache-clear (exit status 1; not expected)

Additional information

No response

@2and3makes23
Copy link

2and3makes23 commented Dec 13, 2024

We see the same thing, running AWX 24.6.1, also using an external Postgres DB (15.8)

We also see the message posted above from awx-web/rsyslog cache clear main loop.

In addition to that we see from awx-web/awx-web rsyslog_configurer main loop (probably the same underlying problem):

2024-12-13 06:37:55,451 ERROR [-] awx.main.rsyslog_configurer Encountered unhandled error in rsyslog_configurer main loop
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_rsyslog_configurer.py", line 29, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/__init__.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/__init__.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: Connection timed out
Traceback (most recent call last):
File "/usr/bin/awx-manage", line 8, in <module>
sys.exit(manage())
^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/__init__.py", line 177, in manage
execute_from_command_line(sys.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
utility.execute()
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 458, in execute
output = self.handle(*args, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/management/commands/run_rsyslog_configurer.py", line 29, in handle
for e in conn.events(yield_timeouts=True):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/__init__.py", line 90, in events
for notification in notification_generator:
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/__init__.py", line 74, in current_notifies
raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: Connection timed out

@2and3makes23
Copy link

2and3makes23 commented Dec 16, 2024

Possible source of problem

Looking at DB connections from a firewall point of view might reveal the source of the problem here:
With our setup idle sessions expire/are closed by firewall after 1h without any traffic.

What seems to happen is, that AWX tries to reuse those already closed connections over and over again. All attempts like this fail like stated above.

What didn´t help in our case was to configure the firewall to send a reset package to DB and AWX when closing "stale" sessions, in hope, AWX would then go ahead and establish a new session when needed next (which it currently just doesn´t do):
grafik

Possible fixes

  • some kind of keepalive package (ideally with configurable interval) to prevent the firewall from closing connections.
    (maybe is there aleady an option in place I´m not aware of?)
  • making AWX create new sessions if last used session is no longer open (if that does not produce too much overhead or concurrency issues).
  • Set a max session age on AWX side, see related issue Set CONN_MAX_AGE to better integrate with databases that time out idle connections #13481

Related issues

@2and3makes23
Copy link

2and3makes23 commented Dec 19, 2024

For now we used a work around* by setting up TCP keepalives on DB side to prevent open sessions from getting stale, which solves the issue for us.

*Would switch to an application side fix via CONN_MAX_AGE or similar, as soon as available, to have everything required to run AWX cleanly in one place.

@ AWX Team Thanks for all your great work on the AWX project! 🎄 🎄 🎄 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants