Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set CONN_MAX_AGE to better integrate with databases that time out idle connections #13481

Open
4 of 9 tasks
kdelee opened this issue Jan 27, 2023 · 3 comments
Open
4 of 9 tasks

Comments

@kdelee
Copy link
Member

kdelee commented Jan 27, 2023

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.

Feature type

Enhancement to Existing Feature

Feature Summary

I've had a few questions posed to me lately about how long we keep database connections open. I am still learning about the motivations behind the questions, but I think they stem from the fact that some managed postgres services on cloud providers set timeouts in postgres such as https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-IDLE-SESSION-TIMEOUT or people who use services like PGBouncer have settings that sever idle connections.

We addressed a huge source of idle database connections thanks to #11745
However, we still have a few other sources of persistent connections that can sit idle for indefinite periods of time.

These are:

  • the callback receiver has persistent database connections that it may recover if they get broken, but in general never closes them.
  • dispatch workers doing tasks other than running jobs leave connections open, @AlanCoding mentioned "dispatcher tasks close connections when they start, not when they finish. This seemed wrong to me, but I have not fixed it."

We may consider setting CONN_MAX_AGE https://docs.djangoproject.com/en/4.0/ref/databases/#persistent-connections to take advantage of Django's own logic for killing/restablishing database connections.

Select the relevant components

  • UI
  • API
  • Docs
  • Collection
  • CLI
  • Other

Steps to reproduce

We can examine how long we are keeping database connections open:

 SELECT 
    pidplication_name
    ,datnamehostname
    ,usenameport
    ,application_name
    ,client_hostname
    ,client_port
    ,backend_startity
    ,query_start
    ,query  e = 'awx' ORDER BY backend_start;
FROM pg_stat_activity
WHERE usename = 'awx' ORDER BY backend_start;

Current results

right now you would see connnections from the callback receivers hanging out for basically forever, and a dispatcher with idle workers would also keep database connections open after doing work, but if they never pick up anything new they just keep the old connection.

Sugested feature result

Recycle connections after a configurable amount of time by setting CONN_MAX_AGE

this relates to other work that could happen to in general make controller more reselient to losing a connection and re-establishing it.

Additional information

No response

@mimianddaniel
Copy link

I am using 23.7.0 with 4/3 task/web setup and using external backend with 200 Max connections.
I see a lot of idle wsrelay-awx-tasks connections close to 80-90% of 200 max connections at peak and they seem to timeout and the number drops to 30-40% level of max connections.
Same queries for all of them:
SELECT "conf_setting"."id", "conf_setting"."created", "conf_setting"."modified", "conf_setting"."key", "conf_setting"."value", "conf_setting"."user_id" FROM "conf_setting" WHERE ("conf_setting"."key" = 'LOG_AGGREGATOR_LOGGERS' AND "conf_setting"."user_id" IS NULL) ORDER BY "conf_setting"."id" ASC LIMIT 1
I tried using CONN_MAX_AGE on the client side and this had no effect. Is there normal behavior we expect to have?

@martin-micimo
Copy link

I can confirm this same query SELECT "conf_setting"."id", "conf_setting"."created", "conf_setting"."modified", "conf_setting"."key", "conf_setting"."value", "conf_setting"."user_id" FROM "conf_setting" WHERE ("conf_setting"."key" = 'LOG_AGGREGATOR_ENABLED' AND "conf_setting"."user_id" IS NULL) ORDER BY "conf_setting"."id" ASC LIMIT 1 is also not terminated on AWX 24.6.1 and PostgreSQL 15 external Database.

@2and3makes23
Copy link

2and3makes23 commented Dec 16, 2024

Found this while looking around in regard to #15632
In our case old (> 1h idle) DB connections are closed by firewall as a safety feature.

Something like CONN_MAX_AGE or some kind of keepalive package would probably fix #15632

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants