Server healthchecks create curl zombies #4114

Closed
jschnare opened this issue Feb 12, 2021 · 11 comments · Fixed by PrefectHQ/server#190 or PrefectHQ/server#191

Comments

@jschnare

Description

When I run the local server I get zombie curl processes, about 3 per second. They get cleaned up when I stop the server.

I suspect that it's the health checks added in #4041.

#2418 refers to running the agent under tini, which should fix the server containers as well.

Expected Behavior

Containers shouldn't generate zombie processes. Given enough time they'll fill the host's process list.
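As a rough illustration of how quickly the process list could fill, here is a small sketch that divides the remaining PID space by the leak rate. The function name `seconds_to_pid_exhaustion` is hypothetical, and the limit of 32768 is an assumption (the Linux kernel's built-in default; many distributions raise it, so read the real value from /proc/sys/kernel/pid_max).

```shell
# Seconds until the PID space is used up.
#   $1 = leak rate in zombies per second
#   $2 = PID limit (assumption: take it from /proc/sys/kernel/pid_max)
#   $3 = PIDs already in use (optional, defaults to 0)
seconds_to_pid_exhaustion() {
  awk -v rate="$1" -v pid_max="$2" -v in_use="${3:-0}" 'BEGIN {
    if (rate <= 0) { print "rate must be positive" > "/dev/stderr"; exit 1 }
    printf "%d\n", (pid_max - in_use) / rate
  }'
}

# Example: the reported rate of about 3 per second against a 32768 PID limit.
seconds_to_pid_exhaustion 3 32768
```

With those inputs the estimate is on the order of three hours, though the real figure depends on the host's actual limit and on how many PIDs are already taken.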

Reproduction

$ prefect backend server
$ prefect server start --detach
$ while true; do echo -ne "$SECONDS\t"; pgrep -u root curl | wc -l; sleep 3; done
61802	47
61805	53
61808	58
61811	63
61814	69
61817	73
61821	79
61824	85
61827	89
61830	95
61833	99
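To turn samples like the ones above into a growth rate, a minimal sketch could compare the first and last readings. The helper name `zombie_rate` is hypothetical; it expects the same "seconds, count" pairs that the loop prints.

```shell
# Estimate zombie growth per second from "seconds<TAB>count" samples on stdin.
# Uses only the first and last sample, so it is a coarse average.
zombie_rate() {
  awk 'NR == 1 { t0 = $1; n0 = $2 }
       { t1 = $1; n1 = $2 }
       END {
         if (NR < 2 || t1 == t0) {
           print "need at least two samples" > "/dev/stderr"
           exit 1
         }
         printf "%.2f\n", (n1 - n0) / (t1 - t0)
       }'
}

# First and last samples from the output above.
printf '61802\t47\n61833\t99\n' | zombie_rate
```

Piping the live loop output into a file and then into `zombie_rate` would give the same kind of average over a longer window.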

Environment

$ docker -v
Docker version 20.10.3, build 48d30b5
$ docker-compose -v
docker-compose version 1.28.2, build 67630359
{
  "config_overrides": {
    "context": {
      "secrets": false
    },
    "server": {
      "telemetry": {
        "enabled": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-58-generic-x86_64-with-debian-bullseye-sid",
    "prefect_backend": "server",
    "prefect_version": "0.14.8",
    "python_version": "3.7.9"
  }
}
@jschnare changed the title from "GraphQL healthcheck creates curl zombies" to "Server healthchecks create curl zombies" on Feb 12, 2021
@zanieb
Contributor

zanieb commented Feb 12, 2021

Thanks for the thorough report @jschnare!

@tothandor

tothandor commented Feb 18, 2021

I can confirm that. Host falls over within a few hours.

@zanieb
Contributor

zanieb commented Feb 18, 2021

We are working on a fix for this; it'll be in the next release.

@zanieb
Contributor

zanieb commented Feb 19, 2021

Hey @jschnare -- could you give #4142 a quick test? I (weirdly) cannot reproduce this on my dev machine. If not I will try on another machine. Thanks!

@asmhack

asmhack commented Feb 22, 2021

Have the same situation.
prefect version: 0.14.9
GCP 4.19.0-14-cloud-amd64 1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux

CONTAINER ID   NAME                       CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
a1468dabae7f   cli_ui_1                   0.00%     4.863MiB / 18.87GiB   0.03%     1.13kB / 0B       4.1kB / 4.1kB     5
e108edc07160   cli_towel_1                0.00%     54.67MiB / 18.87GiB   0.28%     20.8kB / 32.7kB   32.8kB / 0B       14
c9a3fd0d21b8   cli_apollo_1               3.13%     67.51MiB / 18.87GiB   0.35%     1.63MB / 13.8kB   442kB / 16.4kB    2397
0c177d7b2c7b   cli_graphql_1              2.18%     84.14MiB / 18.87GiB   0.44%     8.87kB / 96.2kB   1.37MB / 0B       2443
b2106c790e07   cli_hasura_1               2.61%     148.7MiB / 18.87GiB   0.77%     1.1MB / 1.31MB    0B / 0B           19
dff138a573af   cli_postgres_1             7.19%     52.78MiB / 18.87GiB   0.27%     1.24MB / 1.05MB   16.4kB / 1.91MB   10
5fa738b0818e   registry                   0.00%     10.98MiB / 18.87GiB   0.06%     1.52kB / 0B       17.9MB / 0B       13
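The PIDS column is what gives the leak away: apollo and graphql sit in the thousands while the other containers stay under 20. A small sketch for flagging such containers automatically follows; the function name `containers_over_pid_threshold` and the threshold of 1000 are hypothetical choices, not anything Prefect ships.

```shell
# Print the names of containers whose PID count exceeds a threshold.
# Reads "name pids" pairs on stdin, one container per line.
#   $1 = threshold (hypothetical value; pick one that suits the host)
containers_over_pid_threshold() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Three rows taken from the table above, threshold 1000.
printf 'cli_ui_1 5\ncli_apollo_1 2397\ncli_graphql_1 2443\n' |
  containers_over_pid_threshold 1000

# Against a live Docker daemon the input could come from:
#   docker stats --no-stream --format '{{.Name}} {{.PIDs}}'
```

Run from cron or a monitoring agent, a check like this could raise an alert well before the host runs out of PIDs.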

Confirming that it's zombie curl processes, like these:

.....
0 Z root     32586 32586 13005  0  80   0 -     0 -      08:37 ?        00:00:00 [curl] <defunct>
0 Z root     32612 32612 13005  0  80   0 -     0 -      08:30 ?        00:00:00 [curl] <defunct>
0 Z root     32626 32626 13255  0  80   0 -     0 -      08:37 ?        00:00:00 [curl] <defunct>
0 Z root     32630 32630 13255  0  80   0 -     0 -      08:30 ?        00:00:00 [curl] <defunct>
0 Z root     32663 32663 13005  0  80   0 -     0 -      08:37 ?        00:00:00 [curl] <defunct>
0 Z root     32700 32700 13255  0  80   0 -     0 -      08:37 ?        00:00:00 [curl] <defunct>
0 Z root     32707 32707 13005  0  80   0 -     0 -      08:30 ?        00:00:00 [curl] <defunct>
0 Z root     32746 32746 13255  0  80   0 -     0 -      08:30 ?        00:00:00 [curl] <defunct>
0 Z root     32753 32753 13005  0  80   0 -     0 -      08:37 ?        00:00:00 [curl] <defunct>
....
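To see which parent processes own the zombies, the defunct entries can be grouped by parent PID. This is a sketch only: the helper name `zombies_by_parent` is hypothetical, and it assumes input in the three-column form produced by `ps -e -o stat= -o ppid= -o comm=` rather than the wider listing shown above.

```shell
# Count defunct processes per parent PID.
# Reads "STAT PPID COMMAND" rows on stdin, as printed by:
#   ps -e -o stat= -o ppid= -o comm=
zombies_by_parent() {
  awk '$1 ~ /^Z/ { count[$2]++ }
       END { for (p in count) print p, count[p] }' | sort -n
}

# Hypothetical sample rows shaped like that ps output.
printf 'Z 13005 curl\nZ 13255 curl\nS 1 sh\nZ 13005 curl\n' | zombies_by_parent
```

Each output line is a parent PID followed by its zombie count, so a listing dominated by one or two parents points at the processes that never reap their children.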

@asmhack

asmhack commented Feb 22, 2021

Hey @madkinsz, confirming, #4142 works prefectly for me.
Thank you!

@UPetit

UPetit commented Feb 23, 2021

Hi @madkinsz,
I've seen that you published a release to the server related to this issue. Thank you for that 🙏
Is it possible to get the fix officially already or should we wait for the docker images to be released as well?

@zanieb
Contributor

zanieb commented Feb 23, 2021

You can run prefect server start --version master now, but a release that points Prefect Core to the new images will go out today.

@ClementC

Ha! I just stumbled on this issue; I was having the same problem (and almost got locked out of my server...).
My hacky workaround was a cron job that runs every minute and kills the parents of defunct processes, as follows: 😬

* * * * * parents_of_dead_kids=$(ps -ef | grep '[d]efunct' | awk '{print $3}' | sort -u | grep -v '^1$'); echo "$parents_of_dead_kids" | xargs -r kill

@zanieb
Contributor

zanieb commented Feb 24, 2021

This should now be resolved in https://github.com/PrefectHQ/prefect/releases/tag/0.14.10

@UPetit

UPetit commented Feb 24, 2021

This is resolved, thanks! I now have around 24 PIDs for Apollo and 21 for GraphQL, and they are not growing 🤞
