
Running prefect local agent in a docker container leads to zombie apocalypse ;-) #2418

Closed
mcg1969 opened this issue Apr 26, 2020 · 10 comments · Fixed by #2925

mcg1969 commented Apr 26, 2020

Description

I'm running prefect agent inside a Docker container with local execution. Each run leaves a zombie process behind, which, if left unchecked, eventually causes deleterious effects. I noticed this because I was at one point unable to ssh into the node on which the container was running.

Expected Behavior

Completed child processes should somehow be reaped so that zombies do not accumulate.

Reproduction

My shell script does

exec prefect agent start -t $prefect_runner_token

(note: removing the exec doesn't help). Here's a simple script to create a flow that runs on a schedule:

import prefect
from prefect import Flow, task
from prefect.schedules import IntervalSchedule
from datetime import timedelta, datetime

import time

schedule = IntervalSchedule(
    start_date=datetime.utcnow() + timedelta(seconds=1),
    interval=timedelta(minutes=2),
)

@task
def run():
    logger = prefect.context.get("logger")
    results = []
    for x in range(3):
        results.append(str(x + 1))
        logger.info("Hello! run {}".format(x + 1))
        time.sleep(3)
    return results

with Flow("Hello", schedule=schedule) as flow:
    results = run()

flow.register(project_name="Hello")

Environment

The container is built on CentOS 7.3. It does not have an init process.

{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-glibc2.10",
    "prefect_version": "0.10.4",
    "python_version": "3.8.2"
  }
}

mcg1969 commented Apr 26, 2020

What I am finding is that each flow run produces three subprocesses. The process with the smallest PID takes the longest to run and does seem to be reaped eventually. The other two processes exit more quickly but are never reaped, so each flow run adds a net of two zombies.
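
As a quick way to see this from inside the container, the snippet below counts defunct processes with psutil (which is already in the environment shared below); this is just a diagnostic sketch, not part of the agent:

import psutil

# Processes in the zombie (defunct) state: they have exited but were never reaped.
zombies = [
    p.info
    for p in psutil.process_iter(attrs=["pid", "ppid", "name", "status"])
    if p.info["status"] == psutil.STATUS_ZOMBIE
]
print(len(zombies), "zombie(s):", zombies)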

@joshmeek

Congratulations @mcg1969, I think this means that you are patient zero! I will look into this behavior. What are you using as the base image for your container?

mcg1969 commented Apr 27, 2020

I'm afraid I can't share the exact container, though I don't mind that you know it's the one that we use inside of Anaconda Enterprise, and @jcrist might have some familiarity with that. That said, it's based on a CentOS 7.3 base image, with Miniconda installed within. I'm happy to share the precise conda environment I was using too if that helps.

@joshmeek

No worries! I was only wondering whether it had some possibly weird dependencies, but this is enough information to go on 😄

mcg1969 commented Apr 27, 2020

Here's the conda environment, re-creatable with

conda create -n testprefect -c defaults -c conda-forge --file ...

The file:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=main
appdirs=1.4.3=pyh91ea838_0
asn1crypto=1.3.0=py38_0
ca-certificates=2020.1.1=0
certifi=2020.4.5.1=py38_0
cffi=1.14.0=py38h2e261b9_0
chardet=3.0.4=py38_1003
click=7.1.1=py_0
cloudpickle=1.2.2=py_0
croniter=0.3.30=py_0
cryptography=2.8=py38h1ba5d50_0
cytoolz=0.10.1=py38h7b6447c_0
dask-core=2.14.0=py_0
distributed=2.14.0=py38_0
docker-py=4.2.0=py38_0
docker-pycreds=0.4.0=py_0
heapdict=1.0.1=py_0
idna=2.9=py_1
ld_impl_linux-64=2.33.1=h53a641e_7
libedit=3.1.20181209=hc058e9b_0
libffi=3.2.1=hd88cf55_4
libgcc-ng=9.1.0=hdf63c60_0
libstdcxx-ng=9.1.0=hdf63c60_0
marshmallow=3.5.1=py_0
marshmallow-oneofschema=2.0.1=py_0
msgpack-python=1.0.0=py38hfd86e86_1
mypy_extensions=0.4.3=py38_0
ncurses=6.2=he6710b0_0
openssl=1.1.1g=h7b6447c_0
packaging=20.3=py_0
pendulum=2.1.0=py38_1
pip=20.0.2=py38_1
prefect=0.10.4=py_0
psutil=5.7.0=py38h7b6447c_0
pycparser=2.20=py_0
pyopenssl=19.1.0=py38_0
pyparsing=2.4.6=py_0
pysocks=1.7.1=py38_0
python=3.8.2=hcf32534_0
python-box=4.2.2=py_0
python-dateutil=2.8.1=py_0
python-slugify=3.0.4=py_0
pytz=2019.3=py_0
pytzdata=2019.3=py_0
pyyaml=5.3.1=py38h7b6447c_0
readline=8.0=h7b6447c_0
requests=2.23.0=py38_0
ruamel.yaml=0.16.10=py38h7b6447c_1
ruamel.yaml.clib=0.2.0=py38h7b6447c_0
setuptools=46.1.3=py38_0
six=1.14.0=py38_0
sortedcontainers=2.1.0=py38_0
sqlite=3.31.1=h62c20be_1
tabulate=0.8.3=py38_0
tblib=1.6.0=py_0
text-unidecode=1.3=py_0
tk=8.6.8=hbc83047_0
toml=0.10.0=pyh91ea838_0
toolz=0.10.0=py_0
tornado=6.0.4=py38h7b6447c_1
unidecode=1.1.1=py_0
urllib3=1.25.8=py38_0
websocket-client=0.57.0=py38_1
wheel=0.34.2=py38_0
xz=5.2.5=h7b6447c_0
yaml=0.1.7=had09818_2
zict=2.0.0=py_0
zlib=1.2.11=h7b6447c_3

mcg1969 commented Apr 27, 2020

I wouldn't say there's anything about the container I would expect to cause problems. Anything is possible, of course. But the container doesn't have an init process.

@lauralorenz added the bug label Apr 29, 2020

mcg1969 commented May 11, 2020

I have been able to verify that adding an init process like tini (https://github.com/krallin/tini) to the container, and running everything under that, reaps the zombies properly.
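
For reference, the two usual ways to put an init process in front of the agent look roughly like this (the tini path is illustrative and depends on how it is installed; this is a sketch, not our exact setup):

# Option 1: have Docker inject tini as PID 1 at run time (Docker 1.13+)
docker run --init my-agent-image

# Option 2: bake tini into the image (Dockerfile fragment);
# token and other agent arguments omitted here
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["prefect", "agent", "start"]

With either option tini runs as PID 1 and waits on any orphaned children, which is exactly the reaping that was missing.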

jcrist commented May 12, 2020

Glad to hear it! Currently it looks like we're implicitly relying on the init process to prune orphaned processes (which IMO is fine, if not ideal). We could possibly fix this in the future, but for now I think I'm fine saying that we require an init process when using the local agent. Leaving it open though. Thanks for the report @mcg1969!
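
To sketch what an in-process fix could look like (this is only an illustration of the general technique, not what the agent does today and not necessarily what an eventual fix would do): a process running as PID 1 can reap orphans itself by handling SIGCHLD, e.g.

import os
import signal

def _reap_children(signum, frame):
    # Reap any exited children without blocking so they never linger as zombies.
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left
        if pid == 0:
            return  # children exist, but none have exited yet

signal.signal(signal.SIGCHLD, _reap_children)

The catch is that reaping indiscriminately like this can race with the agent's own subprocess handling (e.g. Popen.wait()), which is part of why deferring to a real init such as tini is the simpler answer.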

mcg1969 commented May 12, 2020

I think that's reasonable—a doc fix would be great to consider!

@joshmeek added the docs label and removed the bug label May 19, 2020
@lauralorenz

Just adding here from IRL convo: we think the docs note should be on the page describing the local agent.
