Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permanently enable installed collections #108

Closed
wants to merge 1 commit into from

Conversation

rhcarvalho
Copy link
Contributor

Forget the past:

$ docker run --rm centos/python-34-centos7 python --version
Python 3.4.2
$ docker run --rm --entrypoint=python centos/python-34-centos7 --version
Python 2.7.5

Introducing a world where users don't have to care about enabling SCL collections to run their software:

$ docker run --rm scl-enabled-python-34-centos7 python --version
Python 3.4.2
$ docker run --rm --entrypoint=python scl-enabled-python-34-centos7 --version
Python 3.4.2

Think oc exec my-python-powered-pod ./manage.py migrate, without having to wrap the call in a Bash shell...

With a similar patch applied to all images, one day we can drop the hacks with BASH_ENV, ENV and PROMPT_COMMAND and enjoy permanent, baked-in, enabled collections in Docker images.

The future is also bright for users trying to build images based on our images:

$ cat Dockerfile
FROM scl-enabled-python-34-centos7
RUN python --version
$ docker build .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM scl-enabled-python-34-centos7
 ---> acaf29cfe5ae
Step 2 : RUN python --version
 ---> Running in 8e5fd0781ae9
Python 3.4.2
 ---> 415c3472b26a
Removing intermediate container 8e5fd0781ae9
Successfully built 415c3472b26a

@bparees
Copy link
Collaborator

bparees commented Mar 30, 2016

@rhcarvalho so this is basically hardcoding the env variables that scl enable otherwise sets?

@hhorak can you or someone from the SCL team take a look at this? Is there anything missing from this that scl enable handles? I know we also had fears at one point about "yum" using the wrong python version if we do this, but @rhcarvalho has tested yum and it appears to tolerate python3.

@rhcarvalho
Copy link
Contributor Author

@rhcarvalho has tested yum and it appears to tolerate python3.

Not that it runs on Python 3. The image for Python 3.3 and Python 3.4 have both those versions and the system version, 2.7.x.

/usr/bin/yum has a shebang #!/usr/bin/python, so it is always executed by the system Python.


BTW

In the "scl-enabled" image, python is python3. It is still possible to use python2:

$ docker run --rm --entrypoint=python2 scl-enabled-python-34-centos7 --version
Python 2.7.5

@mfojtik
Copy link
Contributor

mfojtik commented Mar 30, 2016

@praiskup will something like work for postgresql/mysql SCL?

@mfojtik
Copy link
Contributor

mfojtik commented Mar 30, 2016

@rhcarvalho i experimented with something similar couple months ago... one caveat of doing this is that some collections require another collections enabled :-) for example nodejs also require v8.... ruby require nodejs, etc... we can hardcode this into Dockerfiles, but it might get messy. Generally I like this idea, but I think SCL folks are more familiar with SCL tooling and all pros/cons of doing this.

@rhcarvalho
Copy link
Contributor Author

some collections require another collections enabled :-) for example nodejs also require v8.... ruby require nodejs, etc...

The tool to generate this patch takes a diff of a clean environment and an environment where all collections installed in an image are enabled. So this is not something to worry about.

we can hardcode this into Dockerfiles, but it might get messy.

That's why I'm only proposing this because we have software to generate it for any Docker image you can get.

I have generated similar patches for all images we distribute, including CentOS and RHEL variants.

Generally I like this idea, but I think SCL folks are more familiar with SCL tooling and all pros/cons of doing this.

From one hand, our existing hacks put the user in an all-scl-enabled environment when using a shell (bash or sh -i), there'd be no change there.

What our hack today doesn't cover is life outside of a shell and extending the image, what this PR would hopefully help with.

@rhcarvalho
Copy link
Contributor Author

The assumptions here are:

  • All side-effects of running scl enable COLLECTION_NAME bash are only affecting the environment (backed up by this comment by @hhorak osc exec, Docker images and Software Collections openshift/origin#2001 (comment)).
  • Most likely, we'll write this once and never touch it for existing images, and for new images we'll use the envdiff tool to generate the environment patch.
    The assumption here is that released SCL versions are unlike to change the way they change the environment.
    If the assumption is off, we need to rerun the script with every SCL release to keep this diff up to date. If that's a real big risk, we can integrate a script that updates that patch automatically, or at least checks it, with every PR.

@hhorak
Copy link
Member

hhorak commented Mar 31, 2016

I've been thinking about this issue long time and it seems to be the only working solution to have the SCL enabled properly everywhere, anytime. I've asked other maintainers to look at it as well, whether there are not some other hidden issues.
So far, the only issue I realize is what was already mentioned -- the depended collections, which will make the variables definition much longer, but that should be it. Otherwise it should work, so I'd be fine with this change.
I haven't test it, but I think we can test quite easily whether the static definition is correct by running a bit smarter (and more hidden) variant of:

# run this
RUN scl enable <scl> env >a
ENV X_SCLS=...
RUN env >b
RUN diff a b

@rhcarvalho
Copy link
Contributor Author

@hhorak envdiff does that, but better than a simple text-based diff. I've explained all the details in How to reverse-engineer a Software Collection for Docker images.

As one can see in the example there, envdiff does handle enabling multiple collections at once, as is the case for the Ruby image.

LD_LIBRARY_PATH=/opt/rh/python27/root/usr/lib64 \
PATH=/opt/rh/python27/root/usr/bin:$PATH \
XDG_DATA_DIRS=/opt/rh/python27/root/usr/share \
PKG_CONFIG_PATH=/opt/rh/python27/root/usr/lib64/pkgconfig
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to "fine-tune" certain list variables to include a recursive reference to its previous value, even if it was previously undefined. Those wouldn't be possible to auto-detect.

For example:

-MANPATH=/opt/rh/python27/root/usr/share/man:
+MANPATH=/opt/rh/python27/root/usr/share/man:$MANPATH
# oddly, scl enable does leave a trailing ':' there!

-LD_LIBRARY_PATH=/opt/rh/python27/root/usr/lib64
+LD_LIBRARY_PATH=/opt/rh/python27/root/usr/lib64:$LD_LIBRARY_PATH

-XDG_DATA_DIRS=/opt/rh/python27/root/usr/share
+XDG_DATA_DIRS=/opt/rh/python27/root/usr/share:$XDG_DATA_DIRS

-PKG_CONFIG_PATH=/opt/rh/python27/root/usr/lib64/pkgconfig
+PKG_CONFIG_PATH=/opt/rh/python27/root/usr/lib64/pkgconfig:$PKG_CONFIG_PATH

The reason to do that is if we ever try to set a value for one of those variables before this opaque ENV instruction, or, harder to find out, a parent layer does set that variable, we'd be inadvertently overwriting the previous value.

To minimize human intervention, we can have a whitelist of variables that we want to give this special treatment, and have envdiff do the work for us.

@bparees @hhorak WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think we should definitely do that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For RPMs security team pointed out that enable scriptlet needs to avoid exporting path with empty path segment (usually a trailing ':'). It's done by appending envar only if it is already defined, e.g. export LD_LIBRARY_PATH=/opt/rh/ruby200/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

IMO we should do the same in the case of docker images too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lukaszachy. That sounds like a good idea. I'll see if Docker supports the syntax, I'm not sure about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice nice! It works 🎉 🎊

I did it for PATH:

$ grep \$\{PATH Dockerfile
    PATH=/opt/rh/rh-python34/root/usr/bin${PATH:+:${PATH}} \
$ docker build -t python-34-smart-path .
...
$ docker inspect --type=image python-34-smart-path | grep \"PATH=
            "PATH=/opt/rh/rh-python34/root/usr/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "PATH=/opt/rh/rh-python34/root/usr/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",

@ppisar
Copy link

ppisar commented Apr 1, 2016

There is a conceptual problem. The "scl enable" can do more things than just setting environment variables to same fixed value. The enable script is for each collection is an arbitrary shell script.

I have no idea if there is an collections that does more, but I know that we cannot exclude it for the future. What about nfs-mountable collections? Don't they touch file system on "scl enable"?

Another issue is mixing system scripts with collection interpreters. yum has /usr/bin/python shell bang. But have you checked all the other system scripts (rpm -q --whatrequires /usr/bin/env)? What will be the procedure for executing commands out of collection environment?

@rhcarvalho
Copy link
Contributor Author

There is a conceptual problem.

That's why we need the feedback from SCL specialists here :)

Last December I was brainstorming with @hhorak and it seemed at the time that if we had a way to set the proper environment variables in a Dockerfile we'd be heading into a "solution", or at least something better than the current state of affairs.

Of course maintaining the environment instruction manually would be a no go, but having a tool perhaps could possibly work?

The "scl enable" can do more things than just setting environment variables to same fixed value. The enable script is for each collection is an arbitrary shell script.

True thing. How does this affects us for the Docker images we ship and plan to ship?

I have no idea if there is an collections that does more

I have no idea either, we need SCL experts to validate this.

But have you checked all the other system scripts (rpm -q --whatrequires /usr/bin/env)?

No, I haven't. Help would be appreciated.

What will be the procedure for executing commands out of collection environment?

AFAIK, there is no such a thing as scl disable.

What is the procedure now? We're already in the situation that you exec into a container to a Bash shell and you have all collections enabled (via BASH_ENV, ENV and PROMPT_COMMAND hacks). There is no way out of that.

I would love to get help to improve the user experience of our images.

@mfojtik
Copy link
Contributor

mfojtik commented Apr 1, 2016

@ppisar what is the use-case where you need disable particular collection and get the system version of something inside a container? We can provide Docker images that does not have any SCL installed (just use the system version of the thing). So in my mind, if you need something like that, you will have to craft the image or layer up and existing image, unset all the envs and execute your command.
I think it would be even possible to do this inside container started from image with enabled collection by default (you just have to unset all vars).

@ppisar
Copy link

ppisar commented Apr 1, 2016

On Fri, Apr 01, 2016 at 04:24:49AM -0700, Rodolfo Carvalho wrote:

 The "scl enable" can do more things than just setting environment
 variables to same fixed value. The enable script is for each
 collection is an arbitrary shell script.

True thing. How does this affects us for the Docker images we ship and
plan to ship?

You can have a collection whose "enable" script does a computation. You will
not be able to rewrite it into a Dockerfile.

 But have you checked all the other system scripts (rpm -q
 --whatrequires /usr/bin/env)?

No, I haven't. Help would be appreciated.

For example, in Perl image, it reports:

python-libs-2.7.5-34.el7.x86_64
automake-1.13.4-3.el7.noarch

Now you need automake and python experts to evaluate if the two packages will
work with any future interpreters. Impossible to evaluate.

And it still does not cover executions from programs by execvp(3) or system(3).

 What will be the procedure for executing commands out of collection
 environment?

AFAIK, there is no such a thing as scl disable.

What is the procedure now? We're already in the situation that you exec
into a container to a Bash shell and you have all collections enabled
(via BASH_ENV, ENV and PROMPT_COMMAND hacks). There is no way out of
that.

If it were impossible, you would not file the pull request.

I'd like to quote Docker documentation:

The ENTRYPOINT gives a container its default nature or behavior

I think you are trying to bend Docker for something it goes against its
design.

@ppisar
Copy link

ppisar commented Apr 1, 2016

On Fri, Apr 01, 2016 at 04:58:31AM -0700, Michal Fojtik wrote:

@ppisar what is the use-case where you need disable particular
collection and get the system version of something inside a container?

Executing system tools that expects system versions? Don't you remember all
the hassle about How do I use yum from python3 image?

@rhcarvalho
Copy link
Contributor Author

I'd like to quote Docker documentation:

The ENTRYPOINT gives a container its default nature or behavior

I think you are trying to bend Docker for something it goes against its
design.

So if I use an official Docker image for Python (or any other stack), based on Ubuntu or Alpine, I can choose, I can docker exec -it ./manage.py shell to open an interactive shell in my Django application, but if I use a CentOS or RHEL image I need to enable collections before I do so?!

Do we want every user to do:

docker exec -it scl enable python34 ./manage.py shell

? Can't we do better than that?

The line gets much uglier if we'd be talking about the Ruby image with its many collections.

And what is the point of having a MySQL image that I cannot run mysql without going through hops to enable a collection?

I think that's terrible user experience that move users away from our images and we need to fix it.

@openshift-bot
Copy link
Collaborator

PR testing has started.

1 similar comment
@openshift-bot
Copy link
Collaborator

PR testing has started.

@torsava
Copy link
Member

torsava commented May 16, 2017

Do we want every user to do:

docker exec -it scl enable python34 ./manage.py shell
? Can't we do better than that?

Well, there's this alternative that's arguably a bit more user friendly:

docker exec -it container-entrypoint ./manage.py shell

However, I agree it would improve the user experience if we could get rid of this boiler-plate.

@rhcarvalho
Copy link
Contributor Author

I have no time to push this forward, closing.

@rhcarvalho rhcarvalho closed this Aug 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants