This repository has been archived by the owner on Apr 12, 2022. It is now read-only.

Feature request: Could the docker image allow more customization so that we don’t have to keep making custom images based on this one #189

Closed
yokhahn opened this issue Aug 20, 2018 · 12 comments

Comments

@yokhahn

yokhahn commented Aug 20, 2018

As I mentioned over on the discuss site (https://discuss.elastic.co/t/feature-request-could-the-docker-image-allow-more-customization-so-that-we-dont-have-to-keep-making-custom-images-based-on-this-one/145210?u=yokhahn):

I'm trying to use the official docker image, via a helm chart. One of the problems that I keep running into is that the architecture of the current image requires customizing the docker image to make many configurations work (inheriting from the official image). I believe small changes to the image would prevent the need for this extra work.

For example, instead of requiring users to override the entrypoint script and daisy-chain onto it in order to install the s3 repository plugin, why not allow the original entrypoint script to check for the existence of a script at a certain path (e.g. /custom/my_custom_start)? That way customization could be done with a volume mount instead of re-creating the image.

I think this would be something like 4 lines (maybe I'll make a patch and a pull request):
if [ -f /custom/user_init.sh ]
then
    . /custom/user_init.sh
fi

Then I could make a file and have it mounted into that path with contents like:
/usr/share/elasticsearch/bin/elasticsearch-plugin install -b file:///path/to/plugin.zip
# elasticsearch-keystore add prompts for the value, so feed it via --stdin
echo "foo" | bin/elasticsearch-keystore add --stdin s3.client.default.access_key
echo "bar" | bin/elasticsearch-keystore add --stdin s3.client.default.secret_key
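
As a rough sketch of how that could be used (assuming the hook above existed, with a local ./custom directory holding user_init.sh):

docker run --rm \
  -v "$(pwd)/custom:/custom:ro" \
  docker.elastic.co/elasticsearch/elasticsearch:6.4.0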

This would let me set up secrets in a way that wouldn't be included in any images.
(I also saw this was a concern in another conversation:
https://discuss.elastic.co/t/failed-to-create-s3-repository-with-es-6-0/108224/10)

Thanks for any consideration.

@ghost

ghost commented Aug 23, 2018

This is a cool idea. We generally discourage doing heavy run-time mutation of the container, but we also realise that some people really want to do it. I'm personally very reluctant to provide explicit support for certain run-time mutations, like plugin installs, because it suggests that we think it's a good idea. (We really, really don't!) See #26 and #164.

Your approach is nice from my perspective because it enables people to do what they prefer, but does so without "suggesting" risky approaches to people who haven't considered the dangers.

@yokhahn
Author

yokhahn commented Aug 23, 2018

I think the idea that the container is one immutable thing is an admirable goal.

I do think one reason people want to mutate the container is that you are a trusted base for getting a vetted version of the software. There is a danger that if no extension method is provided, third parties will simply offer a bunch of competing mutations that address the needs of end-users, and people will use them. Then, when security updates come around, end-users will depend not on you but on the providers of these "mutated" containers, with little to no consistency in how those third-party containers get upgraded. I think that's bad for the whole ecosystem.

Given the current plugin architecture in Elasticsearch, I don’t see many great options.
Even in an end-user self-rolled setup, how do you reason about the plugin that is pulled by the elasticsearch-plugin utility? Do you just pull the latest at the time, hope it works, and then keep a local cache? Should "bin/elasticsearch-plugin install" require an explicit version, or at least take one as a parameter?

I believe in the principle that you should never pull a container from a "latest" tag. My understanding is to try to always use explicit versions and then test new versions with the setup before going to production. Likewise, plugins should not be just pulled from "latest" (and especially not from the network on instantiation), but I don’t currently see a good way to do this easily other than this sort of user shim. I think to be most beneficial to end-users, this container should offer an example of doing this the “least-worst” way.

Where I am, I wouldn't install a plugin straight from the network on instantiation of the container, exactly because of the testing needs you are talking about. While I'm thinking about using a plugin with the container, I would be testing the persistent volume (PV) + container combination before going to production. I think requiring a PV mount lends itself to this kind of thought, and doesn't require the creation of a new container image.

I'm working with a helm chart in Kubernetes. My thought for the helm chart would be to specify the URL for the plugin alongside the docker image that I would pull, so that they'd be connected in the chart. The logic on container startup would be something like:

(using an init container)
if the PV is empty or the version of the plugin doesn't match the version specified in the chart:
    fetch the plugin for the desired version from its specific URL
    if this fails:
        fail the initialization of the container
    else:
        use the copy in the PV
fi

(in the user wrapper mentioned above)
install the plugin from its path in the PV
install keystore entries from a Kubernetes-mounted secret
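
As a rough, untested sketch of that in shell (PLUGIN_URL, PLUGIN_VERSION, and the /pv mount path are placeholders for chart values):

#!/bin/sh
# Init container: fetch the plugin into the PV only if it isn't already there.
set -e
if [ ! -f "/pv/plugin-${PLUGIN_VERSION}.zip" ]; then
    # -f makes curl exit non-zero on HTTP errors, which fails the init container
    curl -fSL -o "/pv/plugin-${PLUGIN_VERSION}.zip" "${PLUGIN_URL}"
fi

# Later, in the user_init.sh shim inside the Elasticsearch container:
bin/elasticsearch-plugin install -b "file:///pv/plugin-${PLUGIN_VERSION}.zip"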

Are we on the same page? Have I misunderstood, and you mistrust plugins in general, or am I completely misunderstanding the position expressed in the links you provided?

Anyway, I appreciate your consideration of the idea/PR. Thanks!

@xeraa
Contributor

xeraa commented Aug 24, 2018

Even in an end-user self-rolled setup, how do you reason about the plugin that is pulled by the elasticsearch-plugin utility? Do you just pull the latest at the time, hope it works, and then keep a local cache? Should "bin/elasticsearch-plugin install" require an explicit version, or at least take one as a parameter?

I'm a bit confused by your description of Elasticsearch plugins. The current implementation requires a plugin to be built for a specific Elasticsearch version (down to the patch level) — see elasticsearch.version in the plugin docs.

Likewise, plugins should not be just pulled from "latest" (and especially not from the network on instantiation), but I don’t currently see a good way to do this easily other than this sort of user shim.

For our own plugins it appears to be latest, but actually you are getting the specific release for your Elasticsearch version (for example, when running bin/elasticsearch-plugin install analysis-icu it knows what your exact version is and thus which version of the plugin to fetch). If you don't want to install via the network or want to add a third-party plugin, you need to download the right version and then install it from a local path.
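
For example, something roughly like this (the path and version are just placeholders; the zip has to match your exact Elasticsearch version):

bin/elasticsearch-plugin install file:///path/to/analysis-icu-6.4.0.zip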

I generally agree with @Jarpy around the immutability of containers. Reasoning is much easier if you know what's in your image; otherwise it's a bit like opening Pandora's box, which is much harder to predict and control.

@yokhahn
Author

yokhahn commented Aug 24, 2018

@xeraa Indeed, your explanation of the plugin structure shows I had a misunderstanding.
That said, with community-contributed plugins, do we have a guarantee the version will always match?
Even the documentation here appears to have a reference to an outdated OpenStack Swift plugin.

But still, in general you are right: the version of the plugin downloader shipped with a particular version of Elasticsearch refers to the matching plugin. Your container ships with components missing that we call plugins, but the plugins are part of the expected Elasticsearch artifact. How do you want users to get access to the plugins in a container?

The problem we're left with is the equivalent of a dynamic linking step done over the network. I think we all agree that trying to dynamically put together an artifact at the instantiation of the container is a bad idea (what happens if there are network issues?). If not by putting them in a mounted volume before the actual container is started, what do you want users who want to run Elasticsearch in a container with plugins to do? Will a container image need to be made for each possible combination of desired plugins?

We don't want to put all the plugins in the container image; that would make it too large. I argue that the link step can be done separately: the plugin can be downloaded into a PV, and only when that succeeds can the execution step happen. That seems the safest way to do it, without building container images for all the permutations of possible plugin combinations.

@xeraa
Contributor

xeraa commented Aug 24, 2018

That said, with community-contributed plugins, do we have a guarantee the version will always match?
Even the documentation here appears to have a reference to an outdated OpenStack Swift plugin.

That's the downside of the approach: You won't be able to install a plugin if the version doesn't exactly match. It's more work for the maintainers, but it guarantees compatibility. We are also working on a better API to handle this in a more elegant fashion, but that's another story.

I think we all agree that trying to dynamically put together an artifact at the instantiation of the container is a bad idea (what happens if there are network issues?). If not by putting them in a mounted volume, before the actual container is started, what do you want users who want to use elasticsearch in a container with plugins to do? Will a container need to be made for each possible combination of desired plugins?

Our current assumption is that the majority of installations will use one of the two images that we are providing. If you need additional plugins, you will need to create a custom image.

I don't really see the problem with that. Even for your application images you want to build an image for every version, so that you can test it and then roll it out to your different environments. Only if you do that can you be certain that what you have been testing is also what you are shipping to production. And while application images might change very frequently, your datastores generally don't. We normally do one or two releases a month including patches, and depending on your base image and its attack surface you might have more. But with stateful services you want to verify that they are working with your data and application, so building a custom image sounds like the smallest task here (unless you use the YOLO deploy approach).

@yokhahn
Author

yokhahn commented Aug 24, 2018

Our current assumption is that the majority of installations will use one of the two images that we are providing. If you need additional plugins, you will need to create a custom image.

I don't really see the problem with that. Even for your application images you want to build an image for every version, so that you can test it and then roll it out to your different environments. Only if you do that can you be certain that what you have been testing is also what you are shipping to production. And while application images might change very frequently, your datastores generally don't. We normally do one or two releases a month including patches, and depending on your base image and its attack surface you might have more. But with stateful services you want to verify that they are working with your data and application, so building a custom image sounds like the smallest task here (unless you use the YOLO deploy approach).

All this approach does is change the dependencies. It trades plugins in a PV for plugins in an added layer of a custom docker image. The PV requires fewer resources to make. Fewer moving parts, fewer things to go wrong.

In order to put a custom container in a K8s cluster, for instance, I'll need:

  1. a repository to store the custom docker image.
  2. a build process to make said image.
  3. manifests to deploy the image.

In order to use a PV with the plugin and secrets, I'll need:

  1. manifests to build the PV
  2. manifests to deploy the image.

Both solutions should be deployed in a test cluster before being put in a production cluster. The YOLO (go with no testing) approach is a social problem, and you can't use a technological solution to solve a social problem.

Because the PV only requires manifests, those manifests could be community-supported in a helm chart (with all the values that a community-supported item has). That can never happen with the requirement of a custom docker image (unless tons of publicly available, plugin-based knock-off images appear, with all the security problems that could entail).

I think you're trying to take a dynamically linked artifact and shim it into a statically compiled result, with all the reduction in functionality that entails. If the immutable object is so desired, why stop at the container? Should this not be done with the executable itself? Why bother with plugins at all; why not just add them in at compile time and require a recompile for each person that needs a plugin?


Stepping beyond the question of a PV for the plugin, I will still need a shim on the side to allow a secret to be mounted so it can be added to the keystore at run time. Do you also want custom images made every time somebody needs to put a secret in a keystore for a containerized Elasticsearch install?

P.S. How will the plugin architecture be changing in a way that will help this problem?
Also, is there an automated way to get the address the plugin puller is going to use, since I will want to pull down a matching plugin into a PV even if I make a custom image?

@xeraa
Contributor

xeraa commented Aug 26, 2018

We discussed this internally and this is not something we want to support at the moment. We need to be a bit more conservative here since our official images are also covered under commercial support.

Having said that, thanks a lot for your input! We really appreciate ideas from our community even though we cannot always implement them. Please share your issues or PRs in the future as well :)

P.S. How will the plugin architecture be changing in a way that will help this problem?
Also, is there an automated way to get the address the plugin puller is going to use, since I will want to pull down a matching plugin into a PV even if I make a custom image?

All you need to do is build a custom image FROM our images and add another layer with RUN to add a plugin. This is the only customization you need here and the one we support.
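
For example, a minimal sketch of such an image (repository-s3 is just an example plugin name):

FROM docker.elastic.co/elasticsearch/elasticsearch:6.4.0
RUN bin/elasticsearch-plugin install --batch repository-s3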

@xeraa xeraa closed this as completed Aug 26, 2018
@yokhahn
Author

yokhahn commented Aug 27, 2018

@xeraa Ignoring the question of the plugin install, your method still doesn't solve the problem with the keystore.

If I have my own image that uses the s3 plugin, for example, I will still want to be able to add the keystore at the start of the container from a secrets volume in Kubernetes.
With this shim, you would be able to run elasticsearch-keystore in the /custom/user_init.sh.

Your method requires me to put my secrets in my container. Are you recommending that your customers put secrets in their containers?

@xeraa
Contributor

xeraa commented Aug 27, 2018

You only need to mount the keystore at /usr/share/elasticsearch/config/elasticsearch.keystore — no need for a shell script or to put it into the image. Most orchestration systems explicitly support secret files, so this should just work like anywhere else. For Kubernetes I would look at their example for SSH keys and for Docker Compose it's something like this (not a complete config):

elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:${VERSION}
  secrets:
    - source: elasticsearch.keystore
      target: /usr/share/elasticsearch/config/elasticsearch.keystore
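
For Kubernetes, a rough equivalent would be to create a secret from the keystore file (the secret name here is just a placeholder) and mount it as a file at that same target path in the pod spec:

kubectl create secret generic elasticsearch-keystore \
  --from-file=elasticsearch.keystore=./elasticsearch.keystore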

@yokhahn
Author

yokhahn commented Aug 27, 2018

@xeraa
I'd still have to use binaries in the container to set up the keystore, I believe, but that's a fair enough answer.
Thanks.

@yokhahn
Author

yokhahn commented Aug 28, 2018

@xeraa One last thing about this, given your advice to make a secret of the keystore file. I'll go create a separate issue for this if you think it makes sense, but my question is: should the elasticsearch-keystore CLI have an option to operate on a given path other than the default (or perhaps write to stdout)? I'll be running it to generate that secret and then having to cherry-pick the file out of a fixed path.

(Same thing for the elasticsearch-plugin command: I'd like it to be able to just download the file to a path instead of my having to cherry-pick it out, or look up the path of the zip file in the documentation. I like to keep offline copies of all the components I use.)

@xeraa
Contributor

xeraa commented Aug 30, 2018

One last thing about this, given your advice to make a secret of the keystore file. I'll go create a separate issue for this if you think it makes sense, but my question is: should the elasticsearch-keystore CLI have an option to operate on a given path other than the default (or perhaps write to stdout)? I'll be running it to generate that secret and then having to cherry-pick the file out of a fixed path.

Generally this should go to https://discuss.elastic.co instead of a GitHub issue, but for the sake of keeping everything in a single place I'll try to answer here:

  1. To create the keystore you could just use the tar.gz file of Elasticsearch and generate the store with that. This will put the keystore file in the config/ folder.
  2. You could just use the container image for this as well. The following example mounts /Users/philipp/Desktop/test/config/ as the config directory, which is where you will also find the keystore file after creating it. You can then copy that file wherever you need it and mount it under /usr/share/elasticsearch/config/elasticsearch.keystore in all your Elasticsearch containers.
≻ docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v /Users/philipp/Desktop/test/config/:/usr/share/elasticsearch/config/ -it docker.elastic.co/elasticsearch/elasticsearch:6.4.0 /bin/bash
[root@1006ed50b646 elasticsearch]# ./bin/elasticsearch-keystore create
Created elasticsearch keystore in /usr/share/elasticsearch/config
[root@1006ed50b646 elasticsearch]# ./bin/elasticsearch-keystore add test
Enter value for test: 
[root@1006ed50b646 elasticsearch]# exit
exit
≻ cat config/elasticsearch.keystore
(binary keystore contents)
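
For the tar.gz route from point 1, a rough sketch (6.4.0 as an example version):

curl -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.0.tar.gz
tar xzf elasticsearch-6.4.0.tar.gz
cd elasticsearch-6.4.0
bin/elasticsearch-keystore create   # writes config/elasticsearch.keystore
bin/elasticsearch-keystore add test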

(Same thing for the elasticsearch-plugin command: I'd like it to be able to just download the file to a path instead of my having to cherry-pick it out, or look up the path of the zip file in the documentation. I like to keep offline copies of all the components I use.)

You can download all the plugins directly: our official ones from https://artifacts.elastic.co/downloads/elasticsearch-plugins/ (for example https://artifacts.elastic.co/downloads/elasticsearch-plugins/ingest-geoip/ingest-geoip-6.4.0.zip), and third-party ones normally from GitHub.
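
So keeping an offline copy is just a matter of fetching the exact matching version, for example:

curl -O https://artifacts.elastic.co/downloads/elasticsearch-plugins/ingest-geoip/ingest-geoip-6.4.0.zip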

I'm not sure you are gaining much with offline plugins. Since 6.4.0 we are checking the signatures during the installation process (elastic/elasticsearch#30800), but only for online installations. For offline installations you will need to check the signatures yourself.
