Registry pruning improvements #3333
QE has uncovered at least one issue with image pruning: if you delete an image stream, we lose the path from stream -> image -> layer, so the code currently doesn't prune a layer in this case (https://bugzilla.redhat.com/show_bug.cgi?id=1235148). In the short term, we could modify the pruning code so this bug is fixed at least to the extent of deleting the blobs (layer links would still exist, however). But there are still some gaps. For example, you can delete an image stream and then recreate it, and the registry's data about the deleted image stream (manifest signatures and layer links) is still there. Ultimately, what I would like to do is this:
@smarterclayton @deads2k @jwhonce @brenton @sdodson what do you think?
On Jun 30, 2015, at 10:50 AM, Andy Goldstein wrote: […] As a counter argument - why not just periodically read all layers not …
In the registry's code? I'd like to avoid walking the registry's storage if possible, since it may not be an efficient or cheap operation for storage backends like S3. I am working on a patch to at least be able to delete the blobs, relieving the storage burden. But it won't handle deleting the layer links.
On Jun 30, 2015, at 11:27 AM, Andy Goldstein wrote: […] But don't we eventually have to have this check? What happens when a …
Do we care if we miss a few? I could sort of see wanting to prune dead links, but without knowing how good the backing storage is, using it as a primary mechanism seems to open us up to severe performance issues. Also, lack of quoting makes it really hard to follow what you're saying.
@smarterclayton If you must reply without quoting, can you top-post the comment to make it a little easier?
It would be a good thing to have some code in the registry that deletes unreferenced data. The problem is that all the references to layers are stored in etcd in /images. The registry would need to get the list of all images (which contains the referenced layers), then walk each blob and/or repo layer link and determine if it's referenced or not. But I do also think that the proposed refactoring gives us a better implementation overall.
I guess we could do this periodically (all code running in the registry):
Walking the blobs will be slightly more difficult, as the blob store contains both signatures and layers, without any indication as to file type. I guess we'd need to add all the manifests' signatures from all repos to the graph, and then at the end we'd be able to determine if there are any unreferenced blobs lying around that we can purge. I am concerned about how the performance would be, as it's directly related to the number of files (blobs, repos, signatures, layer links) in storage.
And if we do ^^^, then we wouldn't need to store any additional information in etcd about deleted streams, images, layers, I wouldn't think.
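For illustration only, here is a minimal sketch of the pass described above, not actual Origin or docker/distribution code: build the set of blobs referenced by the images OpenShift knows about, then flag everything else in storage as an orphan. The Image type and unreferencedBlobs helper are assumptions, and manifest signature blobs would also have to be added to the referenced set, as noted above.

```go
// Hypothetical sketch of the pass described above: collect every blob digest
// referenced by the images OpenShift knows about, then report anything in the
// registry's storage that nothing references. Type and function names are
// illustrative only; manifest signature blobs would also need to be added to
// the referenced set.
package main

import "fmt"

// Image mirrors the relevant part of an OpenShift image record in etcd.
type Image struct {
	Name   string
	Layers []string // blob digests referenced by the image's manifest
}

// unreferencedBlobs returns digests present in storage that no image references.
func unreferencedBlobs(images []Image, storedBlobs []string) []string {
	referenced := make(map[string]bool)
	for _, img := range images {
		for _, layer := range img.Layers {
			referenced[layer] = true
		}
	}
	var orphans []string
	for _, digest := range storedBlobs {
		if !referenced[digest] {
			orphans = append(orphans, digest)
		}
	}
	return orphans
}

func main() {
	images := []Image{{Name: "sha256:img1", Layers: []string{"sha256:layer1", "sha256:layer2"}}}
	stored := []string{"sha256:layer1", "sha256:layer2", "sha256:stale"}
	fmt.Println(unreferencedBlobs(images, stored)) // [sha256:stale]
}
```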
On Jun 30, 2015, at 11:38 AM, Andy Goldstein wrote: […] That could easily be a part of the prune command - I don't see that is a … I think my point is that it is more important that we have a solid way to … Why is unreferenced data in real storage not a problem? Are we positive …
I believe you can generate 1 during the prune, and then if necessary supply … Performance can be solved. Ultimately this is a 2pc problem and you are … On Jun 30, 2015, at 11:50 AM, Andy Goldstein wrote: […]
Not easily. The prune command could get a list of all images in etcd, but there's no API in the registry for a client to drive walking blobs and layer links to determine if they're unreferenced. I really think this has to be initiated by the registry.
If you want to catch unreferenced images/repos/links, then I really think we need to write code in the registry to do this.
Generate 1 what? Graph?
Item 1 - the list of all referenced images and the layers they reference. On Jun 30, 2015, at 11:59 AM, Andy Goldstein wrote: […]
Ok... but right now I guess we could create another new customized route in the registry that takes graph data as input and uses it to determine what's unreferenced. I'd really prefer not to do that, however. Or, run the registry's internal pruning code less frequently - say once or twice a day instead of every hour.
cc @liggitt too
I'd like to come up with a list of action items and all agree on the path forward. To recap, here are the moving parts: OpenShift owns:
Registry owns:
A user runs oadm prune images. If an image stream is deleted and prune is then called, we have no way of knowing that the deleted stream referenced images and/or layers that need to be pruned. #3521 is one way to partially address this, but we probably still want a full resync to take place in the registry. Here's my thinking about responsibilities for …
registry
The registry removal task can run anywhere that has access to the registry's storage. This could be code in the registry itself or perhaps a sidecar container. To determine what is orphaned, the registry will need a graph just like … What do you all think about this separation of concerns?
I like the idea. Given that we run our plugins in the registry, could we simply build the reverse links to the blobs and/or the type information about the blobs in a separate location? That would alleviate performance concerns.
I'd have to look into that. I'm not sure that it would be easy to do without some breaking changes to the upstream registry code, which we'd have to carry and cherry pick each time we rebase. Ah... doesn't look like upstream is writing the media types to disk yet - that's what we'd need (https://github.com/docker/distribution/blob/master/registry/storage/blobstore.go#L81).
marked as p1 after convo with @ncdc - we are carrying upstream hacks to docker/distribution that make it difficult to bump the godep
The result is a "successfully" uploaded image without blobs stored in the registry. I can't think of any viable way of running stand-alone GC in the registry without writes being disabled.
@miminar as I've mentioned previously, we must continue to use the graph we build from the Kubernetes and OpenShift resources (pods, RCs, DCs, BCs, builds, ISes, etc) to have an accurate picture of what images and layers are or are not in use. While there is a race possible, it would look something like this:
At this point, assuming nothing referenced the image known as 172.x.x.x:5000/project/stream@revision1, it would be a candidate for pruning. Similarly, layer123 would be a candidate for pruning if no in-use images reference it. The assumption here is that revision1 is an old image (e.g. > 30, 60, 90 days, whatever the prune setting is). And given that it's an old image, it's not in use, and nothing else references layer123, the only way we'd have a race is if someone was pushing a new image that happened to reference layer123 at the same time we were in the process of pruning it. It's certainly possible, and we probably should address it eventually. @smarterclayton @pweil- your thoughts?
It seems like a fairly unlikely occurrence that has a workaround (re-push the image). I imagine that (for now) pruning would be a regularly scheduled administrative task that could largely avoid this by scheduling it during low-use times. Obviously that doesn't work for every scenario.
@ncdc What you say is true in the current scenario, where OpenShift instructs the registry which repositories, manifests and layers can be removed. If the severity of this is indeed low, I have no other objection. I'd just like to ensure we're on the same page and aware of the consequences.
We should make a note of the behavior in the pruning definition and call it … On Wed, Oct 7, 2015 at 10:14 AM, Michal Minar wrote: […]
@miminar brings up a good point about my suggestion from above about how the registry prunes. I'll quote it here for convenience:
If, as I proposed above, running oadm prune images results in images being … @smarterclayton @pweil- @miminar how does this sound?
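To make the "registry periodically queries OpenShift" idea concrete, here is a rough sketch of what that loop could look like inside the registry process; the PruneSource and BlobDeleter interfaces are hypothetical stand-ins, not existing Origin APIs.

```go
// Rough sketch of a periodic pruning loop inside the registry process. The
// PruneSource and BlobDeleter interfaces are hypothetical stand-ins for "ask
// OpenShift what can be deleted" and "delete from the storage driver".
package prune

import (
	"log"
	"time"
)

type PruneSource interface {
	PruneCandidates() ([]string, error) // blob digests OpenShift says are safe to delete
}

type BlobDeleter interface {
	Delete(digest string) error
}

// RunPruneLoop polls OpenShift on every tick and deletes whatever it reports.
func RunPruneLoop(src PruneSource, storage BlobDeleter, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			digests, err := src.PruneCandidates()
			if err != nil {
				log.Printf("prune: failed to fetch candidates: %v", err)
				continue
			}
			for _, d := range digests {
				if err := storage.Delete(d); err != nil {
					log.Printf("prune: failed to delete %s: %v", d, err)
				}
			}
		case <-stop:
			return
		}
	}
}
```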
@ncdc That's a good idea. The same could be achieved without the need to query OpenShift.
@miminar the manifest is not stored in the registry. It's stored in etcd in /images.
@ncdc I thought the manifest stored in the registry and OpenShift's were the same. I can load a manifest from storage from inside the registry just fine. Is it of no use to us?
@miminar they are, but we use a custom middleware in the registry to take over handling tags, manifests, etc: https://github.com/openshift/origin/blob/master/pkg/dockerregistry/server/repositorymiddleware.go#L101-L139. It's set up here: https://github.com/openshift/origin/blob/master/images/dockerregistry/config.yml#L16. Manifest data is only stored in OpenShift's etcd. When you load a manifest from inside the registry, you're actually getting it from OpenShift.
Aha! I didn't know that. Thanks for pointing this out.
The main issue I'm facing now is how to mark images as deleted without removing them physically during …
@miminar we should lay out the proposed design, with all the moving parts, before we do any coding
👍 |
Sounds good. We also need to queue up the work to automate pruning. On Oct 7, 2015, at 11:29 AM, Andy Goldstein wrote: […]
Let me give it a shot:
Revised design for image and imagestream disposal
Delete of imagestream marks it for deletion. A new controller watching for streams being deleted will kick in. The controller will mark all its images for deletion and store its name to …
By Delete is meant a deletion from the command line like …
Delete of an image causes it to be just marked for deletion.
Registry pruning
Is a go-routine in …
Image type
Will have a 2-phase deletion. It will get a new …
Marking for deletion
Means to set …
ImageStreamDeletion
Is a new type stored in isTrash, inaccessible from a cli as a resource. It has no extra attributes. Its …
ImageStream
Again a 2-phase deletion. Type will be extended for …
ImageStream controller
Is a new type that takes care of the image stream's termination. Once the termination is complete, it finalizes the stream and removes it. Finalized image has empty …
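A minimal sketch of the proposed 2-phase deletion of an Image, assuming it gains a DeletionTimestamp-style field; the field and helper names are illustrative rather than the actual Origin types.

```go
// Minimal sketch of the 2-phase deletion idea, assuming Image gains a
// DeletionTimestamp-style field; field and method names are illustrative.
package prune

import "time"

type Image struct {
	Name              string
	DeletionTimestamp *time.Time // nil means the image is preserved
}

// MarkForDeletion records when the image entered phase one; physical removal
// happens later, once the registry has cleaned up blobs and layer links.
func (img *Image) MarkForDeletion(now time.Time) {
	if img.DeletionTimestamp == nil {
		img.DeletionTimestamp = &now
	}
}

// MarkedForDeletion reports whether the image is awaiting physical removal.
func (img *Image) MarkedForDeletion() bool {
	return img.DeletionTimestamp != nil
}
```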
You have to retrieve the complete list of images - both preserved and marked for deletion - to be able to determine which layers can be pruned. A layer can be pruned if and only if the only predecessors it has are images marked for deletion. If a layer has a predecessor that is a preserved image, you can't delete that layer.
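That rule could be expressed roughly as follows; the types are illustrative, not Origin's actual graph structures.

```go
// Sketch of the rule above, with illustrative types: a layer may be pruned only
// when no preserved image still references it.
package prune

type Image struct {
	MarkedForDeletion bool
	Layers            []string // blob digests the image references
}

// layerPrunable returns true if every image referencing the layer is marked for deletion.
func layerPrunable(layer string, images []Image) bool {
	for _, img := range images {
		if img.MarkedForDeletion {
			continue
		}
		for _, l := range img.Layers {
			if l == layer {
				return false // a preserved image still needs this layer
			}
		}
	}
	return true
}
```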
This needs to be a function that's easy to invoke both from an internally-scheduled timer in the registry process and from a separate executable. We want to have the flexibility to run this automatically inside the registry container or in a separate sidecar container in the registry pod.
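A sketch of that structure, assuming a hypothetical prune package: one Prune function, a timer wrapper for running it inside the registry process, and a trivial sidecar entry point noted in the trailing comment.

```go
// Sketch of the flexibility described above: a single Prune function that can be
// driven either by an in-process timer or by a standalone sidecar binary.
// All names here are hypothetical.
package prune

import (
	"log"
	"time"
)

// Prune performs one orphan-collection pass over the registry's storage.
func Prune() error {
	// ... build the graph, remove orphaned blobs and layer links ...
	return nil
}

// RunPeriodically is what the registry process itself would call from a goroutine.
func RunPeriodically(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := Prune(); err != nil {
				log.Printf("prune failed: %v", err) // keep going; the next tick retries
			}
		case <-stop:
			return
		}
	}
}

// A sidecar container would instead ship a small main package:
//
//	func main() {
//		if err := prune.Prune(); err != nil {
//			log.Fatal(err)
//		}
//	}
```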
Based on conversations with Stephen Day from Docker, he expects us to use a repository-scoped … I think calling "delete manifest" for a given repo/manifest is what should end up deleting the signature links. We'll need to be careful here, since I believe that call will initially route to our middleware. So we'll just need to have our middleware delegate to the …
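The delegation being described might look roughly like the following; ManifestService here is a simplified stand-in for the real docker/distribution interface, so treat this as a sketch of the wrapping pattern rather than the actual middleware code.

```go
// Sketch of the delegation idea: the Origin middleware wraps the upstream
// manifest service and forwards deletes to it so signature/layer links actually
// get removed from storage. ManifestService is a simplified stand-in, not
// the real docker/distribution interface.
package prune

import "context"

type ManifestService interface {
	Delete(ctx context.Context, digest string) error
}

// manifestMiddleware wraps the upstream service and adds Origin-specific bookkeeping.
type manifestMiddleware struct {
	upstream ManifestService
}

func (m *manifestMiddleware) Delete(ctx context.Context, digest string) error {
	// Origin-specific cleanup (e.g. etcd bookkeeping) could happen here first.
	return m.upstream.Delete(ctx, digest) // delegate so the storage links go away
}
```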
Also please review #3521 as it relates to deleting and recreating image streams with the same name. I don't think we'll be able to include the repo uid in the repo storage path given that upstream doesn't want to expose the path mapper, but hopefully we can figure something out...
@ncdc Design updated.
Yes, that's what I'm using now.
Adding #7564 for cross-reference.
/close pretty sure hard-prune implemented some of the suggestions here, and prune itself looks a lot like the other discussions I read, with the exception of the proposed "mark+sweep" technique, though I think even that is basically what oc adm prune does (it doesn't update the etcd image objects to mark them as being deleted, but it builds the full graph in cache and then goes through and deletes everything it should, including registry storage content)
When pruning images and layers, the oadm prune images command:
…
Instead of sending multiple HTTP requests to the registry, it would be better to store information about which layers and signatures the registry needs to prune. The registry could periodically query OpenShift for that data and act on it.
This has multiple benefits:
HA note - if there are multiple replicas of the registry pod, it is probably a good idea to have the registry's pruning task scheduled a bit randomly. For example, replica 1 might run the job at 13 minutes after the hour, while replica 2 might run it at 27 minutes after the hour.
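A small sketch of that randomized scheduling in Go, with hypothetical names: each replica waits a random offset inside the interval before its first run so the replicas don't all prune at the same moment.

```go
// Sketch of the jitter idea: each registry replica delays its first pruning run
// by a random offset within the interval, then repeats on the regular interval.
package prune

import (
	"math/rand"
	"time"
)

// ScheduleWithJitter runs fn every interval, starting after a random offset.
// interval must be positive.
func ScheduleWithJitter(interval time.Duration, fn func(), stop <-chan struct{}) {
	offset := time.Duration(rand.Int63n(int64(interval)))
	select {
	case <-time.After(offset):
	case <-stop:
		return
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		fn()
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}
```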