Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop caching COPY layers #1408

Merged
merged 1 commit into from
Oct 1, 2020

Conversation

isker
Copy link
Contributor

@isker isker commented Aug 31, 2020

Resolves #1357

Description

Cached COPY layers are expensive in that they both need to be retrieved over the network and occupy space in the layer cache.

They are unnecessary in that we already have all resources needed to execute the COPY locally, and doing so is a trivial file-system operation. This is in contrast to RUN layers, which can do arbitrary and unbounded work.

The end result is that cached COPY commands were more expensive when cached, not less. Remove them.

Submitter Checklist

  • Includes unit tests
  • Adds integration tests if needed.

Reviewer Notes

  • The code flow looks good.
  • Unit tests and or integration tests added.

Release Notes

  • kaniko no longer caches COPY commands in its layer cache as the associated network and storage costs outweigh the costs of executing them locally.

@googlebot googlebot added the cla: yes CLA signed by all commit authors label Aug 31, 2020
@isker isker force-pushed the stop-caching-copy branch 2 times, most recently from 061250f to 41db1cc Compare August 31, 2020 22:29
@isker
Copy link
Contributor Author

isker commented Aug 31, 2020

Sorry, couldn't get all of the tests to run on my macbook so I decided to "test in CI". Clearly not working. The tests run as expected on a Linux box, will fix them there.

@isker isker force-pushed the stop-caching-copy branch from 41db1cc to e25cf1d Compare September 1, 2020 00:23
Cached COPY layers are expensive in that they both need to be retrieved
over the network and occupy space in the layer cache.

They are unnecessary in that we already have all resources needed to
execute the COPY locally, and doing so is a trivial file-system
operation.  This is in contrast to RUN layers, which can do
arbitrary and unbounded work.

The end result is that cached COPY commands were more expensive when
cached, not less.  Remove them.

Resolves GoogleContainerTools#1357
@isker isker force-pushed the stop-caching-copy branch from e25cf1d to 71f1d34 Compare September 1, 2020 00:30
@isker
Copy link
Contributor Author

isker commented Sep 1, 2020

Unit tests are good to go now, I removed some obsolete test cases and added a new one.

I think that integration test that failed is flakey? It was not failing previously.

@tejal29
Copy link
Contributor

tejal29 commented Sep 8, 2020

Unit tests are good to go now, I removed some obsolete test cases and added a new one.

I think that integration test that failed is flakey? It was not failing previously.

Thanks @isker i restarted the failed integration test. I opened a issue to look into flaky test #1416

@tejal29
Copy link
Contributor

tejal29 commented Sep 8, 2020

@isker If a file in the copy command changes, then the Run command will be invalidated since the composite key for the previous COPY layer will be different.
e.g. in the below dockefile,

FROM ...

COPY package.json package-lock.json ./
RUN npm ci 

The cache key for RUN npm ci depends on contents of file copied COPY package.json package-lock.json ./
If contents change the cache will change and hence Run npm ci will be executed locally.

Copy link
Contributor

@tejal29 tejal29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code change looks good.
Pending Manual Verification

@isker
Copy link
Contributor Author

isker commented Sep 22, 2020

Hi @tejal29, anything else you need me to do here?

@tejal29
Copy link
Contributor

tejal29 commented Oct 1, 2020

Verified this is working on my build.

@tejal29
Copy link
Contributor

tejal29 commented Oct 1, 2020

Sorry for the delay @isker
This will be part of v1.2.0 release of kaniko.

@tejal29 tejal29 merged commit 1240333 into GoogleContainerTools:master Oct 1, 2020
@isker
Copy link
Contributor Author

isker commented Oct 1, 2020 via email

@tejal29
Copy link
Contributor

tejal29 commented Oct 6, 2020

@isker did you end up testing v1.2.0?

@isker
Copy link
Contributor Author

isker commented Oct 6, 2020

@tejal29 have not had time yet to do so but I will let you know the results when I do.

@tejal29
Copy link
Contributor

tejal29 commented Oct 6, 2020

sounds good. no one has reported issues yet!

@isker
Copy link
Contributor Author

isker commented Oct 8, 2020

@tejal29 with Kaniko 1.2.0, we're seeing the improvements we expect. Before, it took the same amount of time to build our image with caching on as with it off, because we were stuck downloading and uploading large COPY layers, with the added cost of maintaining the layer cache with caching on.

Afterward, we see a 30-40% improvement in build time with caching on, and of course our layer cache is smaller.

Thanks for your support on getting this merged and released!

@tejal29
Copy link
Contributor

tejal29 commented Oct 8, 2020

Thanks @isker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes CLA signed by all commit authors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why are COPY layers cacheable?
3 participants