Proposals/pipeline caching #113
It would be interesting to be able to label caches. For example, if the same repository holds a front-end application (JS) and a back-end app (dotnet), I may want to use a single build definition and label those caches with unique names, so that the build does not restore the wrong cache in the wrong place.
I can't wait to see Maven caching for Java builds! Great news! |
While this strategy is certainly something, it involves a lot of manual work (determine what to cache, cache it, restore cache). Why won't you do it the same as other CI providers do? Create a permanent disk per pipeline or something, and mount this disk to the hosted Agent on every build. This disk could cache packages, could host source code (source code retrieval sometimes takes a while, too), and whatever else you need. |
You can hash your cache key; I think that's a clean way of reusing a cache multiple times: hash your package.json for node_modules and your csproj files for NuGet. I like this approach. It's a clean and simple way to include caching if needed.
Also, a cache lifetime of 7 days is pretty long. It could be 48 hours. If you don't build your code often (multiple times a day), you could even wait 2 or 3 minutes more for your build to finish. |
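The hashing idea in the comment above can be sketched in a few lines of Python. This is only an illustration of deriving a key from manifest files, not how the Cache task computes its fingerprint; the function name `cache_key`, the `label` argument, and the `|` separator are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def cache_key(label: str, *manifests: Path) -> str:
    """Build a cache key from a label plus a SHA-256 over the manifest files.

    Hypothetical helper: `label` distinguishes caches in the same repository
    (for example "npm" vs. "nuget"); `manifests` are files such as
    package.json or *.csproj whose contents decide whether the cache is stale.
    """
    digest = hashlib.sha256()
    for manifest in sorted(manifests):
        # Mix in the file name so identical contents in different files differ.
        digest.update(manifest.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(manifest.read_bytes())
        digest.update(b"\0")
    return f"{label}|{digest.hexdigest()}"
```

With a key like this, an unchanged package.json maps to the same key on every build, and any edit to it produces a new key and therefore a fresh cache.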
@eps1lon I use for now |
I'm seeing some weird errors in my cache using esy. The cache is pretty big (almost 4 GB), which in turn seems to lead to build errors down the line. |
@eps1lon We get the best gains by installing Cypress inside node_modules and then caching all of node_modules. I can't comment on attributes, as we've been using Windows without problems. (Note 2: to get node_modules fast, we also 7zip it before caching.) |
@lukeapage This is for an experimental project where I switched to yarn v2. There are no more node_modules/ and I'm sitting at 6s install times before Cypress. Do you include the Cypress binary in your node_modules/ or does that still need to be rebuilt on every run? |
I gave this a quick read-through, but what kinds of key lookup behaviors will be supported? Could each cache key be namespaced, with retrieval being prefix-matched? I know some other CI services only support write-only caches, so one must append an epoch counter to the end of the key/namespace to guarantee that a new cache is written if desired, even when the derived key prefix is deterministically the same. https://circleci.com/docs/2.0/caching/#restoring-cache I think one reason for write-only caches is that if the key is exactly the same as an existing cache's, the CI worker knows it can skip the store_cache step, saving the time that would have been spent uploading or even compressing/hashing the cache blob. I'm not sure how viable this would be, but it might be cool if the caching were file-diff aware, so that it sort of rsync'ed only the files that differ from the closest cache; only those changed files would then need to be compressed and transported over the network. The store and restore commands would use the key prefix lookup to find the closest cache digest-manifest, and use that to figure out what needs to be pushed to or pulled from the appended cache layer. The CI backend could flatten the trailing layers to remove expired versions of the cache in a sliding-window fashion. |
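To make the lookup behavior asked about above concrete, here is a toy in-memory sketch in Python of a write-only (immutable) cache with prefix-matched restore. It does not describe how Azure Pipelines or CircleCI implement caching; the class `PrefixCache` and its methods are hypothetical names for illustration.

```python
from typing import Dict, Optional, Tuple


class PrefixCache:
    """Toy cache: entries are immutable and are restored by key prefix."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, bytes]] = {}
        self._clock = 0  # insertion counter standing in for a creation time

    def store(self, key: str, blob: bytes) -> bool:
        """Save blob under key; return False and skip the write if the key exists."""
        if key in self._entries:
            return False  # write-only: an existing key is never overwritten
        self._clock += 1
        self._entries[key] = (self._clock, blob)
        return True

    def restore(self, prefix: str) -> Optional[Tuple[str, bytes]]:
        """Return (key, blob) for the newest entry whose key starts with prefix."""
        matches = [
            (stamp, key)
            for key, (stamp, _) in self._entries.items()
            if key.startswith(prefix)
        ]
        if not matches:
            return None
        _, key = max(matches)  # highest insertion counter wins
        return key, self._entries[key][1]
```

An exact key restores the matching entry, while a shorter prefix such as `"yarn|linux|"` falls back to the most recently stored entry in that namespace. Because `store` refuses to overwrite, a worker that sees `False` knows it can skip compressing and uploading.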
@ruffsl You are exactly right on all parts! 🥇
|
@lukeapage How do you install Cypress inside node_modules? |
@marceloavf @eps1lon We also use CYPRESS_INSTALL_BINARY: 0 |
Tried it @lukeapage but I got these errors:
And I added these:
|
Can someone post a working example with Cypress and yarn? When I use the task, my yarn task still takes 4 minutes after the cache is downloaded, and if I use cache-hit-var, my yarn task that runs e2e can't be executed because yarn can't be found.
|
I do use this variable on top of the build
|
I tried again to use CacheBeta@0 instead of RestoreAndSaveCache@1: the resulting cache is 498 MB (vs. 244 MB) and takes longer to download (+20 s), so I reverted. |
How can I make use of this feature in a Release pipeline (not a YAML-based build pipeline)? |
@cforce You can't. At least for now this is only for build pipelines. |
Nice task! However the namespacing per branch by default is a bit too strict. It'd be nice to provide a custom branch "key" so that feature branches can re-use the cache. See f.ex. microsoft/azure-pipelines-tasks#11314. |
@marceloavf TARing support is in-progress so our task should behave more like RestoreAndSaveCache in that respect. @cforce, @wichert is correct that this only works in build pipelines at the moment. What's the scenario you're thinking about? @marcussonestedtbarium I responded in the issue you linked to 👍 |
The scenario is running a test suite against a deployment afterwards in the Release pipeline. The test suite is built from the current state of the source code and uses a lot of Maven artifacts whose versions stay unchanged across a series of releases, so it needs caching of Maven library release artifacts to avoid downloading them again unnecessarily. So I need a "Maven Caching" feature usable in Release pipelines, either separately or as a parameter of the Maven build task. |
Can you add caching for the docker build image? |
Hi @underwaterhp93 - Please follow microsoft/azure-pipelines-tasks#11034 (comment) |
Hi, with the agent 2.160.0 release, we are rolling out TARing by default. The long-standing requests to preserve symbolic links and file attributes are being addressed. If you specifically do not want to use TARing, set AZP_CACHING_CONTENT_FORMAT to Files. We'll be documenting this new environment variable. Thanks |
Hi, is there any way to have the Cache task cache files locally on a self-hosted agent? Our caches are very large (multiple GB of game data), and our main pain point is waiting for these caches to download and upload. We run a self-hosted agent, so it seems like we should be able to have the agent save the cache locally instead. |
@Kleptine do you reset your self-hosted agent each time? If not then why not just implement your own caching, copy it to a temp folder, check a timestamp, and copy (or better, hardlink/symlink) the data back? |
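The do-it-yourself approach suggested above (copy to a folder on the agent, check a timestamp, copy back) might look roughly like the Python sketch below. It is not part of the Cache task or the agent; `save_local`, `restore_local`, the `.saved_at` marker file, and the `max_age_s` parameter are all assumptions, and `key` must be a valid directory name on the agent's file system.

```python
import shutil
import time
from pathlib import Path

STAMP = ".saved_at"  # hypothetical marker file recording when the cache was saved


def save_local(source: Path, cache_root: Path, key: str) -> Path:
    """Copy source into cache_root/key, replacing any earlier copy."""
    target = cache_root / key
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target, symlinks=True)  # keep symlinks as symlinks
    (target / STAMP).write_text(str(time.time()))
    return target


def restore_local(cache_root: Path, key: str, dest: Path, max_age_s: float) -> bool:
    """Copy cache_root/key into dest if it exists and is younger than max_age_s."""
    target = cache_root / key
    stamp = target / STAMP
    if not stamp.is_file():
        return False  # nothing cached under this key
    if time.time() - float(stamp.read_text()) > max_age_s:
        return False  # cache is too old; caller should rebuild and save again
    shutil.copytree(
        target,
        dest,
        symlinks=True,
        dirs_exist_ok=True,  # requires Python 3.8+
        ignore=shutil.ignore_patterns(STAMP),
    )
    return True
```

For multi-gigabyte data, passing `copy_function=os.link` to `shutil.copytree` would create hardlinks instead of copies when both directories are on the same volume, at the cost of the restored files sharing storage with the cache.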
We'd like to take advantage of some of the busting/tagging options of the CacheTask. A more recent cache from a different branch isn't compatible with other branches. Is the Cache Task open source, or is there code we could work from to implement our own? |
Super thin wrapper task is here: https://github.com/microsoft/azure-pipelines-tasks/tree/master/Tasks/CacheV2 It uses features built into the agent here: https://github.com/microsoft/azure-pipelines-agent/tree/master/src/Agent.Plugins/PipelineCache |
Thanks for the reference. Is there a clean way to modify the agent source and re-deploy it to our self-hosted agent? (ie. is there a build script somewhere that generates the installation tar?) For what it's worth, we would rather not have to modify the agent source to enable this. :) I saw that there is a line for "Local Caching Saved". For our runs it is always 0, is there something we haven't enabled? |
The caching system was really designed to be used with the caching infrastructure we have in the cloud. It does things like file- and block-level deduping, makes immutability guarantees, and other things that we haven't replicated with any kind of local infrastructure. Developer Community is the best place to file a feature request for agent-local caching. You are free to grab the agent source and do what you want, but you won't be in a supported state. For instance, when new features require an updated agent, we send down an agent-update request. This will (in the best case) clobber your custom agent or in the worst case, knock it offline. |
Thanks for the info. I'll add a feature request. |
Dear @mitchdenny @ What was the feature request for agent-local caching? |
This is our draft proposal for Pipeline Caching. The contents have been adapted from our internal planning docs but this will be the spec for the feature moving forward. Comments and suggestions welcome. You can see from the commit history that we started with a somewhat complex model and worked to simplify it so that it was easy to adopt. Features specified in the more complex model may be added later.