Enable fallback keys (multiple restore keys) in Pipeline Caching #10842

Closed
johnterickson opened this issue Jul 5, 2019 · 14 comments
@johnterickson
Contributor

Right now, a Pipeline Caching lookup is either an exact hit or a miss. This helps in many scenarios but is less effective in others.

Here's my thinking on how we expand from that:

On save, a cache entry will continue to be stored at a specific key with an immutable value. However, one will be able to query it with multiple exact keys or key prefixes. Here's an example:

variables:
  BUILD_KIT_CACHE: '$(Pipeline.Workspace)/buildkitcache'
steps:
  cache:
    path: $(System.DefaultWorkingDirectory)/buildkit
    key:
      - $(Build.SourceVersion)
        $(Build.SourceBranch)
        $(System.DefinitionId)
    fallbackKeys:
      - $(Build.SourceBranch)
        $(System.DefinitionId)
      - $(System.DefinitionId)

For saving, the cache entry will be stored at this "path":

$(System.DefinitionId)/$(Build.SourceBranch)/$(Build.SourceVersion)

Lookups will occur in the following order until an entry is found. Should a particular lookup return multiple entries, the one that was most recently created will be selected.

First lookup:  $(System.DefinitionId)/$(Build.SourceBranch)/$(Build.SourceVersion)
Second lookup: $(System.DefinitionId)/$(Build.SourceBranch)/**
Third lookup:  $(System.DefinitionId)/**
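
The lookup order above can be sketched in a few lines of Python. This is only an illustration of the proposal, not the service's implementation: the in-memory `entries` list, the `lookup` helper, and the integer "creation order" are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the cache service: (stored path, creation order).
Entry = Tuple[str, int]


def lookup(entries: List[Entry], queries: List[str]) -> Optional[str]:
    """Try each query in order; a trailing '/**' is treated as a prefix match."""
    for query in queries:
        if query.endswith("/**"):
            # Keep the trailing slash so that '4/' does not match '42/...'.
            prefix = query[:-2]
            hits = [e for e in entries if e[0].startswith(prefix)]
        else:
            hits = [e for e in entries if e[0] == query]
        if hits:
            # Several entries may share a prefix: pick the most recently created.
            return max(hits, key=lambda e: e[1])[0]
    return None


entries = [
    ("42/refs/heads/main/aaa", 1),
    ("42/refs/heads/main/bbb", 2),
    ("42/refs/heads/feature/ccc", 3),
]

# Building commit 'ddd' on main: no exact hit, so the branch prefix is used.
print(lookup(entries, [
    "42/refs/heads/main/ddd",
    "42/refs/heads/main/**",
    "42/**",
]))
```

Here the first lookup misses, and the second returns the newest entry saved for the same branch.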
@johnterickson
Contributor Author

We've iterated on how this will look in the YAML and there are a few changes.

  • Keys are written on a single line instead of multiple lines. Key segments are separated by a pipe character instead of a newline.
  • Each key segment is either a string or a path-like minimatch pattern.

An example:

steps:
- cache:
  key: $(Agent.OS) | Gemfile.lock | **/*.gemspec,!./junk/**
  path: vendor/bundle
  restoreKeys:
  - $(Agent.OS) | Gemfile.lock
  - $(Agent.OS)

@willsmythe
Contributor

The equivalent "task" form of this would be ...

steps:
- task: Cache@0
  inputs:
    key: $(Agent.OS) | Gemfile.lock | **/*.gemspec,!./junk/**
    path: vendor/bundle
    restoreKeys: |
      $(Agent.OS) | Gemfile.lock
      $(Agent.OS)

@willsmythe
Contributor

Related to fallback is the concept of a "rolling cache". This type of cache is usually produced by a compilation process in which hundreds or thousands of files affect the cache contents. There is no well-defined file (or set of files) that the cache key can be based on, so the best identifier for the cache is basically the SHA of the commit being built.

Note: we probably need a separate issue to track this concept, but I'm raising it here since it relates to restore keys and lookup.

The following examples show one possible way to match on restore and how rolling might work:

Example 1 (no restore keys)

steps:
- cache:
  key: a | b | c

Save key

a | b | c

Restore lookup

  1. a | b | c <-- exact match

Example 2 (with restore keys)

steps:
- cache:
  key: a | b | c
  restoreKeys:
  - a | b
  - a

Save key

a | b | c

Restore lookup

  1. a | b | c <-- exact match
  2. a | b <-- inexact match
  3. a <-- inexact match
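
A rough Python sketch of the restore lookup in Example 2, assuming an inexact match means "the restore key is a segment-wise prefix of a saved key". The `segments` and `restore` helpers are hypothetical, and treating later list items as newer is an assumption made for the example.

```python
from typing import List, Optional


def segments(key: str) -> List[str]:
    """Split 'a | b | c' into ['a', 'b', 'c']."""
    return [part.strip() for part in key.split("|")]


def restore(saved_keys: List[str], key: str, restore_keys: List[str]) -> Optional[str]:
    """Return the saved key that would be restored, or None on a miss."""
    saved = [segments(k) for k in saved_keys]
    wanted = segments(key)
    # 1. Exact match on the full key.
    for s in saved:
        if s == wanted:
            return " | ".join(s)
    # 2. Inexact matches, tried in the order the restore keys are listed.
    for restore_key in restore_keys:
        prefix = segments(restore_key)
        for s in reversed(saved):  # assumption: later items are newer
            if s[:len(prefix)] == prefix:
                return " | ".join(s)
    return None


# No entry for 'a | b | c' yet, so 'a | b' picks up 'a | b | x'.
print(restore(["a | b | x", "a | y | z"], "a | b | c", ["a | b", "a"]))
```

Matching on whole segments rather than raw string prefixes keeps `a | b` from accidentally hitting a key such as `a | bc`.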

Example 3 (rolling)

steps:
- cache:
  key: a
  rolling: true

Save key

a | $(Build.SourceVersion)

Restore lookup

  1. a | $(Build.SourceVersion) <-- exact match, which will almost never match unless the same commit was built in another branch
  2. a <-- inexact match
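
To make the rolling proposal concrete, here is a small sketch of how `rolling: true` could expand into a save key and a restore order. The `rolling_keys` helper is hypothetical; it only mirrors the save key and lookup list shown above.

```python
from typing import Dict, List, Union


def rolling_keys(key: str, commit: str) -> Dict[str, Union[str, List[str]]]:
    """Expand a rolling key into its save key and restore lookup order."""
    full_key = f"{key} | {commit}"
    return {
        "save": full_key,            # stored under key + commit SHA
        "restore": [full_key, key],  # exact match first, then the bare key
    }


# 'abc123' stands in for $(Build.SourceVersion).
print(rolling_keys("a", "abc123"))
```

Under this reading, the same behavior could be written without `rolling` by using `a | $(Build.SourceVersion)` as the key and `a` as the only restore key.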

johnterickson added a commit to johnterickson/azure-pipelines-agent that referenced this issue Jul 24, 2019
TingluoHuang pushed a commit to microsoft/azure-pipelines-agent that referenced this issue Jul 25, 2019
@willsmythe changed the title from "Enable fallback keys in Pipeline Caching" to "Enable fallback keys (multiple restore keys) in Pipeline Caching" on Aug 5, 2019
@fadnavistanmay
Contributor

This is released. Closing the issue.

@ruffsl

ruffsl commented Aug 26, 2019

  • Keys are written on a single line instead of multiple lines. Key segments are separated by a pipe character instead of a newline.

This requires a lot of line noise if one wishes to safely escape strings in the key so that they are not treated as paths, while also breaking longer keys across YAML lines to improve readability and version control: we must insert a line wrap after each pipe and escape the quotes for YAML.

parameters:
  key: 'build_cache_v1'
  workspace: '/opt/underlay_ws'
  path: '/opt/underlay_ws/build'

steps:
- task: CacheBeta@0
  inputs:
    key: "
      \"${{parameters.key}}\" | \
      \"$(Container.OS)\" | \
      \"$(Container.OSArchitecture)\" | \
      \"$(System.PullRequest.PullRequestNumber)\" | \
      ${{parameters.workspace}}/checksum.txt"
    path: ${{parameters.path}}
  displayName: Cache ${{parameters.key}}

@mahilleb-msft
Member

@ruffsl - would this syntax work for you:

- task: CacheBeta@0
  inputs:
    key: >-
      "${{parameters.key}}" |
      "$(Container.OS)" |
      "$(Container.OSArchitecture)" |
      "$(System.PullRequest.PullRequestNumber)" |
      "${{parameters.workspace}}/checksum.txt"
    path: ${{parameters.path}}
  displayName: Cache ${{parameters.key}}

Cf. https://yaml-multiline.info/
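
For readers unfamiliar with `>-`, the sketch below imitates what the folded block scalar does to the key above: line breaks become single spaces and the trailing newline is dropped, so the task receives one pipe-separated line. It is a stdlib-only approximation that holds for equally indented lines with no blank lines between them, not a YAML parser, and the segment values are placeholders.

```python
from typing import List


def fold_block(lines: List[str]) -> str:
    """Approximate YAML '>-' folding: join lines with spaces, no trailing newline."""
    return " ".join(line.strip() for line in lines)


# Placeholder values standing in for the expanded variables.
block = [
    '"build_cache_v1" |',
    '"Linux" |',
    '"checksum.txt"',
]

key = fold_block(block)
key_segments = [s.strip() for s in key.split("|")]
print(key)
print(key_segments)
```

Because the scalar is not itself wrapped in double quotes, the quotes around each segment survive without backslash escapes.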

@ruffsl

ruffsl commented Aug 26, 2019

That is a lot more elegant! Thank you.
More examples: https://stackoverflow.com/a/21699210/2577586

@takluyver

Did the rolling idea from example 3 in this comment get implemented, or only the restoreKeys bit? I can see you can make the same thing work with either, but rolling is quite a bit more concise.

@fadnavistanmay
Contributor

@takluyver - the rolling idea has not been implemented yet, only the restoreKeys.

@takluyver

OK, thanks. We can work with that. It would be good to see it in the docs, though - #5694. 🙂

@fadnavistanmay
Contributor

Yup, we will be updating the docs soon. :)

@AceHack

AceHack commented Apr 29, 2020

What does the new way look like?

@adrian-skybaker

Yup, we will be updating the docs soon. :)

That was nearly 10 months ago. Perhaps you could leave GitHub issues open until the documentation is done? Otherwise it seems the documentation never actually happens. And without it, are they really completed features?

@cawoodm

cawoodm commented Nov 14, 2020

The documentation says a cache is "immutable" (created once and frozen in time) and then points to this feature, but I don't see any correlation between immutability and cache fallbacks.

Does this feature allow my cache to be loaded by a fixed key "foo", the directory to be updated by the pipeline (say, by downloading new dependencies), and the cache then to be updated from the cache directory if the build changes its contents? In other words, is the cache write-once, and does this feature change that?

Also, FWIW, the path parameter doesn't seem to understand ~, so if you set path: ~/.gradle it balks:

There is a cache miss.
tar: /home/vsts/work/1/s/~/.gradle: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
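
The path in the tar error is what you get when `~` is treated as an ordinary directory name and joined onto the working directory. The sketch below reproduces that and contrasts it with expanding `~` first. Both helpers are hypothetical, and the home directory `/home/vsts` is only inferred from the log above, not taken from the task's source.

```python
import posixpath


def naive_resolve(path: str, work_dir: str) -> str:
    """Join without any tilde handling, as the error message suggests happens."""
    return posixpath.join(work_dir, path)


def resolve_with_home(path: str, work_dir: str, home: str) -> str:
    """Expand a leading '~' first, then resolve relative paths against work_dir."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return path if posixpath.isabs(path) else posixpath.join(work_dir, path)


work_dir = "/home/vsts/work/1/s"  # from the tar error above
home = "/home/vsts"               # assumption inferred from that path

print(naive_resolve("~/.gradle", work_dir))
print(resolve_with_home("~/.gradle", work_dir, home))
```

The first call yields the exact path that tar could not open. The second shows the location the `~` was presumably meant to refer to.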
