Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Cache the generators of patchUpdated configurations #2805

Merged

Conversation

2opremio
Copy link
Contributor

@2opremio 2opremio commented Jan 31, 2020

Fixes #2801

This change assumes that the generator commands of patchUpdated configurations
have a constant output throughout the execution of an synchronization/policy
update/image release operation. This was already assumed, to certain extent, by
patchUpdated's implementation since, for each workload/container update we run
the generators in order to refresh the patchFile.

In practice, this change caches the result of the generators throughout the
lifespan of the ConfigFile variable (which, currently, is the execution of a
synchronization, policy update or release operation). However, I'm not entirely
happy about this, since the lifespan of ConfigFile may change in the future. I
would rather explicitly track what the cache depends on but:

  • Using cache keys is probably not enough (Maybe use the Git Hash of the
    filesystem? But what if there are file modifications? and what about external
    network dependencies?).

  • ConfigFile is decoupled from the git repository implementation I would like
    it remain this way. So I would rather not invoke git commands in the
    ConfigFile implementation.

  • Alternatively to cache keys we could bookkeep the cache outside of
    ConfigFile but that's not easy since:

    1. We are not caching the final manifests, just the ones returned
      by the generators (which in patchUpdated isn't the same) which
      are not accessible external.
    2. The Config file is a few layers under the manifests.Store

Maybe I will find a better solution later on.

Anyways, this cache should be effective because update/release operations are
subdivided into individual workload/container operations. This smaller
operations can be numerous, and each of them requires obtaining the result of
the generators execution to recompute the patchFile. On top of that, every
image release operation has a verification stage, which also requires the
generators result.

It's worth noting that we cannot do the same sort of caching for
commandUpdated configurations, since the updater invocations are supposed to
cause side-effects in the generators' result. If it's of any consolation,
commandUpdated configurations don't invoke the generators for updates,
but, we do call the updaters which may be doing that under the hood. The
conclusion is that commandUpdated configurations should do the caching
themselves.

@2opremio 2opremio requested a review from squaremo January 31, 2020 01:31
@2opremio 2opremio changed the title Cache the result of the generator commands of patchUpdated configurations Cache the generators of patchUpdated configurations Jan 31, 2020
@2opremio 2opremio force-pushed the cache-generators-for-patchupdated-configs branch from 16ee168 to ddcfcac Compare January 31, 2020 01:32
@squaremo squaremo self-assigned this Mar 31, 2020
@squaremo squaremo force-pushed the cache-generators-for-patchupdated-configs branch from ddcfcac to 46213d6 Compare April 1, 2020 14:40
@squaremo squaremo marked this pull request as ready for review April 1, 2020 15:09
@squaremo squaremo force-pushed the cache-generators-for-patchupdated-configs branch from 46213d6 to 6af82a6 Compare April 1, 2020 15:13
Copy link
Member

@squaremo squaremo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more than a minor upheaval for ConfigFiles to outlive a particular checkout, so I am less worried about the caching. (Famous last words before coming back six months later to debug a mysterious problem ...)

This change assumes that the generator commands of `patchUpdated` configurations
have a constant output throughout the execution of an synchronization/policy
update/image release operation. This was already assumed, to certain extent, by
`patchUpdated`'s implementation since, for each workload/container update we run
the `generators` in order to refresh the `patchFile`.

In practice, this change caches the result of the generators throughout the
lifespan of the `ConfigFile` variable (which, currently, is the execution of a
synchronization, policy update or release operation). However, I'm not entirely
happy about this, since the lifespan of `ConfigFile` may change in the future. I
would rather explicitly track what the cache depends on but:

* Using cache keys is probably not enough (Maybe use the Git Hash of the
  filesystem? But what if there are file modifications? and what about external
  network dependencies?).

* `ConfigFile` is decoupled from the git repository implementation I would like
  it remain this way. So I would rather not invoke git commands in the
  `ConfigFile` implementation.

* Alternatively to cache keys we could bookkeep the cache outside of
  `ConfigFile` but that's not easy since:
  1. We are not caching the final manifests, just the ones returned
     by the generators (which in `patchUpdated` isn't the same) which
     are not accessible external.
  2. The `Config` file is a few layers under the `manifests.Store`

Maybe I will find a better solution later on.

Anyways, this cache is effective because update/release operations are
subdivided into individual workload/container operations. This smaller
operations can be numerous, and each of them requires obtaining the result of
the `generators` execution to recompute the `patchFile`. On top of that, every
image release operation has a verification stage, which also requires the
`generators` result.

It's worth noting that we cannot do the same sort of caching for
`commandUpdated` configurations, since the `updater` invocations are supposed to
cause side-effects in the `generators` result. If it's of any consolation,
`commandUpdated` configurations don't invoke the `generators` for updates,
but, we do call the `updaters` which may be doing that under the hood. The
conclusion is that `commandUpdated` configuarions should do the caching
themselves.
@squaremo squaremo force-pushed the cache-generators-for-patchupdated-configs branch from 6af82a6 to 7d07df0 Compare April 1, 2020 15:48
@squaremo squaremo merged commit 6e6f3a8 into fluxcd:master Apr 1, 2020
@2opremio 2opremio deleted the cache-generators-for-patchupdated-configs branch April 1, 2020 17:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manifest generation with Kustomize causes updates to timeout when including remote resources
2 participants