Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0137] Rework the events controller cache #6914

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

afrittoli
Copy link
Member

@afrittoli afrittoli commented Jul 10, 2023

Dependencies:

Once dependencies are merged, the PR will contain 1 commit only

Changes

Rework the events controller cache

Switch the cache implementation from a simple LRU to the bigcache
library which provides better performance and reduces the need
for garbage collection.

Further improve the cache by reducing the size of keys, by
adopting hashing of keys and thus storing smaller data in the cache.
Hashing also allows for detecting changes in a larger surface of
the data. While todays keys are based on the event type and
resource metadata, we can now include conditions in the equation
as well, to provide and enhanced logic for event sending.

Storing a resource metadata and its condition in the hashed keys
means we can detect immediately if no relevant change was made
in a resource, and quickly finish the reconcile cycle.
Additionally storing the event type along with the resource
metadata, means that we can detect if a specific message was
already sent for a specific resource.

To be able to rely on the cache to decide which event to send,
we need to ensure that reconcile cycles for the same resource
are never executed concurrently. To achieve this, the thread
number (or concurrency) is set to 1 in the controller.

Scalability needs can be addressed by scaling the events
controller horizontally. The sharding of resources across
instances of the reconciler is based on the resource key, so
the same resource will always be directed to the same controller
thus avoiding any cache miss.

Signed-off-by: Andrea Frittoli [email protected]

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

/kind misc

@tekton-robot tekton-robot added kind/misc Categorizes issue or PR as a miscellaneuous one. release-note-none Denotes a PR that doesnt merit a release note. labels Jul 10, 2023
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from afrittoli after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 10, 2023
@tekton-robot tekton-robot requested review from jerop and vdemeester July 10, 2023 12:22
@afrittoli afrittoli changed the title Tep0137 cache Rework the events controller cache Jul 10, 2023
@afrittoli afrittoli changed the title Rework the events controller cache [TEP-0137] Rework the events controller cache Jul 10, 2023
The events controllers for different resources (CustomRun, TaskRun
and PipelineRun) will be almost identical, with only the resource
type being different.

This commit refactors the CustomRun events controller to factor up
as much as possible of the reconciler, controller and test logic
so that we can reuse it in the upcoming commits for the other
resources.

I've done a slight change of strategy in the unit test structure
compared to what we do for the core controller tests.
A set of tests verifies as much as possible of the shared
functions, by mocking the event functionality away. These tests
are independent of the specific target format of the events.

Most of the functionality of the ReconcileKind functions is
handled in reconciler tests, which do not need a controller object
or a config map watcher to run, which reduces the complexity of
these tests without sacrifying coverage.

Finally, a smaller set of tests covers the controller -> reconciler
logic, so verify that our controller works well when invoking
the ReconcileKind indirectly through the generated package.

Signed-off-by: Andrea Frittoli <[email protected]>
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1/pipelinerun_types.go 89.5% 88.7% -0.8
pkg/apis/pipeline/v1beta1/pipelinerun_types.go 89.7% 88.9% -0.8
pkg/reconciler/events/cache/cache.go 83.3% 72.9% -10.4
pkg/reconciler/events/cache/cacheclient.go 7.1% 6.2% -0.9
pkg/reconciler/events/cloudevent/cloud_event_controller.go 84.6% 83.6% -1.0
pkg/reconciler/notifications/controller.go Do not exist 25.0%
pkg/reconciler/notifications/customrun/reconciler.go Do not exist 87.5%
pkg/reconciler/notifications/runtimeobject.go Do not exist 100.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1/pipelinerun_types.go 89.5% 88.7% -0.8
pkg/apis/pipeline/v1beta1/pipelinerun_types.go 89.7% 88.9% -0.8
pkg/reconciler/events/cache/cache.go 83.3% 72.9% -10.4
pkg/reconciler/events/cache/cacheclient.go 7.1% 6.2% -0.9
pkg/reconciler/events/cloudevent/cloud_event_controller.go 84.6% 83.6% -1.0
pkg/reconciler/notifications/controller.go Do not exist 25.0%
pkg/reconciler/notifications/customrun/reconciler.go Do not exist 87.5%
pkg/reconciler/notifications/runtimeobject.go Do not exist 100.0%

Switch the cache implementation from a simple LRU to the bigcache
library which provides better performance and reduces the need
for garbage collection.

Further improve the cache by reducing the size of keys, by
adopting hashing of keys and thus storing smaller data in the cache.
Hashing also allows for detecting changes in a larger surface of
the data. While todays keys are based on the event type and
resource metadata, we can now include conditions in the equation
as well, to provide and enhanced logic for event sending.

Storing a resource metadata and its condition in the hashed keys
means we can detect immediately if no relevant change was made
in a resource, and quickly finish the reconcile cycle.
Additionally storing the event type along with the resource
metadata, means that we can detect if a specific message was
already sent for a specific resource.

To be able to rely on the cache to decide which event to send,
we need to ensure that reconcile cycles for the same resource
are never executed concurrently. To achieve this, the thread
number (or concurrency) is set to 1 in the controller.

Scalability needs can be addressed by scaling the events
controller horizontally. The sharding of resources across
instances of the reconciler is based on the resource key, so
the same resource will always be directed to the same controller
thus avoiding any cache miss.

Signed-off-by: Andrea Frittoli <[email protected]>
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1/pipelinerun_types.go 89.5% 88.7% -0.8
pkg/apis/pipeline/v1beta1/pipelinerun_types.go 89.7% 88.9% -0.8
pkg/reconciler/events/cache/cache.go 83.3% 72.9% -10.4
pkg/reconciler/events/cache/cacheclient.go 7.1% 6.7% -0.5
pkg/reconciler/events/cloudevent/cloud_event_controller.go 84.6% 83.6% -1.0
pkg/reconciler/notifications/controller.go Do not exist 25.0%
pkg/reconciler/notifications/customrun/reconciler.go Do not exist 87.5%
pkg/reconciler/notifications/runtimeobject.go Do not exist 100.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1/pipelinerun_types.go 89.5% 88.7% -0.8
pkg/apis/pipeline/v1beta1/pipelinerun_types.go 89.7% 88.9% -0.8
pkg/reconciler/events/cache/cache.go 83.3% 72.9% -10.4
pkg/reconciler/events/cache/cacheclient.go 7.1% 6.7% -0.5
pkg/reconciler/events/cloudevent/cloud_event_controller.go 84.6% 83.6% -1.0
pkg/reconciler/notifications/controller.go Do not exist 25.0%
pkg/reconciler/notifications/customrun/reconciler.go Do not exist 87.5%
pkg/reconciler/notifications/runtimeobject.go Do not exist 100.0%

@tekton-robot
Copy link
Collaborator

tekton-robot commented Jul 11, 2023

@afrittoli: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-tekton-pipeline-unit-tests 06cfd97 link true /test tekton-pipeline-unit-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@afrittoli afrittoli added this to the Pipelines v0.51 milestone Jul 25, 2023
@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2023
@tekton-robot
Copy link
Collaborator

@afrittoli: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jerop jerop removed this from the Pipelines v0.54 milestone Nov 9, 2023
@jerop jerop added this to the Pipelines v0.55 milestone Nov 9, 2023
@chitrangpatel
Copy link
Contributor

@afrittoli do you think we can get this in for this milestone? Or should I move it over to 0.57?

@pritidesai
Copy link
Member

This PR has not made any progress since Pipelines v0.51 when it was initiated introduced. Clearing it from the milestone. Please add back as appropriate. Thanks!

@pritidesai pritidesai removed this from the Pipelines v0.58 milestone Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/misc Categorizes issue or PR as a miscellaneuous one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesnt merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants