🏃 Proposal to extract cluster-specifics out of the Manager #1075

alvaroaleman · 2020-07-26T17:45:33Z

A proposal based on #950
/assign @mengqiy @vincepri @estroz

alvaroaleman · 2020-07-26T17:45:51Z

designs/move-cluster-specific-code-out-of-manager.md

+`runnables` are started.
+
+
+The new `ClusterConnector` interface will look like this:


Ideas for a better name are very welcome :)

Or maybe PeerCluster or ClusterPeer? :)

Regarding naming: What about just Cluster? It's pretty much what this represents.

Cluster seems fine (it's bugging me a little bit that the name doesn't make it clear that this type doesn't set up a cluster, like envtest, but rather connects to one, but don't consider that comment blocking)

Cluster it is now

alvaroaleman · 2020-07-26T17:50:06Z

designs/move-cluster-specific-code-out-of-manager.md

+
+```go
+if cc, isClusterConnector:= runnable.(clusterconnector.ClusterConnector); isClusterConnector {
+	m.caches = append(m.caches, cc.GetCache())


This makes the whole thing a bit weird, because we will end up starting both the ClusterConnector's Cache and the ClusterConnector separately. The alternative would be to require people to call mgr.Add(clusterConnector.GetCache()), which I would like to avoid because it's unintuitive.

It's probably fine to do a type assertion here, since this is an internal implementation detail.

The alternative would be to require people to call mgr.Add(clusterConnector.GetCache()), which I would like to avoid because it's unintuitive.

+1

can't we just do:

    type HasCaches interface {
        GetCache() ...
    }

    if getter, hasCaches := runnable.(HasCaches); hasCaches {
        ...
    }

(could make an additional method to avoid accidentally conflicting, too)
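The narrow-interface idea above can be sketched as follows. This is a minimal, self-contained illustration, not the manager's actual code: `Cache`, `hasCache`, `fakeCache`, `fakeCluster`, `plainRunnable` and `manager` are hypothetical stand-ins, and `Runnable` only mirrors the stop-channel shape quoted in the diff.

```go
package main

import "fmt"

// Runnable mirrors the shape quoted in the diff (a stop channel, no context).
type Runnable interface {
	Start(stop <-chan struct{}) error
}

// Cache is a hypothetical stand-in for the real cache type.
type Cache interface {
	Runnable
	Name() string
}

// hasCache is the narrow, unexported interface the manager asserts against,
// instead of asserting against the full ClusterConnector interface.
type hasCache interface {
	GetCache() Cache
}

type fakeCache struct{ name string }

func (c *fakeCache) Start(<-chan struct{}) error { return nil }
func (c *fakeCache) Name() string                { return c.name }

// fakeCluster is a Runnable that also exposes a cache.
type fakeCluster struct{ cache Cache }

func (c *fakeCluster) Start(<-chan struct{}) error { return nil }
func (c *fakeCluster) GetCache() Cache             { return c.cache }

// plainRunnable has no cache.
type plainRunnable struct{}

func (plainRunnable) Start(<-chan struct{}) error { return nil }

type manager struct {
	runnables []Runnable
	caches    []Cache
}

// Add registers r and, if r exposes a cache, remembers that cache so it can
// be started before the other runnables.
func (m *manager) Add(r Runnable) error {
	if getter, ok := r.(hasCache); ok {
		m.caches = append(m.caches, getter.GetCache())
	}
	m.runnables = append(m.runnables, r)
	return nil
}

func main() {
	m := &manager{}
	_ = m.Add(&fakeCluster{cache: &fakeCache{name: "mirror"}})
	_ = m.Add(plainRunnable{})
	fmt.Println(len(m.runnables), len(m.caches))
}
```

With this shape, callers only ever call `Add` once per cluster, and any other runnable that happens to own a cache gets the same treatment.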

kramerul · 2020-07-27T05:31:42Z

designs/move-cluster-specific-code-out-of-manager.md

+	GetWebhookServer() *webhook.Server
+}
+```
+


Would it be possible to separate this into 3 interfaces? From my point of view, this will make the API clearer. In multicluster environments, you have to call MultiClusterManager#Connect once for each connected cluster, in contrast to having one implicit connection and n-1 explicit connections.

    // In package clusterconnector:
    type ClusterConnector interface {
        // same as above ...
    }

    type MultiClusterManager interface {
        // Connect creates a new connection to a cluster: it creates a new
        // instance of ClusterConnector and adds it to this manager.
        Connect(config *rest.Config, name string, opts ...Option) clusterconnector.ClusterConnector

        // Same as Manager from the proposal, except clusterconnector.ClusterConnector.
        Add(Runnable) error
        Elected() <-chan struct{}
        SetFields(interface{}) error
        AddMetricsExtraHandler(path string, handler http.Handler) error
        AddHealthzCheck(name string, check healthz.Checker) error
        AddReadyzCheck(name string, check healthz.Checker) error
        Start(<-chan struct{}) error
        GetWebhookServer() *webhook.Server
    }

    type Manager interface {
        clusterconnector.ClusterConnector
        MultiClusterManager
    }

Introducing a factory method Connect inside MultiClusterManager would also solve the problems of having to use type assertions (see comment of @alvaroaleman).

@kramerul Generally I like the idea (although I would probably embed the Manager into the MultiClusterManager and not the other way round), but it opens up an interesting question: What config will we use for leader election?

Right now LeaderElection is the reason why it IMHO makes sense to define one primary cluster which we implicitly do by requiring a config to construct a manager.

@alvaroaleman, I hadn't thought about this aspect. But it's absolutely correct that you need to have a primary connection for leader election. In this case I would agree to have only one Manager interface.

The Connect method could ease the usage of the API.

I like the idea of having an explicit interface for MultiCluster manager.

I would probably embed the Manager into the MultiClusterManager and not the other way round)

+1

A possible alternative is:

    type ClusterConnector interface {
        // same as above
    }

    // For the single-cluster use case only.
    type Manager interface {
        // same as above
    }

    type MultiClusterManager interface {
        Manager

        // ConnectSecondaryCluster creates a new connection to a secondary cluster:
        // it creates a new instance of ClusterConnector and adds it to this manager.
        // Each ClusterConnector should have its own unique name.
        ConnectSecondaryCluster(config *rest.Config, name string, opts ...Option) (clusterconnector.ClusterConnector, error)

        // GetClusterConnectorFor looks up the ClusterConnector by its unique name.
        GetClusterConnectorFor(name string) (clusterconnector.ClusterConnector, error)
    }

I've added it like this:

    type MultiClusterManager interface {
        Manager

        // AddCluster adds another named cluster to the MultiClusterManager. The name
        // must be unique. The MultiClusterManager will wait for all clusters'
        // caches to be started before starting anything else.
        AddCluster(config *rest.Config, name string) error

        // GetClusters returns all Clusters this MultiClusterManager knows about.
        // The cluster used to construct the MultiClusterManager is named `primary`
        // and will be used for LeaderElection, if enabled.
        GetClusters() map[string]clusterconnector.ClusterConnector
    }

In all the multi-cluster controllers I've built so far, the actual controller just wants to watch in all clusters but doesn't necessarily know their number or names ahead of time, which is why a method that returns a map of all clusters is IMHO more useful. If there is one that gets special treatment, this is still possible.

A number of the patterns I've seen for multicluster end up having clusters come & go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.

I'm also not certain about the whole "primary" / "secondary" cluster thing. Perhaps modeling this as there's a "leader election" cluster and "functional" clusters (one of which may be the leader election cluster)?

A number of the patterns I've seen for multicluster end up having clusters come & go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.

IMHO this proposal should do nothing to prevent that but it does not need to do anything yet to simplify that. Once we want such a functionality, we would probably add a RemoveCluster to the interface.

I'm also not certain about the whole "primary" / "secondary" cluster thing. Perhaps modeling this as there's a "leader election" cluster and "functional" clusters (one of which may be the leader election cluster)?

I guess we could change AddCluster to take a UseForLeaderElection bool option: if leader election is enabled, use that cluster (and error out when starting if not exactly one cluster is configured to be used for leader election). WDYT?
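The "exactly one leader-election cluster" check suggested above could look roughly like this. It is only a sketch under stated assumptions: `multiClusterManager`, `clusterEntry` and `validate` are hypothetical names, `AddCluster` takes a plain bool instead of an options struct, and the `*rest.Config` argument is left out so the example stays self-contained.

```go
package main

import "fmt"

// clusterEntry is a hypothetical record of one AddCluster call. A real
// signature would also carry a *rest.Config, omitted here.
type clusterEntry struct {
	useForLeaderElection bool
}

type multiClusterManager struct {
	leaderElection bool
	clusters       map[string]clusterEntry
}

// AddCluster registers a named cluster; the name must be unique.
func (m *multiClusterManager) AddCluster(name string, useForLeaderElection bool) error {
	if _, exists := m.clusters[name]; exists {
		return fmt.Errorf("cluster %q was already added", name)
	}
	if m.clusters == nil {
		m.clusters = map[string]clusterEntry{}
	}
	m.clusters[name] = clusterEntry{useForLeaderElection: useForLeaderElection}
	return nil
}

// validate is meant to run at the top of Start: with leader election
// enabled, exactly one cluster must be marked for it.
func (m *multiClusterManager) validate() error {
	if !m.leaderElection {
		return nil
	}
	marked := 0
	for _, c := range m.clusters {
		if c.useForLeaderElection {
			marked++
		}
	}
	if marked != 1 {
		return fmt.Errorf("leader election needs exactly one cluster, %d are marked", marked)
	}
	return nil
}

func main() {
	m := &multiClusterManager{leaderElection: true}
	_ = m.AddCluster("a", true)
	_ = m.AddCluster("b", true)
	fmt.Println(m.validate()) // two clusters are marked, so this reports an error
}
```

Deferring the check to start-up rather than to `AddCluster` lets callers add clusters in any order.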

On second thought, after trying to update the sample, I would like to not make a MultiClusterManager part of this proposal, because it opens up a set of questions I would like to not answer just yet:

- The Builder needs to be extended to have something like WatchesFromCluster(source, "cluster-name", handler); otherwise we need to blindly dereference elements from the map[clusterName]cluster.Cluster, potentially getting nil pointer dereferences (NPDs)
- The Builder needs to be extended to have something like WatchesFromAllClustersExcept(source, "excepted-cluster-name", handler)
- Probably more use cases for watches I didn't think about
- There should be a nice way to pass on all clients for all clusters to a controller that doesn't involve writing a loop (probably the multi-cluster client abstraction that Solly suggested?)
- When adding official support for building a controller that watches an arbitrary number of clusters, we also need an official way of encoding the cluster name into the reconcile.Request

And probably more.

IMHO, extracting the cluster specifics out of the manager won't block any of the work mentioned above, but allows us to start working on the topic without requiring us to already anticipate all use cases (and is already a big improvement over the status quo).

ok, fine with doing that in a follow-up

mengqiy · 2020-07-28T18:35:03Z

designs/move-cluster-specific-code-out-of-manager.md

+
+	// SetFields will set any dependencies on an object for which the object has implemented the inject
+	// interface - e.g. inject.Client.
+	SetFields(interface{}) error


Not sure what will happen when the embedded ClusterConnector also has a SetFields method.

AFAIK this is fine as long as they have the same signature
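A small sketch of the embedding case discussed above. Since Go 1.14, an interface may embed another interface whose methods overlap with its own, provided the signatures are identical; older compilers reject this as a duplicate method. `Cluster`, `Manager` and `manager` below are trimmed-down, hypothetical stand-ins in which only the overlapping `SetFields` method matters.

```go
package main

import "fmt"

// Cluster is a trimmed-down stand-in for the extracted cluster interface.
type Cluster interface {
	SetFields(interface{}) error
}

// Manager embeds Cluster and declares SetFields again with the same signature.
// This compiles with Go 1.14 or newer.
type Manager interface {
	Cluster
	SetFields(interface{}) error
	Add(name string) error
}

type manager struct{ injected []interface{} }

// SetFields records the object; a single method satisfies both declarations.
func (m *manager) SetFields(i interface{}) error {
	m.injected = append(m.injected, i)
	return nil
}

func (m *manager) Add(string) error { return nil }

// Compile-time check that *manager implements Manager.
var _ Manager = (*manager)(nil)

func main() {
	var mgr Manager = &manager{}
	fmt.Println(mgr.SetFields("dependency"))
}
```

If the two signatures differed, the interface declaration itself would fail to compile, so a mismatch cannot go unnoticed.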

mengqiy · 2020-07-28T18:56:30Z

designs/move-cluster-specific-code-out-of-manager.md

+	GetWebhookServer() *webhook.Server
+}
+```
+


I like the idea of having an explicit interface for MultiCluster manager.

I would probably embed the Manager into the MultiClusterManager and not the other way round)

+1

A possible alternative is:

    type ClusterConnector interface {
        // same as above
    }

    // For the single-cluster use case only.
    type Manager interface {
        // same as above
    }

    type MultiClusterManager interface {
        Manager

        // ConnectSecondaryCluster creates a new connection to a secondary cluster:
        // it creates a new instance of ClusterConnector and adds it to this manager.
        // Each ClusterConnector should have its own unique name.
        ConnectSecondaryCluster(config *rest.Config, name string, opts ...Option) (clusterconnector.ClusterConnector, error)

        // GetClusterConnectorFor looks up the ClusterConnector by its unique name.
        GetClusterConnectorFor(name string) (clusterconnector.ClusterConnector, error)
    }

mengqiy · 2020-07-28T18:56:34Z

designs/move-cluster-specific-code-out-of-manager.md

+
+```go
+if cc, isClusterConnector:= runnable.(clusterconnector.ClusterConnector); isClusterConnector {
+	m.caches = append(m.caches, cc.GetCache())


It's probably fine to do a type assertion here, since this is an internal implementation detail.

The alternative would be to require people to call mgr.Add(clusterConnector.GetCache()), which I would like to avoid because it's unintuitive.

+1

mengqiy · 2020-07-28T19:00:44Z

/assign @DirectXMan12 Thoughts?

DirectXMan12

sorry for the late comments. Mostly looks good, couple of things inline

DirectXMan12 · 2020-08-20T23:55:38Z

designs/move-cluster-specific-code-out-of-manager.md

+`runnables` are started.
+
+
+The new `ClusterConnector` interface will look like this:


Cluster seems fine (it's bugging me a little bit that the name doesn't make it clear that this type doesn't set up a cluster, like envtest, but rather connects to one, but don't consider that comment blocking)

DirectXMan12 · 2020-08-20T23:58:16Z

designs/move-cluster-specific-code-out-of-manager.md

+type ClusterConnector interface {
+	// SetFields will set cluster-specific dependencies on an object for which the object has implemented the inject
+	// interface, specifically inject.Client, inject.Cache, inject.Scheme, inject.Config and inject.APIReader
+	SetFields(interface{}) error


tangentially related aside: I'd like to see if we can maybe refactor the internal DI stuff a bit before 1.0 -- the complete lack of type signature, etc bugs me a bit, but I've yet to figure out a better answer.

It's an unrelated topic, but IMHO we should try to get rid of all of it, because it makes changes to it surface as runtime errors and not compile-time errors, which is very bad.

agreed. I'd like to see a proposal for how to cleanly do that (this is not to sound dismissive or snarky -- I genuinely would like to see a proposal :-) )

DirectXMan12 · 2020-08-21T00:00:18Z

designs/move-cluster-specific-code-out-of-manager.md

+	GetScheme() *runtime.Scheme
+
+	// Start starts the ClusterConnector
+	Start(<-chan struct{}) error


are we going to integrate the context work here?

Yeah, this proposal is orthogonal to the context work, the MultiClusterManager will make use of the same Runnable interface as the normal Manager, whatever that is at time of implementation

DirectXMan12 · 2020-08-21T00:05:31Z

designs/move-cluster-specific-code-out-of-manager.md

+	GetWebhookServer() *webhook.Server
+}
+```
+


A number of the patterns I've seen for multicluster end up having clusters come & go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.

I'm also not certain about the whole "primary" / "secondary" cluster thing. Perhaps modeling this as there's a "leader election" cluster and "functional" clusters (one of which may be the leader election cluster)?

DirectXMan12 · 2020-08-21T00:08:08Z

designs/move-cluster-specific-code-out-of-manager.md

+
+```go
+if cc, isClusterConnector:= runnable.(clusterconnector.ClusterConnector); isClusterConnector {
+	m.caches = append(m.caches, cc.GetCache())


can't we just do:

    type HasCaches interface {
        GetCache() ...
    }

    if getter, hasCaches := runnable.(HasCaches); hasCaches {
        ...
    }

(could make an additional method to avoid accidentally conflicting, too)

DirectXMan12 · 2020-08-21T00:10:25Z

designs/move-cluster-specific-code-out-of-manager.md

+		return reconcile.Result, err
+	}
+
+	if err := r.mirrorClusterClient.Get(context.TODO(), r.NamespacedName, &corev1.Secret); err != nil {


Tangent: Not that we need to tackle it here, but we could even create a multicluster client that (ab)used the unused ClusterName field (maybe dangerous), or used a client option, context field, etc to avoid needing multiple clients.

That sounds like a great idea (but I would prefer to keep that as an orthogonal work item)

DirectXMan12 · 2020-08-21T00:10:50Z

designs/move-cluster-specific-code-out-of-manager.md

+	}
+
+	if err := r.mirrorClusterClient.Get(context.TODO(), r.NamespacedName, &corev1.Secret); err != nil {
+		if !kerrors.IsNotFound(err) {


Nit: client.IgnoreNotFound(err) exists ;-)

I don't understand how that is of use; we have to return on NotFound, so it's handled differently from no error.

oh, the pattern just becomes if client.IgnoreNotFound(err) != nil. Mainly just avoids remembering another k8s package to import.
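To make the two positions above concrete, here is a self-contained sketch of the `IgnoreNotFound` pattern. It does not import controller-runtime: `errNotFound`, `errTimeout`, `ignoreNotFound` and `classify` are hypothetical stand-ins, with `ignoreNotFound` mimicking what `client.IgnoreNotFound` does (return nil for NotFound errors, pass everything else through). The state names are made up for illustration; what the design's sample reconciler does on NotFound is not shown in the quoted hunk.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for a Kubernetes NotFound API error and for any other failure.
var (
	errNotFound = errors.New("not found")
	errTimeout  = errors.New("timeout")
)

// ignoreNotFound mimics client.IgnoreNotFound: nil for NotFound errors,
// every other value is returned unchanged.
func ignoreNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// classify sorts the error from a Get call into the three cases a
// reconciler usually has to tell apart.
func classify(getErr error) (string, error) {
	if err := ignoreNotFound(getErr); err != nil {
		return "", err // a real failure: hand it back to the caller
	}
	if getErr != nil {
		return "missing", nil // NotFound: still handled differently from success
	}
	return "present", nil
}

func main() {
	for _, e := range []error{nil, errNotFound, errTimeout} {
		state, err := classify(e)
		fmt.Printf("state=%q err=%v\n", state, err)
	}
}
```

The helper removes the need to import the API errors package for the failure branch, but the NotFound case still needs its own check, which is the point raised in the reply.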

DirectXMan12 · 2020-08-21T00:13:32Z

designs/move-cluster-specific-code-out-of-manager.md

+		panic(err)
+	}
+
+	mirrorClusterConnector, err := clusterconnector.New(cfg2)


should we leave the possibility for options here?

IMHO we should use functional opts once we have a need for that

ack, need to specify that in the signature so it's not breaking.

We should probably do a review before v1 and figure out spots where we should put functional options in (e.g. manager.New probably wants functional options).
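The point about putting options into the signature up front can be sketched like this. Everything here is hypothetical: `Options`, `Option`, `WithNamespace` and the `Cluster` struct are illustration only, and `New` takes a host string in place of a `*rest.Config` to stay self-contained. The idea is that a variadic `opts ...Option` parameter lets new options appear later without breaking existing callers.

```go
package main

import (
	"errors"
	"fmt"
)

// Options holds optional settings; the field is for illustration only.
type Options struct {
	Namespace string
}

// Option mutates Options; new options can be added without touching New.
type Option func(*Options)

// WithNamespace is a hypothetical option introduced in a later release.
func WithNamespace(ns string) Option {
	return func(o *Options) { o.Namespace = ns }
}

type Cluster struct {
	host string
	opts Options
}

// New accepts zero or more options, so a call without options stays valid
// when options are added later.
func New(host string, opts ...Option) (*Cluster, error) {
	if host == "" {
		return nil, errors.New("host must not be empty")
	}
	c := &Cluster{host: host}
	for _, opt := range opts {
		opt(&c.opts)
	}
	return c, nil
}

func main() {
	plain, _ := New("https://mirror.example")
	scoped, _ := New("https://mirror.example", WithNamespace("kube-system"))
	fmt.Printf("%q %q\n", plain.opts.Namespace, scoped.opts.Namespace)
}
```

Callers written against the option-less form keep compiling unchanged, which is why the variadic parameter has to be in the signature from the first release.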

alvaroaleman · 2020-09-03T18:38:14Z

Everyone, I think I have responded to all feedback, PTAL

DirectXMan12

very minor nits, otherwise

/approve

(will add the LGTM once nits are addressed, since it'll be removed anyway. Really would love "lgtm with nits" in k8s :-/)

DirectXMan12 · 2020-09-11T22:31:10Z

designs/move-cluster-specific-code-out-of-manager.md

+		panic(err)
+	}
+
+	mirrorClusterConnector, err := clusterconnector.New(cfg2)


ack, need to specify that in the signature so it's not breaking.

We should probably do a review before v1 and figure out spots where we should put functional options in (e.g. manager.New probably wants functional options).

k8s-ci-robot · 2020-09-11T22:32:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [DirectXMan12,alvaroaleman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alvaroaleman · 2020-09-14T15:10:01Z

@DirectXMan12 updated to incorporate your suggestions

alvaroaleman · 2020-09-14T15:23:41Z

/retest

DirectXMan12 · 2020-09-21T23:04:01Z

/lgtm

DirectXMan12 · 2020-09-21T23:06:55Z

test failure is #1171

DirectXMan12 · 2020-09-22T00:26:21Z

/retest

ashishvya · 2021-01-28T06:32:48Z

@alvaroaleman Has this proposal been implemented? If not, do you have a tracking ticket for it?

alvaroaleman · 2021-01-28T13:16:00Z

@ashishvya it is implemented

taragu · 2022-06-27T18:36:44Z

@alvaroaleman could you kindly link the PR for implementing this proposal? I struggle to find where it's implemented.

alvaroaleman · 2022-06-27T18:54:57Z

@taragu the implementation of this is in pkg/cluster, you can use the git history to find the relevant PRs

k8s-ci-robot assigned estroz, mengqiy and vincepri Jul 26, 2020

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 26, 2020

k8s-ci-robot requested review from mengqiy and pwittrock July 26, 2020 17:45

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 26, 2020

alvaroaleman commented Jul 26, 2020

alvaroaleman force-pushed the add branch from 9063c9f to bf3e187 Compare July 26, 2020 17:46

alvaroaleman commented Jul 26, 2020

alvaroaleman mentioned this pull request Jul 26, 2020

POC: Move cluster-specific code out of the manager #950

Closed

kramerul reviewed Jul 27, 2020

jiachengxu reviewed Jul 27, 2020

vincepri reviewed Jul 27, 2020

mengqiy reviewed Jul 28, 2020

alvaroaleman force-pushed the add branch from bf3e187 to 18681d4 Compare August 1, 2020 15:17

christopherhein mentioned this pull request Aug 12, 2020

[VC] Support MultiClusterManager once proposal is implemented kubernetes-retired/multi-tenancy#1012

Closed

DirectXMan12 suggested changes Aug 21, 2020

alvaroaleman force-pushed the add branch 2 times, most recently from f4e21db to 33f39b4 Compare September 3, 2020 18:24

DirectXMan12 approved these changes Sep 11, 2020

🏃 Proposal to extract cluster-specifics out of the Manager

a612390

alvaroaleman force-pushed the add branch from 33f39b4 to a612390 Compare September 14, 2020 15:09

k8s-ci-robot assigned DirectXMan12 Sep 21, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2020

k8s-ci-robot merged commit ea6a506 into kubernetes-sigs:master Sep 22, 2020

coderanger mentioned this pull request Feb 11, 2021

KEDA for multi-cluster use-case kedacore/keda#1587

Open
