This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Frontend-Backend separation support for providing multi-cluster joint management. #5605

Closed
siaimes opened this issue Aug 18, 2021 · 3 comments

Comments

@siaimes
Contributor

siaimes commented Aug 18, 2021

What would you like to be added:
Frontend-Backend separation support for providing multi-cluster joint management.
Why is this needed:
For large-scale data centers, it is unwise to deploy all nodes into a single k8s cluster: if that cluster fails, none of the machines can provide services. However, splitting the nodes into multiple clusters places an additional burden on operations management and end users.
Without this feature, how does the current module work:
A Frontend manages a single Backend, and together they form a cluster; each cluster maintains its own user database.
Components that may involve changes:
Separate the Frontend from the cluster and let it connect to multiple Backends. When submitting a job, a user first selects a cluster and then a virtual cluster, so usage does not differ significantly from before. For operations management, however, the probability of a single point of failure (SPOF) is reduced, and version upgrades can be done in batches to minimize uncertainty.

@suiguoxin
Member

@siaimes Thanks for the proposal. Is this what you need? We have supported job transfer among different clusters since v1.4.

@siaimes
Contributor Author

siaimes commented Aug 23, 2021

@suiguoxin Thank you for your reply.

Job transfer is not what I wanted.

For clarity, I drew a sketch as an illustration.

image

We can provide an option that allows users to configure a frontend (a single-node cluster or a node in a cluster) and connect it to the backends of other clusters.

The benefits are as follows:

  1. A large k8s cluster can be split into multiple small k8s clusters, reducing the probability that a single point of failure makes the entire system unusable (the frontend node itself needs high availability);
  2. A unified user management system can be provided, reducing the pressure on operations management;
  3. As long as the frontend node and each cluster's master node can reach each other, large-scale deployments with nodes distributed across multiple data centers are supported;
  4. Version upgrades can be rolled out separately for each k8s cluster, so that the frontend is always available.
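
To make the idea concrete, here is a minimal sketch of how a shared frontend might keep one user database and route a submission to a chosen cluster's backend. This is only an illustration under stated assumptions: the `Backend` and `Frontend` classes, their fields, and the example URLs are all hypothetical and do not correspond to any existing OpenPAI API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Backend:
    """One k8s cluster's backend as seen from the shared frontend (hypothetical)."""
    name: str
    rest_url: str                  # hypothetical endpoint of that cluster's backend
    virtual_clusters: List[str]
    healthy: bool = True           # a failed cluster is marked unhealthy, others stay usable


@dataclass
class Frontend:
    """Shared frontend: a single user database and many registered backends."""
    users: Set[str] = field(default_factory=set)
    backends: Dict[str, Backend] = field(default_factory=dict)

    def register(self, backend: Backend) -> None:
        """Attach another cluster's backend to this frontend."""
        self.backends[backend.name] = backend

    def available_clusters(self) -> List[str]:
        """Clusters a user can pick from before choosing a virtual cluster."""
        return sorted(n for n, b in self.backends.items() if b.healthy)

    def route(self, user: str, cluster: str, virtual_cluster: str) -> str:
        """Return the backend URL a job should be sent to, or raise if invalid."""
        if user not in self.users:
            raise PermissionError(f"unknown user: {user}")
        backend = self.backends.get(cluster)
        if backend is None or not backend.healthy:
            raise LookupError(f"cluster not available: {cluster}")
        if virtual_cluster not in backend.virtual_clusters:
            raise ValueError(f"no virtual cluster {virtual_cluster!r} in {cluster}")
        return backend.rest_url


# Example: two clusters, one of them down for an upgrade.
frontend = Frontend(users={"alice"})
frontend.register(Backend("cluster-a", "https://a.example", ["default"]))
frontend.register(Backend("cluster-b", "https://b.example", ["default", "gpu"], healthy=False))
print(frontend.available_clusters())
print(frontend.route("alice", "cluster-a", "default"))
```

In this sketch, taking `cluster-b` offline only removes it from the selection list; users and the remaining clusters are unaffected, which is the behavior benefits 1 and 4 describe.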

@mydmdm
Contributor

mydmdm commented Nov 18, 2021

It seems the Job specialization in #4801 is what you need for protocol and backend support. However, due to resource limits, we haven't had a chance to implement it. Hope this can inspire your ideas.

@siaimes closed this as not planned Jun 12, 2022