Prioritized scheduling of critical cluster addon pods #62

Closed
8 of 20 tasks
aronchick opened this issue Jul 25, 2016 · 12 comments
Labels
sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.


@aronchick
Contributor

aronchick commented Jul 25, 2016

Description

Kubernetes has "cluster addon pods" that provide system services but do not run on the master node. Some of them, such as Heapster, DNS, and the UI, are critical to having a fully functional cluster. Users can break their cluster by evicting a critical addon, either manually or as a side effect of another operation such as an upgrade, after which the addon may remain pending (for example, when the cluster is highly utilized). To avoid such situations, we introduce a rescheduler component that runs on the master and guarantees that critical addons are scheduled, assuming the cluster is big enough. It does this by watching for pending critical pods and evicting other pods to make room for the pending one(s).

Design Proposal: kubernetes/kubernetes#29195
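The eviction step described above can be sketched in Go under simplifying assumptions. The `Pod` and `Node` types and the `isCritical`, `victimsOn`, and `pickNode` functions are hypothetical names for illustration, not the contrib/rescheduler API. Treating a critical addon as a `kube-system` pod carrying the `scheduler.alpha.kubernetes.io/critical-pod` annotation is an assumption about the alpha-era convention, and the largest-CPU-first victim order is an illustrative heuristic rather than the rescheduler's actual policy. Only CPU and memory requests are modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// Assumed alpha-era annotation marking a pod as a critical addon.
const criticalPodAnnotation = "scheduler.alpha.kubernetes.io/critical-pod"

// Pod is a hypothetical, simplified pod: identity plus resource requests.
type Pod struct {
	Name        string
	Namespace   string
	Annotations map[string]string
	MilliCPU    int64
	MemoryMi    int64
}

// Node is a hypothetical node: allocatable capacity plus the pods bound to it.
type Node struct {
	Name     string
	MilliCPU int64
	MemoryMi int64
	Pods     []Pod
}

// isCritical applies the assumed rule: kube-system namespace plus the annotation.
func isCritical(p Pod) bool {
	_, annotated := p.Annotations[criticalPodAnnotation]
	return p.Namespace == "kube-system" && annotated
}

// victimsOn returns the non-critical pods to evict from n so that pending fits,
// and false if pending cannot fit even after evicting every non-critical pod.
func victimsOn(n Node, pending Pod) ([]Pod, bool) {
	freeCPU, freeMem := n.MilliCPU, n.MemoryMi
	var candidates []Pod
	for _, p := range n.Pods {
		freeCPU -= p.MilliCPU
		freeMem -= p.MemoryMi
		if !isCritical(p) { // critical pods are never eviction candidates
			candidates = append(candidates, p)
		}
	}
	// Illustrative heuristic: evict the largest CPU requests first.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].MilliCPU > candidates[j].MilliCPU
	})
	fits := func() bool { return freeCPU >= pending.MilliCPU && freeMem >= pending.MemoryMi }
	var victims []Pod
	for _, c := range candidates {
		if fits() {
			break
		}
		victims = append(victims, c)
		freeCPU += c.MilliCPU
		freeMem += c.MemoryMi
	}
	return victims, fits()
}

// pickNode chooses the node that needs the fewest evictions for pending.
func pickNode(nodes []Node, pending Pod) (string, []Pod, bool) {
	best, bestVictims, found := "", []Pod(nil), false
	for _, n := range nodes {
		v, ok := victimsOn(n, pending)
		if ok && (!found || len(v) < len(bestVictims)) {
			best, bestVictims, found = n.Name, v, true
		}
	}
	return best, bestVictims, found
}

func main() {
	// Hypothetical request sizes, chosen only to force one eviction.
	dns := Pod{Name: "kube-dns", Namespace: "kube-system",
		Annotations: map[string]string{criticalPodAnnotation: ""}, MilliCPU: 300, MemoryMi: 128}
	nodes := []Node{{Name: "node-a", MilliCPU: 1000, MemoryMi: 1024, Pods: []Pod{
		{Name: "web-1", Namespace: "default", MilliCPU: 500, MemoryMi: 256},
		{Name: "web-2", Namespace: "default", MilliCPU: 400, MemoryMi: 256},
	}}}
	node, victims, ok := pickNode(nodes, dns)
	fmt.Println(node, len(victims), ok) // prints "node-a 1 true"
}
```

Choosing the node with the fewest victims keeps disruption to ordinary workloads small; a real implementation would also need to stop other pods from immediately taking the freed space before the critical pod is bound.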

Progress Tracker

  • Before Alpha
    • Write and maintain draft-quality doc. This step was skipped; instead, a design proposal was circulated (see the Design Proposal link above).
    • Design Approval
    • Write (code + tests + docs), then get them merged. See the rescheduler directory in the kubernetes/contrib repo, plus the Salt configuration for the Rescheduler (kubernetes#30870).
      • Code needs to be disabled by default. Verified by code OWNERS
      • Minimal testing
      • Minimal docs
        • cc @kubernetes/docs on docs PR
        • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
        • New APIs: Glossary section item in the docs repo: kubernetes/kubernetes.github.io
      • Update release notes
  • Before Beta
    • Testing is sufficient for beta
    • User docs with tutorials
      • Updated walkthrough / tutorial in the docs repo: kubernetes/kubernetes.github.io
      • cc @kubernetes/docs on docs PR
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Thorough API review
      • cc @kubernetes/api
  • Before Stable
    • docs/proposals/foo.md moved to docs/design/foo.md
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Soak, load testing
    • Detailed user docs and examples
      • cc @kubernetes/docs
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT

More advice:

Design

  • Once you get LGTM from a @kubernetes/feature-reviewers member, you can check this checkbox, and the reviewer will apply the "design-complete" label.

Coding

  • Use as many PRs as you need. Write tests in the same or different PRs, as is convenient for you.
  • As each PR is merged, add a comment to this issue referencing the PRs. Code goes in the http://github.com/kubernetes/kubernetes repository,
    and sometimes http://github.com/kubernetes/contrib, or other repos.
  • When you are done with the code, apply the "code-complete" label.
  • When the feature has user docs, please add a comment mentioning @kubernetes/feature-reviewers and they will
    check that the code matches the proposed feature and design, and that everything is done, and that there is adequate
    testing. They won't do detailed code review: that already happened when your PRs were reviewed.
    When that is done, you can check this box and the reviewer will apply the "code-complete" label.

Docs

  • Write user docs and get them merged in.
  • User docs go into http://github.com/kubernetes/kubernetes.github.io.
  • When the feature has user docs, please add a comment mentioning @kubernetes/docs.
  • When you get LGTM, you can check this checkbox, and the reviewer will apply the "docs-complete" label.
@aronchick aronchick added this to the v1.4 milestone Jul 25, 2016
@idvoretskyi
Member

cc @kubernetes/sig-scheduling

@idvoretskyi idvoretskyi added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Jul 25, 2016
@philips
Contributor

philips commented Jul 25, 2016

@davidopp @piosz can you please fill in the description further? The level of context and description that #63 provides would be sufficient.

@davidopp
Member

OK I updated it.

@davidopp
Member

davidopp commented Aug 26, 2016

Status: this feature is finished except for docs. The code is in contrib/rescheduler, and it will be automatically deployed in clusters, requesting 10m CPU and 100Mi memory.

This feature does not have an API (perhaps you could call its flags an API, but not in the usual Kubernetes API sense) so I am not sure whether to call this alpha, beta, or GA. I guess I'll call it alpha.

@goltermann
Contributor

@davidopp re: alpha/beta/stable - are we planning on doing more work here in 1.5 or have significant iterations to do? Do we have reason to recommend not using it in production clusters? Basically - why isn't this GA/stable?

@davidopp
Member

davidopp commented Aug 26, 2016

Here are some random thoughts:

  • I'm wary of labeling any feature as GA/stable in its first release.
  • We might want it to run by default in GKE.
  • It doesn't have a user-facing API.
  • It will eventually be replaced with a proper priority/preemption scheme that will probably work differently (i.e. be driven by the scheduler rather than a standalone component as here).

I'm fine calling it beta, though I think we should try to limit the number of features that go directly to beta.

@janetkuo
Member

janetkuo commented Sep 2, 2016

@davidopp @piosz Are the docs ready? Please update the docs in https://github.com/kubernetes/kubernetes.github.io, and then add the PR numbers and check the docs box in the issue description.

@piosz
Member

piosz commented Sep 6, 2016

@janetkuo PR kubernetes/website#1170 in flight

@davidopp
Member

@piosz Should we close this issue? IIUC we're not planning to do more work on it, and it runs by default in all clusters (GKE and open-source). If you agree, please close.

@piosz
Member

piosz commented Nov 11, 2016

Yes, we should close this. I don't have sufficient permissions to do it.

@ericchiang
Contributor

This feature still seems to be marked alpha (at least the annotation is still marked alpha).

Has this been abandoned without plans to move it to beta?

@davidopp
Member

davidopp commented May 5, 2017

It will not move to beta. It will be subsumed by #268.

ingvagabund pushed a commit to ingvagabund/enhancements that referenced this issue Apr 2, 2020
…olm-folder

Move olm registry enhancement to correct folder
8 participants