Cluster access for Red Hat (part 2, 1000 physical nodes) #22

jeremyeder · 2016-11-03T16:02:47Z

First Name

Jeremy

Last Name

Eder

Email

[email protected]

Company/Organization

Red Hat

Job Title

Engineer

Project Title

Deploying 1000 nodes of OpenShift on the CNCF Cluster (Part 2)

What existing problem or community challenge does this work address? (Please include any past experience or lessons learned.)

We are interested in:
Working through the operational concepts necessary to handle a large bare metal scale-out environment.
Comparing the behavior of Kubernetes on OpenStack with Kubernetes on bare metal.
Running our newly developed workload generators and test suite.
Utilizing newer features in Kubernetes to make use of bare metal hardware features.

Briefly describe the project

To complement our earlier work on the CNCF lab (https://cncf.io/news/blogs/2016/08/deploying-1000-nodes-openshift-cncf-cluster-part-1), we would like to propose a full-lab scale test scenario once the CNCF lab is at full capacity. We will look to quantify the performance gained by running on bare metal rather than in a virtualized environment. We will also conduct targeted HTTP load testing and storage (persistent volume) performance testing.

Do you intend to measure specific metrics during the work? Please describe briefly

Yes, we will use our pbench framework https://github.com/distributed-system-analysis/pbench to capture metrics on each run. We expect this to involve Prometheus, a CNCF project, to the extent that we use it for gathering Kubernetes API server metrics.

Which members of the CNCF community and/or end-users would benefit from your work?

Kubernetes, Prometheus, and end users who are looking to run high-performance workloads in bare metal environments. Fluentd would also benefit if it is accepted into CNCF (OpenShift uses fluentd for logging).

Is the code that you’re going to be running 100% open source? If so, what is the URL or URLs where it is located?

Yes: https://github.com/openshift

Do you commit to publishing your results and upstreaming the open source code resulting from your work? Do you agree to this within 2 months of cluster use?

Yes, we have already open-sourced everything we write and we have shared significant amounts of data via blog and public-speaking engagements at industry conferences.

Will your testing involve containers? If not, could it? What would be entailed in changing your processes to containerize your workload?

Yes.

Are there identified risks which would prevent you from achieving significant results in the project?

Not that we are aware of. We have good experience running OpenShift at scale, and we are following a two-phase approach: a prototype on 100 nodes (#21), followed by an adjacently scheduled phase at the full-lab scale of 1000 nodes (this proposal).

Have you requested CNCF cluster resources or access in the past? If ‘no’, please skip the next three questions.

Yes.

Please list project titles associated with prior CNCF cluster usage.

Deploying 1000 nodes of OpenShift on the CNCF Cluster (Part 1)

Please list contributions to open source initiatives for projects listed in the last question. If you did not upstream the results of the open source initiative in any of the projects, please explain why.

Over 30 bugs were filed across projects such as Kubernetes, OpenShift and Ansible.

Have you ever been denied usage of the cluster in the past? If so, please explain why.

No.

Please state your contributions to the open source community and any other relevant initiatives

Red Hat is a fully open-source company. Red Hat is a platinum founding member of CNCF and a contributor to Docker, Kubernetes, OpenShift Origin, and many more projects.

Number of nodes requested (minimum 20 nodes, maximum 500 nodes). In Q3, maximum increases to 1000 nodes.

1000 nodes. (We realize that slightly fewer than 1000 will be available for us to use.)

Duration of request (minimum 24 hours, maximum 2 weeks)

2 weeks at least.

Please schedule this immediately after #21 so that we can retain our existing environment, and expand it on to the additional nodes.

With or Without an operating system (Restricted to CNCF pre-defined OS and versions)?

With: RHEL 7.3

How will this testing advance cloud native computing (specifically containerization, orchestration, microservices or some combination).

We are working to push beyond control plane scalability to simulate realistic bare metal scenarios. This will include loading applications that represent an accurate mix of what we have seen in the wild. Being able to do this at higher scale levels will help us to discover best practices from an architecture standpoint as well as to help validate capacity planning formulas to see if they hold up at higher scale and load levels.

Any other relevant details we should know about while preparing the infrastructure?

bprestonlf · 2016-11-04T11:58:38Z

+1

cncfclusterteam · 2016-11-04T12:15:17Z

The soonest date to deliver 1000 nodes is end of Q1 '17

jeremyeder · 2016-11-09T21:49:50Z

Thanks -- we will wait until then.

jeremyeder · 2016-12-19T11:45:08Z

Hi @cncfclusterteam -- could you please let us know the timing/likelihood of this allotment?

jeremyeder · 2017-03-10T19:44:36Z

Hi @cncfclusterteam -- could you please let us know when this allotment might occur?

jeremyeder · 2017-03-15T15:46:29Z

Hello -- we've had some delays in getting hardware (see conversations on Slack in #cluster). We also found out that only ~375 nodes could be allocated to us (we received them on March 14). Is there any chance our turn on this gear could be extended by 1-2 weeks?

jeremyeder · 2017-03-20T17:53:10Z

@cncfclusterteam ping

cncfclusterteam · 2017-03-21T10:18:16Z

@jeremyeder please continue to use the nodes allocated to have full test results

jeremyeder · 2017-04-03T18:25:42Z

We are still missing one important test (pod density). We are trying to complete that this week.

The majority of test results have been posted here:
https://www.cncf.io/blog/2017/03/28/deploying-2048-openshift-nodes-cncf-cluster-part-2/

cncfclusterteam · 2017-04-04T11:28:17Z

@jeremyeder the nodes are yours till next Monday (10.04.2017), good luck with the test and thank you for the blog report! :)

cncfclusterteam · 2017-04-11T07:35:58Z

Hi @jeremyeder,

We hope the time spent with the cluster has been productive. I am writing to inform you that we would like to clean up the nodes for the next tenants. Please let us know when we can return them to the free pool.

Thank you,
CNCF Cluster Team

jeremyeder · 2017-04-11T13:57:17Z

Hi @cncfclusterteam We've gotten all our data off. Thank you very much for access to the gear!

cncfclusterteam mentioned this issue Nov 9, 2016

Cluster access for Red Hat (part 1, 100 physical nodes) #21

Closed

cncfclusterteam added the in progress label Mar 21, 2017

cncfclusterteam mentioned this issue Mar 21, 2017

Current cluster status #20

Closed

cncfclusterteam closed this as completed Apr 18, 2017

cncfclusterteam removed the in progress label Apr 18, 2017

vielmetti added this to the OpenShift milestone Oct 21, 2021
