Cluster access for Red Hat (part 2, 1000 physical nodes) #22

Closed
jeremyeder opened this issue Nov 3, 2016 · 12 comments

@jeremyeder

First Name

Jeremy

Last Name

Eder

Email

[email protected]

Company/Organization

Red Hat

Job Title

Engineer

Project Title

Deploying 1000 nodes of OpenShift on the CNCF Cluster (Part 2)

What existing problem or community challenge does this work address? ( Please include any past experience or lessons learned )

We are interested in:
Working through the operational concepts necessary to handle a large bare metal scale-out environment.
Comparing the behavior of Kubernetes on OpenStack with Kubernetes on bare metal.
Running our newly developed workload generators and test suite.
Utilizing newer features in Kubernetes to make use of bare metal hardware features.

Briefly describe the project

To complement our earlier work on the CNCF lab (https://cncf.io/news/blogs/2016/08/deploying-1000-nodes-openshift-cncf-cluster-part-1), we would like to propose a full-lab scale test scenario once the CNCF lab is at full capacity. We aim to quantify the performance improvement from running on bare metal rather than in a virtualized environment. We will also conduct specific HTTP load testing and storage (persistent volume) performance testing.

Do you intend to measure specific metrics during the work? Please describe briefly

Yes, we will use our pbench framework (https://github.com/distributed-system-analysis/pbench) to capture metrics on each run. We expect this to involve Prometheus, a CNCF project, to the extent that we use it for gathering Kubernetes API server metrics.

Which members of the CNCF community and/or end-users would benefit from your work?

Kubernetes, Prometheus, and end users who are looking to run high-performance workloads on bare metal environments. Also fluentd, if it is accepted (OpenShift uses fluentd for logging).

Is the code that you’re going to be running 100% open source? If so, what is the URL or URLs where it is located?

Yes: https://github.com/openshift

Do you commit to publishing your results and upstreaming the open source code resulting from your work? Do you agree to this within 2 months of cluster use?

Yes, we already open-source everything we write, and we have shared significant amounts of data via blog posts and public-speaking engagements at industry conferences.

Will your testing involve containers? If not, could it? What would be entailed in changing your processes to containerize your workload?

Yes.

Are there identified risks which would prevent you from achieving significant results in the project?

Not that we are aware of. We have good experience handling OpenShift at scale and we are proposing a two-phase approach where we prototype on 100 nodes (this proposal) with an adjacently-scheduled phase at full-lab scale of 1000 nodes.

Have you requested CNCF cluster resources or access in the past? If ‘no’, please skip the next three questions.

Yes.

Please list project titles associated with prior CNCF cluster usage.

Deploying 1000 nodes of OpenShift on the CNCF Cluster (Part 1)

Please list contributions to open source initiatives for projects listed in the last question. If you did not upstream the results of the open source initiative in any of the projects, please explain why.

Over 30 bugs were filed across projects such as Kubernetes, OpenShift and Ansible.

Have you ever been denied usage of the cluster in the past? If so, please explain why.

No.

Please state your contributions to the open source community and any other relevant initiatives

Red Hat is a fully open-source company. Red Hat is a platinum founding member of CNCF and a contributor to Docker, Kubernetes, OpenShift Origin, and many more projects.

Number of nodes requested (minimum 20 nodes, maximum 500 nodes). In Q3, maximum increases to 1000 nodes.

1000 nodes. (We realize that slightly fewer than 1000 will be available for us to use.)

Duration of request (minimum 24 hours, maximum 2 weeks)

2 weeks at least.

Please schedule this immediately after #21 so that we can retain our existing environment, and expand it on to the additional nodes.

With or Without an operating system (Restricted to CNCF pre-defined OS and versions)?

With: RHEL 7.3

How will this testing advance cloud native computing (specifically containerization, orchestration, microservices or some combination).

We are working to push beyond control plane scalability to simulate realistic bare metal scenarios. This will include loading applications that represent an accurate mix of what we have seen in the wild. Being able to do this at higher scale levels will help us to discover best practices from an architecture standpoint as well as to help validate capacity planning formulas to see if they hold up at higher scale and load levels.

Any other relevant details we should know about while preparing the infrastructure?

@bprestonlf

+1

@cncfclusterteam
Contributor

cncfclusterteam commented Nov 4, 2016

The earliest we can deliver 1000 nodes is the end of Q1 '17.

@jeremyeder
Author

Thanks -- we will wait until then.

@jeremyeder
Author

Hi @cncfclusterteam -- could you please let us know the timing and likelihood of this allotment?

@jeremyeder
Author

Hi @cncfclusterteam -- could you please let us know when this allotment might occur?

@jeremyeder
Author

Hello -- we've had some delays in getting hardware (see conversations on Slack in #cluster). We also found out that there are only ~375 nodes that could be allocated to us (we received them on March 14). Is there any chance our turn on this gear could be extended by 1-2 weeks?

@jeremyeder
Author

@cncfclusterteam ping

@cncfclusterteam
Contributor

@jeremyeder please continue to use the allocated nodes so that you can obtain full test results.

@jeremyeder
Author

We are still missing one important test (pod density). We are trying to complete that this week.

The majority of test results have been posted here:
https://www.cncf.io/blog/2017/03/28/deploying-2048-openshift-nodes-cncf-cluster-part-2/

@cncfclusterteam
Contributor

@jeremyeder the nodes are yours till next Monday (10.04.2017), good luck with the test and thank you for the blog report! :)

@cncfclusterteam
Contributor

Hi @jeremyeder,

We hope the time spent with the cluster has been productive. I am writing to inform you that we would like to clean up the nodes for the next tenants. Please let us know when we can return them to the free pool.

Thank you,
CNCF Cluster Team

@jeremyeder
Author

Hi @cncfclusterteam -- we've gotten all our data off. Thank you very much for access to the gear!
