
HA: Draft plan to reach high availability #1477

Closed
MarkAckert opened this issue Jun 25, 2020 · 9 comments
Assignees

Comments


MarkAckert commented Jun 25, 2020

Taking prior component HA research, create a plan for Zowe to become highly available in sysplex environments.

Build a proof of concept of the plan.

@MarkAckert MarkAckert added this to the 20PI3S5 milestone Jun 25, 2020
@jackjia-ibm jackjia-ibm changed the title HA: Draft plan to reach high availability (sysplex) HA: Draft plan to reach high availability Jul 27, 2020
@jackjia-ibm

Continuing from #1468.

The second version of Zowe-HA-Draft.docx.
The first version of Zowe-HA-Architecture-View.pptx.

There are pending work items marked in the documentation. As Steve suggested, we need more details on the hybrid z/OS + containers setup.

@jackjia-ibm

The 3rd version of Zowe-HA-Draft.docx and the 2nd version of Zowe-HA-Architecture-View.pptx.

After several discussions about the Zowe Launcher, packaging, certificates, and the cross-memory server, the implementation plan has been added to the draft.

@1000TurquoisePogs

Maybe I am misunderstanding, but this document seems to imply that we need to write quite different code on z/OS versus Docker. It would be inefficient to maintain both if they are too different, but I think more code can be shared than the draft document suggests.
I hope to get these questions resolved before writing any code:

If you are running Zowe in Docker container(s), the Caching API can be replaced with etcd.
The interface to read, update, and delete key/value pairs is the same as etcd's.

etcd uses gRPC. Does that mean the caching API is a gRPC server?
Does this mean we will be making an abstraction layer for vsam/activemq/redis that communicates via gRPC?
Is the caching API going to have the same abilities as etcd, or just a subset?

About data storage, it says:

The Zowe Caching API supports multiple backend persistence methods. You can choose from the options below:
VSAM Data Set: when you are running Zowe on a single z/OS system or in a Parallel Sysplex, VSAM is the default persistence option. The Zowe configuration process can help you prepare the data set.
ActiveMQ: when you are running Zowe in Docker containers, ActiveMQ is the default persistence option. It can also be used in a z/OS Sysplex but requires manual configuration.
Redis: also supported if you can download, install, and configure a Redis cluster on your z/OS system.

This says ActiveMQ is for Docker instead of etcd? Is one better than the other?

Would we really launch with initial support for all 3, and then also have etcd outside of the caching api?

Is there an advantage to making the caching API be a gRPC server versus a REST server? For consistency should etcd be behind the caching api rather than an alternative to the caching api? It will probably make documentation & scripting confusing otherwise.

So then we have 4: VSAM, ActiveMQ, Redis, etcd. Do we need 4?

  • VSAM sounds simple.
  • Since ActiveMQ and Redis both exist on z/OS and Linux, is one better than the other?
  • etcd doesn't yet exist on z/OS. Is it worth adding support just for Docker if ActiveMQ and Redis are cross-platform?

If you are running Zowe in docker container(s), the failover is provided by the container orchestrator, like Kubernetes or OpenShift.
Zowe launcher is not required when we run Zowe in Kubernetes or Openshift.

How do Kubernetes and OpenShift replace the Zowe launcher's responsibilities?
I think of Kubernetes and OpenShift as filling an HA/FT role similar to what Sysplex does for z/OS. But within the Sysplex, the Zowe launcher is still needed. I believe the Zowe launcher is per-LPAR, and a container is, for us, like an LPAR, so within the container I think the Zowe launcher is still needed. The Zowe launcher makes decisions based on the pids of the UNIX servers, and that would not change in Docker. Portions of the Zowe launcher should be multiplatform so that the user experience is consistent on and off z/OS. On z/OS the Zowe launcher writes to a job log; off z/OS it could write to a file.
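The pid-based restart responsibility described above can be sketched in a platform-neutral way. This is a hypothetical illustration of such a supervision loop, not the actual Zowe launcher code:

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 3) -> int:
    """Hypothetical sketch of the pid-based restart loop a launcher performs.

    Off z/OS the same loop works unchanged: the child's exit status is
    observed via its pid, and logging could go to a file instead of a
    job log.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)   # launcher tracks this pid
        code = proc.wait()             # block until the component exits
        if code == 0:
            return 0                   # clean shutdown, nothing to do
        restarts += 1
        if restarts > max_restarts:
            return code                # give up after repeated failures
        time.sleep(0.1)                # back off briefly, then restart
```

Kubernetes restart policies cover the container as a whole, but a loop like this would still be the thing watching individual component processes inside it.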

@jackjia-ibm

Thanks, Sean, for the comments. I think adding etcd confused many things. I agree with you and think we shouldn't lock onto one solution like etcd. I will remove the etcd-related requirements to keep the Caching API model simple, generic, and neutral.

The Docker container section is more about components running in separate containers, not the all-in-one container. I treat the all-in-one container as a development convenience and the fastest way to get a taste of Zowe. But for production, I think running components in separate containers is more flexible to scale, and the lifecycle of those pods can be handled by Kubernetes natively.

@1000TurquoisePogs

The Docker container section is more about components running in separate containers, not the all-in-one container. I treat the all-in-one container as a development convenience and the fastest way to get a taste of Zowe. But for production, I think running components in separate containers is more flexible to scale, and the lifecycle of those pods can be handled by Kubernetes natively.

That is not the current plan for Docker; it is almost the opposite of CUPIDS and would therefore require a significant documentation rework, new code, and new testing. The current Docker image is all-in-one because it uses CUPIDS to configure which components run. The Zowe launcher can be in there to assist with component uptime, and the uptime of the entire container could be handled by Kubernetes or OpenShift, but we should save that for a phase 2 or 3 because it is very different work from the other tasks.

@1000TurquoisePogs

gRPC or not, the caching API solution seems to involve a network.
On z/OS: when you eventually have instances of Zowe on different LPARs, where does the Caching API live? One per instance, or is it central and a single point of failure?
On Docker: when you eventually have instances of Zowe in different containers, where does it live? One per container, or as its own container?
etcd in particular mentions Raft clustering, so redundancy is possible with it.
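For context on the Raft point: a Raft cluster stays available only while a majority of its members survive, so redundancy requires at least three members. A quick sketch of the arithmetic (illustrative helper, not part of any Zowe code):

```python
def tolerated_failures(cluster_size: int) -> int:
    """Members a Raft cluster can lose while still forming a majority quorum."""
    return (cluster_size - 1) // 2


# e.g. a 3-member etcd cluster survives 1 failure; 5 members survive 2
```

This is why a single central Caching API instance would be a single point of failure, while a clustered backend can be made redundant.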


jackjia-ibm commented Sep 23, 2020

The Caching API will also be registered with the Discovery Service and routed through the Gateway, so both redundancy and failover will be handled similarly to other components. The only downside could be that the Caching API will be exposed externally; I have security concerns about this and will try to find more details. :(
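Gateway-based failover as described above can be pictured as a try-the-next-instance loop. This sketch is purely illustrative: the real failover happens inside APIML, and the function name and URLs are invented:

```python
from typing import Callable


def call_with_failover(instances: list[str],
                       request: Callable[[str], str]) -> str:
    """Try each registered instance in turn, returning the first success.

    This mirrors what a gateway does when a routed service instance is
    down: fail over to the next instance known to the discovery service.
    """
    last_error: Exception | None = None
    for url in instances:
        try:
            return request(url)
        except Exception as exc:   # instance unreachable or erroring
            last_error = exc
    raise RuntimeError("all instances failed") from last_error
```

Because every Caching API instance registers with the Discovery Service, a client (or the Gateway itself) always has the full list of candidates to iterate over.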

And thanks for all the valuable feedback and suggestions. Here is the 4th version of Zowe-HA-Draft.docx and the 3rd version of Zowe-HA-Architecture-View.pptx.

These are the changes compared to the previous version(s):

  • Removed the etcd interface from the Caching API; the Caching API will be a regular RESTful API.
  • Added the coupling facility as one possible adaptor for the Caching API.
  • Added ZIS/ZSS to the architecture view.
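Since the Caching API is now planned as a regular RESTful API, here is a minimal sketch of what a key/value REST interface could look like. The `/cache/{key}` routes and JSON payloads are hypothetical, not the actual Caching API contract:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory backend; a real service would delegate to the
# configured persistence adaptor (VSAM, Redis, ...).
_STORE: dict[str, str] = {}


class CacheHandler(BaseHTTPRequestHandler):
    def _key(self) -> str:
        return self.path.removeprefix("/cache/")

    def do_GET(self):
        value = _STORE.get(self._key())
        if value is None:
            self.send_response(404)
            self.end_headers()
        else:
            body = json.dumps({"value": value}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        _STORE[self._key()] = self.rfile.read(length).decode()
        self.send_response(204)
        self.end_headers()

    def do_DELETE(self):
        _STORE.pop(self._key(), None)
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass


def start_server() -> ThreadingHTTPServer:
    """Start the sketch server on an ephemeral localhost port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A plain REST contract like this keeps the API consistent with the other Zowe services behind the Gateway, which was one of the concerns with the earlier gRPC/etcd direction.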

Pending questions:

  • Support plan for the other persistence methods for the Caching API: is VSAM good enough for phase 1?
  • If APIML authenticates without z/OSMF, does that change the prerequisites on Sysplex? Is sharing the same SAF database good enough?
  • How the Zowe Launcher works in containerized Zowe.

I'm still figuring out how to organize the lines; please forgive my PowerPoint skills.

@jackjia-ibm

This is the latest (5th) version of Zowe-HA-Draft.docx.

The changes are:

  • Confirmed that a VSAM data set is the first supported persistence mechanism for the Caching API. We will figure out the support plan for other methods before we implement them.
  • Minor changes to note that authentication/authorization is not introduced in the first stage of the Caching API.
  • Noted that whether the Zowe Launcher is required for containerization needs further discussion.

The architecture view is unchanged and remains the same as the 3rd version of Zowe-HA-Architecture-View.pptx.

@jackjia-ibm

For the purpose of preparing the draft, this GitHub issue has been fulfilled. We have implementation issues to track progress.
