[Feature] flink cluster new status tracking #2425

xujiangfeng001 · 2023-03-10T01:54:09Z

Search before asking

I searched the issues and found no similar feature request.

Description

After discussion with the developers, we agreed on the following feature points for the new Flink cluster status tracking:

Heartbeat detection:
a. Add heartbeat detection for clusters. Each mode (yarn-session, remote, k8s-session) has its own way of obtaining the URL, polling the REST endpoint, and reading the status.
Status update:
a. Add a status field to the cluster that records its status (running, shutdown, lost).
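
The heartbeat and status items above could be sketched roughly as follows. This is only an illustration, not the planned implementation: the field names (`address`, `yarn_proxy_url`, `k8s_rest_url`, `stopped_by_user`), the `probe` callable, and the rule that separates shutdown from lost are all assumptions, since the issue does not specify them.

```python
from enum import Enum
from typing import Callable, Dict


class ClusterState(Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"
    LOST = "lost"


# Hypothetical per-mode URL lookups: the issue only says each mode obtains
# its URL differently, so these field names are placeholders.
URL_RESOLVERS: Dict[str, Callable[[dict], str]] = {
    "remote": lambda c: c["address"],
    "yarn-session": lambda c: c["yarn_proxy_url"],
    "k8s-session": lambda c: c["k8s_rest_url"],
}


def heartbeat(cluster: dict, probe: Callable[[str], bool]) -> ClusterState:
    """Poll the cluster once and return the status to store on it.

    Assumption: a cluster stopped on purpose is SHUTDOWN, and a cluster
    that was not stopped but does not answer the probe is LOST.
    """
    if cluster.get("stopped_by_user", False):
        return ClusterState.SHUTDOWN
    url = URL_RESOLVERS[cluster["mode"]](cluster)
    return ClusterState.RUNNING if probe(url) else ClusterState.LOST
```

Passing `probe` in as a parameter keeps the REST call out of the sketch, so each mode can plug in its own polling logic.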
Failure alarm & failover:
a. If a cluster is detected as shutdown or lost, an alarm is sent.
b. If jobs are running on that cluster, each of them would raise its own alarm, producing a batch of alarms. In this case the job-level alarms must be suppressed.
c. The cluster itself does not fail over.
d. Jobs running on the cluster would trigger their failover mechanism. In this case the job failover must be interrupted and the job status set to lost.
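
A minimal sketch of the alarm and failover handling described above, assuming jobs are plain dictionaries with hypothetical `state`, `alarm`, and `failover` fields and that `send_alarm` is whatever alarm hook the platform provides:

```python
from typing import Callable, List


def on_cluster_down(cluster_state: str, jobs: List[dict],
                    send_alarm: Callable[[str], None]) -> None:
    """React to a cluster that the heartbeat found shutdown or lost."""
    if cluster_state not in ("shutdown", "lost"):
        return
    # (a) One alarm for the cluster itself.
    send_alarm(f"cluster is {cluster_state}")
    for job in jobs:
        if job["state"] == "running":
            # (b) Suppress the per-job alarm to avoid a batch of alarms.
            job["alarm"] = False
            # (d) Interrupt job failover and mark the job as lost.
            job["failover"] = False
            job["state"] = "lost"
```

The cluster gets no failover step here, in line with point c.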
Changes to operation logic:
a. Cluster deletion: allowed only if the cluster is not bound to any job and the cluster is stopped.
b. Cluster stop: allowed only if no jobs are bound to the cluster, or none of the bound jobs is running.
c. Job start: if the job is in remote, k8s-session, or yarn-session mode, the cluster bound to it must be running before the job can start.
d. Job addition: in remote, k8s-session, or yarn-session mode, the cluster drop-down must filter out clusters that are not started and list only running clusters.
e. Job modification: if the original mode is remote, k8s-session, or yarn-session and the bound cluster is not running, the job cannot be saved. The user can only select another mode or switch to a running cluster.
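
The operation rules above can be read as a handful of guard checks. The sketch below is one possible reading, not the actual implementation: the dictionary fields and function names are hypothetical, and "stopped" is assumed to mean any status other than running.

```python
from typing import List, Optional

SESSION_MODES = {"remote", "k8s-session", "yarn-session"}


def can_delete_cluster(cluster: dict, bound_jobs: List[dict]) -> bool:
    # (a) No bound jobs, and the cluster is stopped (assumed: not running).
    return not bound_jobs and cluster["state"] != "running"


def can_stop_cluster(bound_jobs: List[dict]) -> bool:
    # (b) No bound jobs, or none of the bound jobs is running.
    return all(job["state"] != "running" for job in bound_jobs)


def cluster_usable(mode: str, cluster: Optional[dict]) -> bool:
    # (c) and (e) Session-style modes need a running cluster to start or
    # save a job; other modes are not restricted by cluster status.
    if mode not in SESSION_MODES:
        return True
    return cluster is not None and cluster["state"] == "running"


def selectable_clusters(clusters: List[dict]) -> List[dict]:
    # (d) Only running clusters appear in the drop-down.
    return [c for c in clusters if c["state"] == "running"]
```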

The work is split into the following tasks:

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

Yes, I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

RocMarshal · 2023-03-10T05:44:24Z

Good job~ 👍

RocMarshal mentioned this issue Mar 10, 2023

[Bug] Flink cluster status cannot be updated #1906

Closed

3 tasks

wolfboys assigned xujiangfeng001 Mar 18, 2023

wolfboys added the feature/accepted This feature request is accepted label Mar 18, 2023

wolfboys mentioned this issue Mar 18, 2023

[OnlineMeeting&Mar.18]StreamPark community meeting Topic collect #2461

Open

3 tasks

RocMarshal mentioned this issue Apr 22, 2023

[ISSUE-2498][Feature] [SubTask] The cluster supports remote and yarn session heartbeat monitoring #2675

Merged

wolfboys mentioned this issue Oct 24, 2023

Tracking issues of StreamPark 2.2.0 #3278

Closed

10 tasks
