-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improve] Flink cluster status monitoring improvement #2826
Conversation
cc @xujiangfeng001 @RocMarshal PTAL, thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for @wolfboys @xujiangfeng001 driving this pr.
I left a few of comments, PTAL in your free time.
Thank you~
return ClusterState.RUNNING; | ||
default: | ||
return ClusterState.STOPPED; | ||
if (status == FinalApplicationStatus.UNDEFINED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the yarn application was accepted by yarn resourcemanager, the FinalApplicationStatus is UNDEFINED
too, but the flink cluster isn't RUNNING
This case was mentioned in #2541 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the yarn application was accepted by yarn resourcemanager, the FinalApplicationStatus is
UNDEFINED
too, but the flink cluster isn'tRUNNING
This case was mentioned in #2541 (comment)
I also have questions about this logic and further discussion is needed. @xujiangfeng001
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my negligence. The corresponding logic was not modified during the implementation of this piece of content. The logic here does not need to determine the finalStatus
of the application, only the application state
needs to be determined as running
to consider the flink cluster
as running
.
public enum YarnApplicationState {
/** Application which was just created. */
NEW,
/** Application which is being saved. */
NEW_SAVING,
/** Application which has been submitted. */
SUBMITTED,
/** Application has been accepted by the scheduler */
ACCEPTED,
/** Application which is currently running. */
RUNNING,
/** Application which finished successfully. */
FINISHED,
/** Application which failed. */
FAILED,
/** Application which was terminated by a user or admin. */
KILLED
}
<select id="countJobsByClusterId" resultType="java.lang.Integer" parameterType="java.lang.Long"> | ||
select | ||
count(1) | ||
from t_flink_app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to filter the status of the application here? #2809 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to filter the status of the application here? #2809 (comment)
Thanks for your feedback. I agree with you. I think We should filter the status of the application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to filter the status of the application here? #2809 (comment)
Thanks for your feedback. I agree with you. I think We should filter the status of the application
It is difficult to directly query the data table for this issue. I think we need a cache to record the running jobs
of each flink cluster
.
Hi @wolfboys @RocMarshal ,I left a few of comments. Please take a look at it when you have time. |
streampark-console/streampark-console-service/src/main/assembly/script/schema/mysql-schema.sql
Outdated
Show resolved
Hide resolved
<select id="countJobsByClusterId" resultType="java.lang.Integer" parameterType="java.lang.Long"> | ||
select | ||
count(1) | ||
from t_flink_app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to filter the status of the application here? #2809 (comment)
Thanks for your feedback. I agree with you. I think We should filter the status of the application
It is difficult to directly query the data table for this issue. I think we need a cache to record the running jobs
of each flink cluster
.
return ClusterState.RUNNING; | ||
default: | ||
return ClusterState.STOPPED; | ||
if (status == FinalApplicationStatus.UNDEFINED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my negligence. The corresponding logic was not modified during the implementation of this piece of content. The logic here does not need to determine the finalStatus
of the application, only the application state
needs to be determined as running
to consider the flink cluster
as running
.
public enum YarnApplicationState {
/** Application which was just created. */
NEW,
/** Application which is being saved. */
NEW_SAVING,
/** Application which has been submitted. */
SUBMITTED,
/** Application has been accepted by the scheduler */
ACCEPTED,
/** Application which is currently running. */
RUNNING,
/** Application which finished successfully. */
FINISHED,
/** Application which failed. */
FAILED,
/** Application which was terminated by a user or admin. */
KILLED
}
cc @RocMarshal @xujiangfeng001 PTAL |
ClusterState state = getClusterStateFromFlinkAPI(flinkCluster); | ||
if (ClusterState.isRunningState(state)) { | ||
public ClusterState getClusterState(FlinkCluster flinkCluster) { | ||
ClusterState state = FAILED_STATES.getIfPresent(flinkCluster.getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a problem with the logic in this method: here, in Yarn Session mode, it won't go jobmanagerUrl
request status, only yarn restful api
requests will be processed. This is not in line with our original intention of designing the jobmanagerUrl
field.
Hi @wolfboys ,this generally looks good, I left a comment. If I make a mistake, please correct me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wolfboys & @xujiangfeng001 .
LGTM +1.
[Improve] Flink cluster status monitoring improvement