-
Notifications
You must be signed in to change notification settings - Fork 549
[Rest Server] Expose task role completion policy #879
Conversation
Obsolete `killOnAnyComplete` and use task role completion policy.
0ee9807
to
0e98e97
Compare
job-tutorial/README.md
Outdated
| `taskRole.portType.portNumber` | Integer, required | Number of ports for the specific type | | ||
| `taskRole.command` | String, required | Executable command for tasks in the task role, can not be empty | | ||
| `taskRole.minFailedTaskCount` | Integer, optional | Number of failed tasks to kill the entire job, no less than 1 | | ||
| `taskRole.minSucceededTaskCount` | Integer, optional | Number of succeeded tasks to kill the entire job, no less than 1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to link to launcher user manual for details, such as
https://github.com/Microsoft/pai/blob/master/frameworklauncher/doc/USERMANUAL.md#ApplicationCompletionPolicy
And at the bottom link to the entire doc:
https://github.com/Microsoft/pai/blob/master/frameworklauncher/doc/USERMANUAL.md #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated #Closed
rest-server/src/config/job.js
Outdated
minSucceededTaskCount: Joi.number() | ||
.integer() | ||
.min(1) | ||
.optional(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To distinguish field NOT EXIST and field is set to NULL (To allow user can set minFailedTaskCount = null).
Can we make:
minFailedTaskCount.default(1)
minSucceededTaskCount.default(null)
And change this:
'applicationCompletionPolicy': {
'minFailedTaskCount': data.taskRoles[i].minFailedTaskCount,
'minSucceededTaskCount': data.taskRoles[i].minSucceededTaskCount,
}, #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default value means if user doesn't set the field then the field will be set to default value, which is the same as current code.
We can make minFailedTaskCount.default(1)
but can't make minSucceededTaskCount.default(null)
because minSucceededTaskCount
is limited to integer in schema. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user set the minFailedTaskCount = null, from below line:
'minFailedTaskCount': data.taskRoles[i].minFailedTaskCount || 1,
What will you send to launcher? null or 1?
I expect you also send null to launcher.
In reply to: 202507760 [](ancestors = 202507760)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the minFailedTaskCount
is limited to integer, user can not set it to null
. I suggest we can use 0
to replace null
in config file.
And what's difference between minFailedTaskCount=null
and minFailedTaskCount=taskNumber
, are they the same? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, 2 taskrole A and B, and B is always running,
If you set A.minFailedTaskCount=null, then if all task in A are failed, the whole job will still run.
If you set A.minFailedTaskCount=A.taskNumber, then if all task in A are failed, the whole job will fail.
In reply to: 203646504 [](ancestors = 203646504)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use 0 to support null? Because minFailedTaskCount is greater or equal than 1 at present. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use 0 is the last resort, since it is different from Launcher User Manaual, user will see misleading wiki. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And 0 means any failed task (because it is larger than 0) will trigger the whole app fail. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use -1 to substitute null if js really cannot accept null as int value.
In reply to: 203690975 [](ancestors = 203690975)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, allow null
in minFailedTaskCount
and minSucceededTaskCount
#Closed
Seems there are other places in PAI using "killAllOnCompletedTaskNumber", could you please search and clean up them? #Closed |
I have removed all |
In reply to: 405002761 [](ancestors = 405002761) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, #839 should be update after this pr merged.
@yqwang-ms Confirmed that it will work without #839. #Closed |
@Gerhut Thanks for confirm #Closed |
minSucceededTaskCount: Joi.number() | ||
.integer() | ||
.min(1) | ||
.allow(null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, js allow null to be int value? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about how to validate one field on both int and null. #Closed
.integer() | ||
.min(1) | ||
.allow(null) | ||
.default(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test below things:
If user set the minFailedTaskCount = null, from below line:
'minFailedTaskCount': data.taskRoles[i].minFailedTaskCount,
What will you send to launcher?
I expect you also send null to launcher.
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. #Closed
.integer() | ||
.min(1) | ||
.allow(null) | ||
.default(null), | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have default, do we still need optional? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If user do not fill the field, the default value will be used. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you support backward compatibility? |
Obsolete
killOnAnyComplete
and use task role completion policy.