-
Notifications
You must be signed in to change notification settings - Fork 798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry mechanics to pallet-scheduler
#3060
Add retry mechanics to pallet-scheduler
#3060
Conversation
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
bot bench-all pallet -v PIPELINE_SCRIPTS_REF=mak-bench-all-pallet --pallet=pallet_scheduler |
@mordamax https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575 was started for your command Comment |
@mordamax Command |
bot help |
Here's a link to docs |
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
@@ -481,8 +499,15 @@ pub mod pallet { | |||
/// will be removed from the schedule. | |||
/// | |||
/// Tasks which need to be scheduled for a retry are still subject to weight metering and | |||
/// agenda space, same as a regular task. Periodic tasks will have their periodic schedule | |||
/// put on hold while the task is retrying. | |||
/// agenda space, same as a regular task. If a periodic task fails, it will be scheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be worth mentioning that we talking about possible recoverable fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view on this is that we shouldn't mention in the pallet docs what type of calls this applies to because the pallet itself cannot make the distinction between a recoverable or unrecoverable fail. Only a user can do that and they should update their retry config accordingly.
@@ -1124,11 +1258,17 @@ impl<T: Config> Pallet<T> { | |||
}, | |||
Err(()) => Err((Overweight, Some(task))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this and the Err
arm above you abandon the retry entry if there is any.
I would move the retry handling one level above, for this you just need to add for Ok
varian an info if it failed or not.
or at least take it before execute_dispatch
.
please write tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this and the Err arm above you abandon the retry entry if there is any
The Err
arms above deal with completely different problems:
Err(Unavailable)
is a task which may never be able to run and it may never even be attempted againErr(Overweight)
is a task which couldn't run because the scheduler ran out of weight, but it will be run the next time the scheduler services agendas; the scheduler runs agendas from previous blocks if they didn't execute completely, in order of block number from earliest up until present
This means that an overweight task is not abandoned and will be run again soon, but also that there is absolutely nothing that we can do about it. There is no fallback for the current block if we run out of weight. This is the scheduler's behavior before this PR and I believe the retry mechanism has nothing to do with this.
I would move the retry handling one level above, for this you just need to add for Ok varian an info if it failed or not.
Most of the logic is common for failed or successful tasks, there would be a lot of copied code. But if you just want to refactor this, I could move the common logic out of the match
.
or at least take it before
execute_dispatch
Tasks are scheduled for retries if they actually run but fail in their execution. By definition, there is no way to handle this before running execute_dispatch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the task is removed if Unavailable
, but now I see that we keep it in the storage. This is why I proposed to remove the retry as well. And also I see now that Overweight
does not change it's address.
substrate/frame/scheduler/src/lib.rs
Outdated
/// derived from the original task's configuration, but will have a lower value for | ||
/// `remaining` than the original `total_retries`. Tasks scheduled as a result of a retry of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why you think it should be this way?
the way I think about it, is I want some call to be executed 5 times, and when any of it fail, I wanna retry it 3 times. I wanna be sure that every call will be given chance to be retried 3 times. If first exhausts all retries for some unexpected reason, I do not wanna leave the rest calls without retries. It is hard for me to model such situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you described it is the way it works.
A periodic task that fails at any point during its periodic execution will have a clone of itself scheduled as a retry. That clone will have a retry configuration attached with remaining
being total_retries - 1
at first. The original task still has its own unchanged retry configuration. The retry task will keep being retries until it runs out of retries. The periodic task keeps running periodically, potentially spawning other retry clones if it fails again.
For non-periodic tasks, there is no clone, they become the retry task because there no point in keeping clones around for something which will never run again.
In short, the point of the doc comment is to say that a retry configuration with remaining == total_retries
means it's the original task. A retry configuration with remaining < total_retries
means it's a retry clone.
I will try to make it clearer in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I do not know your motivation for a RP from you fork, but it makes it harder for me to pool you branch and review from my local machine. |
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
@@ -1124,11 +1258,17 @@ impl<T: Config> Pallet<T> { | |||
}, | |||
Err(()) => Err((Overweight, Some(task))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the task is removed if Unavailable
, but now I see that we keep it in the storage. This is why I proposed to remove the retry as well. And also I see now that Overweight
does not change it's address.
if weight | ||
.try_consume(T::WeightInfo::schedule_retry(T::MaxScheduledPerBlock::get())) | ||
.is_err() | ||
{ | ||
Self::deposit_event(Event::RetryFailed { | ||
task: (when, agenda_index), | ||
id: task.maybe_id, | ||
}); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, that it would be better to have this in service_task
context. Now the schedule_retry
is failable/no-op, which is not clear from service_task
context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside is that you can't know in service_task
ahead of time if the task will fail and that it has a retry config. You could check that you can consume the weight anyway, but you might be preventing users from running their tasks when it wasn't necessary.
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just wondering if you considered extending Scheduled
struct with RetryConfig
fields instead of creating a new Retries
storage item. In that way it would natively support retries when schedule()
without the need of doing it in two steps (schedule()
+ set_retry()
)
Signed-off-by: georgepisaltu <[email protected]>
Signed-off-by: georgepisaltu <[email protected]>
I considered it but I decided against it. This is a niche mechanic which we don't expect to be widely used right now. Doing it in |
9346019
Fixes paritytech#3014 This PR adds retry mechanics to `pallet-scheduler`, as described in the issue above. Users can now set a retry configuration for a task so that, in case its scheduled run fails, it will be retried after a number of blocks, for a specified number of times or until it succeeds. If a retried task runs successfully before running out of retries, its remaining retry counter will be reset to the initial value. If a retried task runs out of retries, it will be removed from the schedule. Tasks which need to be scheduled for a retry are still subject to weight metering and agenda space, same as a regular task. Periodic tasks will have their periodic schedule put on hold while the task is retrying. --------- Signed-off-by: georgepisaltu <[email protected]> Co-authored-by: command-bot <>
Fixes #3014
This PR adds retry mechanics to
pallet-scheduler
, as described in the issue above.Users can now set a retry configuration for a task so that, in case its scheduled run fails, it will be retried after a number of blocks, for a specified number of times or until it succeeds.
If a retried task runs successfully before running out of retries, its remaining retry counter will be reset to the initial value. If a retried task runs out of retries, it will be removed from the schedule.
Tasks which need to be scheduled for a retry are still subject to weight metering and agenda space, same as a regular task. Periodic tasks will have their periodic schedule put on hold while the task is retrying.