disallow retryInterval when fail over is enabled #10521

Merged
@@ -41,7 +41,7 @@ jndiName=JNDI name
jndiName.desc=JNDI name.

missedTaskThreshold=Missed task threshold for fail over
missedTaskThreshold.desc=Amount of time beyond the expected start of a task execution to reserve for running it. Other members are prevented from running the task before the expiration of this interval. If the interval elapses without successful execution of the task, then the task execution is considered missed, enabling another member to attempt to run it. This enables fail over.
missedTaskThreshold.desc=The amount of time after the expected start of a task run to reserve for running the task. Other members are prevented from running the task before the expiration of this interval. If the interval elapses without the task running successfully, or the task rolls back, then the task run is considered missed, enabling another member to attempt to run it. Missed task threshold values within the supported range of 100 seconds to 9000 seconds (2.5 hours) enable failover.

pollInterval=Poll interval
pollInterval.desc=Interval at which the executor looks for tasks in the persistent store to run. If unspecified and fail over is enabled, a poll interval is automatically computed. If fail over is not enabled, the default is -1, which disables all polling after the initial poll.
@@ -50,7 +50,7 @@ pollSize=Poll size
pollSize.desc=The maximum number of task entries to find when polling the persistent store for tasks to run. If unspecified, there is no limit.

retryInterval=Retry interval
retryInterval.desc=The amount of time that must pass between consecutive retries of a failed task. The retry interval applies only to the server on which the failure occurred. When failover is enabled, servers that did not see the failure retry at their next poll. When failover is not enabled, the first retry occurs immediately on the same server, and at the retry interval thereafter. In the absence of a configured value, a default is used. If failover is enabled, the default retry interval is computed from the poll interval and the missed task threshold. If failover is not enabled, the default is 1 minute.
retryInterval.desc=The amount of time that must pass between consecutive retries of a failed task. The retry interval applies only when failover is disabled. When failover is enabled, servers retry at their next poll. When failover is not enabled, the first retry occurs immediately on the same server, and at the retry interval thereafter. The default retry interval is 1 minute.

retryLimit=Retry limit
retryLimit.desc=Limit of consecutive retries for a task that has failed or rolled back, after which the task is considered permanently failed and does not attempt further retries. A value of -1 allows for unlimited retries.
@@ -67,9 +67,9 @@ CWWKC1520.out.of.range=CWWKC1520E: Configured value {0} for {1} is not within th
CWWKC1520.out.of.range.explanation=A value that is outside of the allowed range is configured for the specified property.
CWWKC1520.out.of.range.useraction=Configure a value within the allowed range.

CWWKC1521.less.than.min=CWWKC1521E: Configured value {0} for {1} cannot be less than the configured value for {2}, which is {3}.
CWWKC1521.less.than.min.explanation=Configuring a larger retryInterval encourages other servers to retry the failed task instead of the server on which it failed.
CWWKC1521.less.than.min.useraction=Configure the retryInterval property so that it is larger than the missedTaskThreshold.
CWWKC1521.not.compatible=CWWKC1521E: The {0} configuration attribute is not valid when the {1} configuration attribute is enabled.
CWWKC1521.not.compatible.explanation=The specified configuration attributes are not compatible.
CWWKC1521.not.compatible.useraction=Update the configuration to remove one of the configuration attributes.

CWWKC1540.thread.cannot.submit.tasks=CWWKC1540E: You cannot schedule persistent tasks from the current thread context.
CWWKC1540.thread.cannot.submit.tasks.explanation=Schedule persistent tasks only from a thread that is associated with an application or feature with a serializable class loader identity.
@@ -110,10 +110,10 @@ else if (missedTaskThreshold < TimeUnit.MINUTES.toSeconds(30))
}
pollInterval = pollIntrvl;

// Default the retry interval to match the poll interval (or lacking that, the missed task threshold) when fail over is enabled.
// Default the retry interval to disabled when fail over is enabled.
if (retryIntrvl == null) {
if (missedTaskThreshold > 0) {
retryInterval = enableTaskExecution && pollInterval > 0 ? pollInterval : TimeUnit.SECONDS.toMillis(missedTaskThreshold);
retryInterval = -1; // disabled
} else {
retryInterval = TimeUnit.MINUTES.toMillis(1); // the old default for single-server, which cannot be changed
}
@@ -130,12 +130,13 @@ else if (missedTaskThreshold < TimeUnit.MINUTES.toSeconds(30))
if (pollInterval < -1 || missedTaskThreshold > 0 && (!ignoreMin && pollInterval < 100000 && pollInterval != -1 || pollInterval > 9000000)) // disallow below 100 seconds and above 2.5 hours
throw new IllegalArgumentException(Tr.formatMessage(tc, "CWWKC1520.out.of.range",
toString(pollInterval, TimeUnit.MILLISECONDS), "pollInterval", "100s", "2h30m"));
if (retryInterval < 0)
if (retryInterval < 0 && missedTaskThreshold == -1)
throw new IllegalArgumentException("retryInterval: " + retryInterval + "ms");
else if (missedTaskThreshold > 0 && retryIntrvl != null && retryLimit != 0 && !ignoreMin && retryIntrvl < missedTaskThreshold * 1000)
throw new IllegalArgumentException(Tr.formatMessage(tc, "CWWKC1521.less.than.min",
toString(retryInterval, TimeUnit.MILLISECONDS), "retryInterval",
"missedTaskThreshold", toString(missedTaskThreshold, TimeUnit.SECONDS)));
else if (retryInterval >= 0 && missedTaskThreshold > 0) {
// Allow the configuration of the built-in EJB persistent timers executor, but otherwise reject enablement of retryInterval when fail over is enabled.
if (!(retryInterval == TimeUnit.SECONDS.toMillis(300) && "defaultEJBPersistentTimerExecutor".equals(id)))
throw new IllegalArgumentException(Tr.formatMessage(tc, "CWWKC1521.not.compatible", "retryInterval", "missedTaskThreshold"));
}
}

@Override
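
The validation change in this hunk can be read as a small decision rule. The following is a minimal stand-alone sketch of that rule, not the Liberty `Config` class itself: the class name `RetryIntervalRuleSketch`, the method `resolveRetryInterval`, and the plain-text exception messages are hypothetical, and the real code formats the translated CWWKC1521E message instead.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone sketch of the retryInterval rule; not the Liberty Config class. */
final class RetryIntervalRuleSketch {

    /**
     * @param id                     executor id from configuration
     * @param missedTaskThresholdSec -1 when fail over is disabled, positive when enabled
     * @param configuredRetryMs      configured retryInterval in milliseconds, or null if unspecified
     * @return the effective retry interval in milliseconds, or -1 when retries are left to polling
     */
    static long resolveRetryInterval(String id, long missedTaskThresholdSec, Long configuredRetryMs) {
        long retryInterval;
        if (configuredRetryMs == null)
            // Default: disabled when fail over is enabled, otherwise the single-server default of 1 minute.
            retryInterval = missedTaskThresholdSec > 0 ? -1 : TimeUnit.MINUTES.toMillis(1);
        else
            retryInterval = configuredRetryMs;

        // A negative retry interval is only rejected when fail over is disabled.
        if (retryInterval < 0 && missedTaskThresholdSec == -1)
            throw new IllegalArgumentException("retryInterval: " + retryInterval + "ms");

        // With fail over enabled, only the built-in EJB persistent timers executor may carry a retryInterval.
        if (retryInterval >= 0 && missedTaskThresholdSec > 0) {
            boolean builtInEjbTimers = retryInterval == TimeUnit.SECONDS.toMillis(300)
                    && "defaultEJBPersistentTimerExecutor".equals(id);
            if (!builtInEjbTimers)
                throw new IllegalArgumentException("retryInterval is not valid when missedTaskThreshold is enabled");
        }
        return retryInterval;
    }

    public static void main(String[] args) {
        System.out.println(resolveRetryInterval("myExecutor", -1, null));  // fail over disabled, default applies
        System.out.println(resolveRetryInterval("myExecutor", 145, null)); // fail over enabled, retry interval disabled
    }
}
```

Under these assumptions, a configuration such as the test executor with `missedTaskThreshold="145s"` and `retryInterval="3m14s"` is rejected, while leaving `retryInterval` unset is accepted in both modes.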
@@ -125,9 +125,8 @@ public int hashCode() {
* @param consecutiveFailureCount number of consecutive task failures
* @param config snapshot of persistent executor configuration
* @param taskName identity name for the task
* @param expectedStart the expected start time of the task execution
*/
private void processRetryableTaskFailure(Throwable failure, ClassLoader loader, short consecutiveFailureCount, Config config, String taskName, long expectedStart) {
private void processRetryableTaskFailure(Throwable failure, ClassLoader loader, short consecutiveFailureCount, Config config, String taskName) {
taskName = taskName == null || taskName.length() == 0 || taskName.length() == 1 && taskName.charAt(0) == ' ' ? String.valueOf(taskId) // empty task name
: taskId + " (" + taskName + ")";
TaskStore taskStore = persistentExecutor.taskStore;
@@ -177,17 +176,17 @@ private void processRetryableTaskFailure(Throwable failure, ClassLoader loader,
updates.setConsecutiveFailureCount(consecutiveFailureCount);
updates.setResult(persistentExecutor.serialize(taskFailure));
updates.setState((short) (TaskState.ENDED.bit | TaskState.FAILURE_LIMIT_REACHED.bit));
if (config.missedTaskThreshold > 0)
updates.setClaimExpiryOrPartition(-1); // immediately allow another server to claim the task
TaskRecord expected = new TaskRecord(false);
expected.setId(taskId);
taskStore.persist(updates, expected);
} else {
// -1 indicates the task is no longer in the persistent store
retry = consecutiveFailureCount != -1;

if (retry) {
String seconds = consecutiveFailureCount == 1 && config.missedTaskThreshold == -1 || config.retryInterval == 0L //
? "0" //
: NumberFormat.getInstance().format(config.retryInterval / 1000.0);
if (retry && config.missedTaskThreshold == -1) {
String seconds = consecutiveFailureCount == 1 || config.retryInterval == 0L ? "0" : NumberFormat.getInstance().format(config.retryInterval / 1000.0);
if (failure == null)
Tr.warning(tc, "CWWKC1500.task.rollback.retry", persistentExecutor.name, taskName, seconds);
else
@@ -213,23 +212,17 @@ private void processRetryableTaskFailure(Throwable failure, ClassLoader loader,
retry = true;
}

if (retry == true) {
if (retry && config.missedTaskThreshold == -1) {
// Retry the first failure immediately when fail over is disabled
if (consecutiveFailureCount == 1 && config.missedTaskThreshold < 0 || config.retryInterval == 0L)
persistentExecutor.scheduledExecutor.submit(this);
else {
long delay = config.retryInterval;
if (config.missedTaskThreshold > 0) {
// Avoid rescheduling before the current claim runs out
long elapsed = System.currentTimeMillis() - expectedStart;
long remainingClaimed = config.missedTaskThreshold - elapsed;
if (remainingClaimed > delay)
delay = remainingClaimed + 1000;
}
persistentExecutor.scheduledExecutor.schedule(this, delay, TimeUnit.MILLISECONDS);
}
} else {
persistentExecutor.inMemoryTaskIds.remove(taskId);
}

}
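
The retry scheduling after this change appears to reduce to the decision sketched below. This is an illustrative reading of the hunk, not the actual task implementation: `RetryDelaySketch` and `retryDelayMillis` are hypothetical names, and an empty result stands for the branch where the task id is removed from the in-memory set and the retry is left to the next poll.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the in-memory retry decision; not the Liberty task implementation. */
final class RetryDelaySketch {

    /**
     * @param consecutiveFailureCount number of consecutive failures, including the current one
     * @param missedTaskThresholdSec  -1 when fail over is disabled
     * @param retryIntervalMs         effective retry interval in milliseconds
     * @return delay before the in-memory retry, or empty when the retry is left to polling
     */
    static OptionalLong retryDelayMillis(short consecutiveFailureCount, long missedTaskThresholdSec, long retryIntervalMs) {
        // With fail over enabled, nothing is rescheduled in memory; a later poll picks the task up.
        if (missedTaskThresholdSec != -1)
            return OptionalLong.empty();
        // The first failure, or a zero retry interval, is retried immediately.
        if (consecutiveFailureCount == 1 || retryIntervalMs == 0L)
            return OptionalLong.of(0L);
        // Later failures wait for the configured retry interval.
        return OptionalLong.of(retryIntervalMs);
    }

    public static void main(String[] args) {
        System.out.println(retryDelayMillis((short) 1, -1, 60000L));
        System.out.println(retryDelayMillis((short) 2, -1, 60000L));
        System.out.println(retryDelayMillis((short) 1, 145, -1L));
    }
}
```

Returning an `OptionalLong` is a choice made for this sketch only, so that "no in-memory retry" stays distinct from "retry with zero delay".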

/**
@@ -630,7 +623,6 @@ public void run() {

runningTaskState.remove();

long expectedStart = expectedExecTime;
try {
tranMgr.setTransactionTimeout(0); // clear the value so we don't impact subsequent transactions on this thread

@@ -642,7 +634,7 @@ public void run() {
tranMgr.rollback();
if (config == null)
config = persistentExecutor.configRef.get();
processRetryableTaskFailure(failure, loader, nextFailureCount, config, taskName, expectedStart);
processRetryableTaskFailure(failure, loader, nextFailureCount, config, taskName);
} else {
if (taskIdForPropTable != null)
try {
@@ -687,7 +679,7 @@ public void run() {
failure = x;

// Retry the task if an error occurred
processRetryableTaskFailure(failure, loader, nextFailureCount, config, taskName, expectedStart);
processRetryableTaskFailure(failure, loader, nextFailureCount, config, taskName);
}
}

@@ -73,8 +73,8 @@ public static void setUp() throws Exception {
originalConfig = server.getServerConfiguration();
ServerConfiguration config = originalConfig.clone();
PersistentExecutor myScheduler = config.getPersistentExecutors().getBy("jndiName", "concurrent/myScheduler");
myScheduler.setPollInterval("2h30m"); // the test case does not expect polling, so set a large value that will never be reached
myScheduler.setRetryInterval("6s");
myScheduler.setInitialPollDelay("2s");
myScheduler.setPollInterval("2s500ms"); // a couple of tests require polling in order to perform retries
myScheduler.setMissedTaskThreshold("6s");
myScheduler.setExtraAttribute("ignore.minimum.for.test.use.only", "true");
server.updateServerConfiguration(config);
@@ -97,6 +97,8 @@ public static void tearDown() throws Exception {
if (server.isStarted())
server.stopServer("CWWKC1500W", //Task rolled back
"CWWKC1501W", //Task rolled back due to failure ...
"CWWKC1502W", //Task rolled back, retry time unspecified
"CWWKC1503W", //Task rolled back due to failure ..., retry time unspecified
"CWWKC1510W", //Task rolled back and aborted
"CWWKC1511W", //Task rolled back and aborted. Failure is ...
"DSRA0174W"); //Generic Datasource Helper
@@ -104,6 +104,7 @@ public static void setUp() throws Exception {
persistentExecutor.setExtraAttribute("ignore.minimum.for.test.use.only", "true");
persistentExecutor.setMissedTaskThreshold("5s");
persistentExecutor.setPollInterval("3s");
persistentExecutor.setInitialPollDelay("1s");

PersistentExecutor belowMinMissedTaskThresholdExecutor = new PersistentExecutor();
belowMinMissedTaskThresholdExecutor.setId("belowMinMissedTaskThresholdExecutor");
@@ -141,15 +142,15 @@ public static void setUp() throws Exception {
exceedsMaxPollIntervalExecutor.setInitialPollDelay("-1");
config.getPersistentExecutors().add(exceedsMaxPollIntervalExecutor);

PersistentExecutor retryIntervalBelowMissedTaskThresholdExecutor = new PersistentExecutor();
retryIntervalBelowMissedTaskThresholdExecutor.setId("retryIntervalBelowMissedTaskThresholdExecutor");
retryIntervalBelowMissedTaskThresholdExecutor.setJndiName("concurrent/retryIntervalBelowMissedTaskThreshold");
retryIntervalBelowMissedTaskThresholdExecutor.setTaskStoreRef("DBTaskStore");
retryIntervalBelowMissedTaskThresholdExecutor.setMissedTaskThreshold("1m45s");
retryIntervalBelowMissedTaskThresholdExecutor.setPollInterval("28m");
retryIntervalBelowMissedTaskThresholdExecutor.setRetryInterval("14s");
retryIntervalBelowMissedTaskThresholdExecutor.setInitialPollDelay("-1");
config.getPersistentExecutors().add(retryIntervalBelowMissedTaskThresholdExecutor);
PersistentExecutor retryIntervalAndMissedTaskThresholdBothEnabledExecutor = new PersistentExecutor();
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setId("retryIntervalAndMissedTaskThresholdBothEnabled");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setJndiName("concurrent/retryIntervalAndMissedTaskThresholdBothEnabled");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setTaskStoreRef("DBTaskStore");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setMissedTaskThreshold("145s");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setPollInterval("28m");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setRetryInterval("3m14s");
retryIntervalAndMissedTaskThresholdBothEnabledExecutor.setInitialPollDelay("-1");
config.getPersistentExecutors().add(retryIntervalAndMissedTaskThresholdBothEnabledExecutor);

config.getDataSources().getById("SchedDB").getConnectionManagers().get(0).setMaxPoolSize("10");
server.updateServerConfiguration(config);
@@ -431,16 +432,16 @@ public void testRetryFailedTaskNoAutoPurgeFEWithPolling() throws Exception {
}

/**
* testRetryIntervalBelowMissedTaskThreshold - attempt to use a persistent executor where the retryInterval value is less than
* the missedTaskThreshold. Expect IllegalArgumentException with a translatable message.
* testRetryIntervalAndMissedTaskThresholdBothEnabled - attempt to use a persistent executor where the retryInterval and
* the missedTaskThreshold are both configured. Expect IllegalArgumentException with a translatable message.
*/
@Test
public void testRetryIntervalBelowMissedTaskThreshold() throws Exception {
public void testRetryIntervalAndMissedTaskThresholdBothEnabled() throws Exception {
server.setMarkToEndOfLog();

runInServlet("testRetryIntervalBelowMissedTaskThreshold");
runInServlet("testRetryIntervalAndMissedTaskThresholdBothEnabled");

List<String> errorMessages = server.findStringsInLogsUsingMark("CWWKE0701E.*14s", server.getConsoleLogFile());
List<String> errorMessages = server.findStringsInLogsUsingMark("CWWKE0701E.*CWWKC1521E", server.getConsoleLogFile());
if (errorMessages.isEmpty())
throw new Exception("Error message not found in log.");

@@ -449,8 +450,7 @@ public void testRetryIntervalBelowMissedTaskThreshold() throws Exception {
if (!errorMessage.contains("IllegalArgumentException")
|| !errorMessage.contains("CWWKC1521E")
|| !errorMessage.contains("retryInterval")
|| !errorMessage.contains("missedTaskThreshold")
|| !errorMessage.contains("105s"))
|| !errorMessage.contains("missedTaskThreshold"))
throw new Exception("Problem with substitution parameters in message " + errorMessage);
}

@@ -99,7 +99,7 @@ public static void setUp() throws Exception {
ServerConfiguration config = originalConfig.clone();
PersistentExecutor persistentExecutor = config.getPersistentExecutors().getBy("jndiName", "concurrent/myScheduler");
persistentExecutor.setExtraAttribute("ignore.minimum.for.test.use.only", "true");
persistentExecutor.setInitialPollDelay("-1");
persistentExecutor.setInitialPollDelay("2s");
persistentExecutor.setMissedTaskThreshold("4s");
config.getDataSources().getById("SchedDB").getConnectionManagers().get(0).setMaxPoolSize("10");
server.updateServerConfiguration(config);
@@ -864,13 +864,13 @@ public void testRetryFailedTaskNoAutoPurge(HttpServletRequest request, PrintWrit
}

/**
* testRetryIntervalBelowMissedTaskThreshold - attempt to use a persistent executor where the retryInterval value is less than
* the missedTaskThreshold. The detailed error message that is logged is tested by the caller of this method.
* testRetryIntervalAndMissedTaskThresholdBothEnabled - attempt to use a persistent executor where the retryInterval and
* the missedTaskThreshold are both enabled. The detailed error message that is logged is tested by the caller of this method.
*/
public void testRetryIntervalBelowMissedTaskThreshold(HttpServletRequest request, PrintWriter out) throws Exception {
public void testRetryIntervalAndMissedTaskThresholdBothEnabled(HttpServletRequest request, PrintWriter out) throws Exception {
try {
PersistentExecutor misconfiguredExecutor = InitialContext.doLookup("concurrent/retryIntervalBelowMissedTaskThreshold");
throw new Exception("Should not be able to obtain misconfigured persistentExecutor where the retryInterval value is less than the missedTaskThreshold. " + misconfiguredExecutor);
PersistentExecutor misconfiguredExecutor = InitialContext.doLookup("concurrent/retryIntervalAndMissedTaskThresholdBothEnabled");
throw new Exception("Should not be able to obtain misconfigured persistentExecutor where the retryInterval and missedTaskThreshold are both enabled. " + misconfiguredExecutor);
} catch (NamingException x) {
// expected
}