Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker Low Heap Memory Task Killer #21254

Merged
merged 1 commit into from
Nov 2, 2023

Conversation

swapsmagic
Copy link
Contributor

@swapsmagic swapsmagic commented Oct 26, 2023

Description

With this experimental feature, we want worker to kill tasks in case its running in low memory
(configurable) and during full GC not able to free up enough memory (configurable). For such scenario
worker identifies task consuming the most memory and kill it.

There are two strategies implemented in the change when task killer kicked in

  1. During full GC if worker is not able to free enough memory (configured) and heap usage is above configured threshold
  2. During frequent full GCs (configured) if worker not able to free enough memory (configured)

Motivation and Context

n Meta, we noticed instances where if we run worker on a low memory machine and
it runs high memory intensive workload, there are times when worker is running low
memory and full GC not able to free up enough memory. This result in worker OOMs when
worker continue progressing with the work trying to capture more memory.

Impact

Able to prevent worker OOM and able to run queries successfully instead of killing all running queries.

Test Plan

Ran shadow in presto verifier where with the feature disable all queries died due to REMOTE_TASK_ERROR or REMOTE_HOST_GONE due to worker OOM. With the feature enabled, subset of the queries failed with INSUFFICIENT_RESOURCES due to worker heap memory is high. And subset able to finish successfully.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add High Memory Task Killer which troggers if worker is running out of memory and garbage collection not able to reclaim enough memory. The feature is enabled by `experimental.task.high-memory-task-killer-enabled`.  There are two strategies for the task killer which can be set via `experiemental.task.high-memory-task-killer-strategy`. (default: `FREE_MEMORY_ON_FULL_GC`, `FREE_MEMORY_ON_FREQUENT_FULL_GC`).
   * Free memory on full garbage collection strategy triggers task killer if worker is running low memory and full GC is not able to reclaim enough memory. Low memory threhold is set by  `experimental.task.high-memory-task-killer-heap-memory-threshold`  and Full GC reclaim memory threhold is set by `experimental.task.high-memory-task-killer-reclaim-memory-threshold`
   * Free memory on frequent full garbage collection strategy triggers task killer if worker is having frequent garbage collection and enough memory is not reclaimed during the garbage collections. Frequent garbage collection is identified via `experimental.task.high-memory-task-killer-frequent-full-gc-duration-threshold` and reclaim memory threshold is configured via `experimental.task.high-memory-task-killer-reclaim-memory-threshold`

@swapsmagic swapsmagic requested a review from a team as a code owner October 26, 2023 23:12
@swapsmagic swapsmagic requested a review from presto-oss October 26, 2023 23:12
@agrawaldevesh
Copy link
Contributor

How is this feature being tested ? Can we also add some monitoring / metrics ?

}
}

private void onGCNotification(Notification notification)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be put into the GcStatsMonitor itself that already exists ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce a new class due the existing one is for monitoring the GC activily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing back a bit: this one does that too except it acts on the gc signals

DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only kill when we haven't been able to reclaim for a while. Like for a couple of times ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping track of the memory state is not accurate with this functionality as we won't have visibility into between last full GC and current one if the worker able to free up memory or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following. Why can you not keep the previous bytes after full gc in a class instance variable along with timestamp ?

DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly confused: lowMemoryTaskKillThreshhold is used to multiply with max memory in isLowMemory method and is also used as an absolute value in the RHS ? Is it a ratio ? If not, why is GC bytes compared with it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowMemoryTaskKillThreshhold is meant to provide the memory threshold that if full GC able to reclaim then the task killer won't kick in. The other threshold is lowMemoryTaskKillerMemoryThreshold is used to mark when do we want to consider worker is running low memory.

if (queryId.isPresent()) {
List<SqlTask> activeTasksToKill = activeQueriesToTasksMap.get(queryId.get());
for (SqlTask sqlTask : activeTasksToKill) {
sqlTask.failed(new PrestoException(EXCEEDED_HEAP_MEMORY_LIMIT, "Worker heap memory limit exceeded"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more stats in this error msg ?

DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some kind of a "quiet period" after it has killed a query to observe the effects ? Like how do we prevent this from going rogue ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full GC won't happen once the query has been killed unless there are more memory hungry queries running and consuming memory resulting in the same scenario. And in that case we want it to kick in and kill them to avoid jvm oom.

).sorted(comparator.reversed())
.map(Map.Entry::getKey)
.collect(toImmutableList());
return Optional.of(queryIdsSortedByMemoryUsage.get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use maxBy or such instead of sorting ?

new AbstractMap.SimpleEntry<>(entry.getKey(), entry.getValue().stream()
.map(SqlTask::getTaskInfo)
.map(TaskInfo::getStats)
.mapToLong(stats -> stats.getUserMemoryReservationInBytes() + stats.getSystemMemoryReservationInBytes())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add stats.getRevocableMemoryReservationInBytes() too?

@swapsmagic swapsmagic force-pushed the low_memory_query_killer branch 5 times, most recently from 94e8205 to 38fb6d3 Compare November 2, 2023 19:06
Copy link
Contributor

@NikhilCollooru NikhilCollooru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some unit tests ?

@swapsmagic swapsmagic force-pushed the low_memory_query_killer branch from 38fb6d3 to ed9516f Compare November 2, 2023 19:16
return lowMemoryTaskKillerEnabled;
}

@Config("experimental.task.low-memory-task-killer-enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be enabled is the other two configs are set appropriately? Trying to avoid a third Boolean config about this features

ManagementFactory.getPlatformMBeanServer().removeNotificationListener(objectName, gcNotificationListener);
}
catch (JMException ignored) {
log.error("Error removing notification: " + ignored);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.error(ignored, msg) to log the stack trace maybe

}
}

private boolean shouldTriggerTaskKiller(GarbageCollectionNotificationInfo info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the check for info.isMajorGc be done inside here ? Otherwise this method is only called for full gcs.

DataSize beforeGcDataSize = info.getBeforeGcTotal();
DataSize afterGcDataSize = info.getAfterGcTotal();

if (taskKillerStrategy == FREE_MEMORY_ON_FREQUENT_FULL_GC) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called FREE_MEMORY_... or just TRIGGER_ON_ or even KILL_ON_... ? It doesn't free any memory explicitly other than failing queries.

lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use currentGarbageCollectedBytes here ... ?

Which strategy are we going to roll out with initially ?

return Optional.empty();
}

Comparator<Map.Entry<QueryId, Long>> comparator = Comparator.comparingLong(Map.Entry::getValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the .reversed here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to having static vs non static method, this is not straight moving the reversed at the top so going to address in followup PR.

lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the check for isLowMemory be done common to both the approaches ?

long currentGarbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();
long currentFullGCTimestamp = ticker.read();

if (isFrequentFullGC(lastFullGCTimestamp, currentFullGCTimestamp) && !hasFullGCFreedEnoughBytes(currentGarbageCollectedBytes)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is the initial case of lastFullGCTimestamp = 0 handled ?

lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second approach of FREE_MEM_ON_FULL_GC is very similar to the first one (even with the reclaimMemoryThreshhold check -- its just time invariant in that it does not look at the prev full gc.

Not a big deal, but can we commonify some code ?

@swapsmagic swapsmagic force-pushed the low_memory_query_killer branch from ed9516f to b07023b Compare November 2, 2023 22:11
In Meta, we noticed instances where if we run worker on a low memory machine and
it runs high memory intensive workload, there are times when worker is running low
memory and full GC not able to free up enough memory. This result in worker OOMs when
worker continue progressing with the work trying to capture more memory.

With this experimental feature, we want worker to kill tasks in case its running in low memory
(configurable) and during full GC not able to free up enough memory (configurable). For such scenario
worker identifies task consuming the most memory and kill it.

There are two strategies implemented in the change when task killer kicked in
1. During full GC if worker is not able to free enough memory (configured) and heap usage is above configured threshold
2. During frequent full GCs (configured) if worker not able to reclaim enough memory (configured)
@swapsmagic swapsmagic force-pushed the low_memory_query_killer branch from b07023b to a89e614 Compare November 2, 2023 22:18
@swapsmagic swapsmagic merged commit bd3cf42 into prestodb:master Nov 2, 2023
@swapsmagic swapsmagic deleted the low_memory_query_killer branch November 2, 2023 23:25
@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants