Worker Low Heap Memory Task Killer #21254
Conversation
How is this feature being tested? Can we also add some monitoring/metrics?
}
}

private void onGCNotification(Notification notification)
Can this be put into the GcStatsMonitor itself, which already exists?
Introduced a new class because the existing one is for monitoring GC activity.
Pushing back a bit: this one does that too, except it acts on the GC signals.
DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
Should we only kill when we haven't been able to reclaim for a while, like for a couple of GC cycles?
Keeping track of the memory state is not accurate with this approach: between the last full GC and the current one we have no visibility into whether the worker was able to free up memory or not.
I am not following. Why can you not keep the bytes after the previous full GC in a class instance variable, along with a timestamp?
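For illustration, a minimal sketch of what this suggestion could look like (the field and method names here are assumptions, not the PR's actual code):

// Hypothetical sketch: remember the previous full-GC result in instance fields
// so that consecutive ineffective full GCs can be detected.
private long lastFullGcCollectedBytes;  // bytes reclaimed by the previous full GC
private long lastFullGcTimestampNanos;  // ticker reading at the previous full GC

private void recordFullGc(long collectedBytes, long timestampNanos)
{
    lastFullGcCollectedBytes = collectedBytes;
    lastFullGcTimestampNanos = timestampNanos;
}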
DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
I am slightly confused: lowMemoryTaskKillThreshold is multiplied with max memory in the isLowMemory method and is also used as an absolute value on the RHS? Is it a ratio? If not, why are GC bytes compared with it?
lowMemoryTaskKillThreshold is meant to provide the memory threshold: if a full GC is able to reclaim at least that much, the task killer won't kick in. The other threshold, lowMemoryTaskKillerMemoryThreshold, is used to mark when we consider the worker to be running low on memory.
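To illustrate the intended split between the two settings, here is a rough sketch of the semantics described above (the class, constructor, and method bodies are assumptions, not the exact PR code):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class LowMemoryThresholds
{
    // Fraction of max heap above which the worker is considered low on memory.
    private final double lowMemoryTaskKillerMemoryThreshold;
    // Minimum bytes a full GC must reclaim for the task killer to stay quiet.
    private final long lowMemoryTaskKillThresholdBytes;

    public LowMemoryThresholds(double memoryThreshold, long reclaimThresholdBytes)
    {
        this.lowMemoryTaskKillerMemoryThreshold = memoryThreshold;
        this.lowMemoryTaskKillThresholdBytes = reclaimThresholdBytes;
    }

    public boolean isLowMemory()
    {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() > heap.getMax() * lowMemoryTaskKillerMemoryThreshold;
    }

    public boolean shouldKillTasks(long garbageCollectedBytes)
    {
        // Kill only when low on memory AND the full GC failed to reclaim enough.
        return isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThresholdBytes;
    }
}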
if (queryId.isPresent()) {
    List<SqlTask> activeTasksToKill = activeQueriesToTasksMap.get(queryId.get());
    for (SqlTask sqlTask : activeTasksToKill) {
        sqlTask.failed(new PrestoException(EXCEEDED_HEAP_MEMORY_LIMIT, "Worker heap memory limit exceeded"));
Could we include some more stats in this error message?
DataSize afterGcDataSize = info.getAfterGcTotal();
long garbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();

if (isLowMemory() && garbageCollectedBytes < lowMemoryTaskKillThreshold.toBytes()) {
Should we have some kind of a "quiet period" after it has killed a query, to observe the effects? How do we prevent this from going rogue?
A full GC won't happen once the query has been killed, unless more memory-hungry queries are running and consuming memory, resulting in the same scenario. In that case we do want it to kick in and kill them to avoid a JVM OOM.
).sorted(comparator.reversed())
        .map(Map.Entry::getKey)
        .collect(toImmutableList());
return Optional.of(queryIdsSortedByMemoryUsage.get(0));
Can you use maxBy or similar instead of sorting?
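For reference, a fragment of what that could look like with Stream.max instead of sorting (the map name here is assumed):

// Pick the query with the largest reservation directly, without sorting everything.
return activeQueriesMemoryUsage.entrySet().stream()
        .max(Comparator.comparingLong(Map.Entry::getValue))
        .map(Map.Entry::getKey);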
new AbstractMap.SimpleEntry<>(entry.getKey(), entry.getValue().stream()
        .map(SqlTask::getTaskInfo)
        .map(TaskInfo::getStats)
        .mapToLong(stats -> stats.getUserMemoryReservationInBytes() + stats.getSystemMemoryReservationInBytes())
Should we add stats.getRevocableMemoryReservationInBytes() too?
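If it were added, the mapToLong call shown above would just gain one more term, e.g. (fragment only):

.mapToLong(stats -> stats.getUserMemoryReservationInBytes()
        + stats.getSystemMemoryReservationInBytes()
        + stats.getRevocableMemoryReservationInBytes())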
Can we add some unit tests?
return lowMemoryTaskKillerEnabled;
}

@Config("experimental.task.low-memory-task-killer-enabled")
Can this be enabled if the other two configs are set appropriately? Trying to avoid a third boolean config for this feature.
ManagementFactory.getPlatformMBeanServer().removeNotificationListener(objectName, gcNotificationListener);
}
catch (JMException ignored) {
    log.error("Error removing notification: " + ignored);
Maybe use log.error(ignored, msg) so the stack trace is logged.
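A sketch of the suggested change, assuming the Airlift Logger overload that takes the Throwable as the first argument so the stack trace is included:

catch (JMException e) {
    // Pass the exception itself so the full stack trace is logged.
    log.error(e, "Error removing GC notification listener");
}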
}
}

private boolean shouldTriggerTaskKiller(GarbageCollectionNotificationInfo info)
Should the check for info.isMajorGc be done inside here? Otherwise this method can only ever be called for full GCs.
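A sketch of pulling the major-GC check into the method itself (the helper name below is an assumption standing in for the existing checks):

private boolean shouldTriggerTaskKiller(GarbageCollectionNotificationInfo info)
{
    if (!info.isMajorGc()) {
        // Minor GCs never trigger the task killer.
        return false;
    }
    return isLowMemoryAndGcIneffective(info);  // existing low-memory/reclaim checks, name assumed
}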
DataSize beforeGcDataSize = info.getBeforeGcTotal();
DataSize afterGcDataSize = info.getAfterGcTotal();

if (taskKillerStrategy == FREE_MEMORY_ON_FREQUENT_FULL_GC) {
Should this be called FREE_MEMORY_... or just TRIGGER_ON_... or even KILL_ON_...? It doesn't free any memory explicitly, other than by failing queries.
lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
    if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
Can you use currentGarbageCollectedBytes here...?
Which strategy are we going to roll out with initially?
return Optional.empty();
}

Comparator<Map.Entry<QueryId, Long>> comparator = Comparator.comparingLong(Map.Entry::getValue);
Why not put the .reversed() here?
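What the reviewer seems to be suggesting, as a fragment (an explicit type witness would likely be needed for the method reference to compile once reversed() is chained):

Comparator<Map.Entry<QueryId, Long>> comparator =
        Comparator.<Map.Entry<QueryId, Long>>comparingLong(Map.Entry::getValue).reversed();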
Due to the static vs. non-static method split, this is not a straightforward move of reversed() to the top, so I am going to address it in a follow-up PR.
lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
    if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
Should the isLowMemory check be done in common for both approaches?
long currentGarbageCollectedBytes = beforeGcDataSize.toBytes() - afterGcDataSize.toBytes();
long currentFullGCTimestamp = ticker.read();

if (isFrequentFullGC(lastFullGCTimestamp, currentFullGCTimestamp) && !hasFullGCFreedEnoughBytes(currentGarbageCollectedBytes)) {
How is the initial case of lastFullGCTimestamp = 0 handled?
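One possible way to handle it, sketched here rather than taken from the PR, is to treat the first observed full GC as a baseline only:

// If there is no previous full GC on record, just record this one and do not kill.
if (lastFullGCTimestamp == 0) {
    lastFullGCTimestamp = currentFullGCTimestamp;
    lastFullGCCollectedBytes = currentGarbageCollectedBytes;
    return false;
}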
lastFullGCCollectedBytes = currentGarbageCollectedBytes;
}
else if (taskKillerStrategy == FREE_MEMORY_ON_FULL_GC) {
    if (isLowMemory() && beforeGcDataSize.toBytes() - afterGcDataSize.toBytes() < reclaimMemoryThreshold) {
This second approach, FREE_MEMORY_ON_FULL_GC, is very similar to the first one (even with the reclaimMemoryThreshold check); it's just time-invariant in that it does not look at the previous full GC.
Not a big deal, but can we commonify some code?
At Meta, we noticed instances where, if we run a worker on a low-memory machine with a memory-intensive workload, there are times when the worker is running low on memory and a full GC is not able to free up enough of it. This results in worker OOMs as the worker keeps progressing with its work and trying to acquire more memory. With this experimental feature, we want the worker to kill tasks when it is running low on memory (configurable) and a full GC is not able to free up enough memory (configurable). In that scenario the worker identifies the task consuming the most memory and kills it. There are two strategies implemented in this change for when the task killer kicks in: 1. during a full GC, if the worker is not able to free enough memory (configured) and heap usage is above the configured threshold; 2. during frequent full GCs (configured), if the worker is not able to reclaim enough memory (configured).
Description
With this experimental feature, we want the worker to kill tasks when it is running low on memory
(configurable) and a full GC is not able to free up enough memory (configurable). In that scenario the
worker identifies the task consuming the most memory and kills it.
There are two strategies implemented in this change for when the task killer kicks in:
1. During a full GC, if the worker is not able to free enough memory (configured) and heap usage is above the configured threshold.
2. During frequent full GCs (configured), if the worker is not able to reclaim enough memory (configured).
Motivation and Context
At Meta, we noticed instances where, if we run a worker on a low-memory machine with a
memory-intensive workload, there are times when the worker is running low on
memory and a full GC is not able to free up enough of it. This results in worker OOMs as the
worker keeps progressing with its work and trying to acquire more memory.
Impact
Prevents worker OOMs and lets queries run successfully instead of all running queries being killed.
Test Plan
Ran a shadow run in the Presto verifier. With the feature disabled, all queries died with REMOTE_TASK_ERROR or REMOTE_HOST_GONE due to worker OOMs. With the feature enabled, a subset of the queries failed with INSUFFICIENT_RESOURCES because worker heap memory was high, and the remaining subset finished successfully.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.