Update Aggregator and AggregatorFactory interfaces to improve mem estimates #12073
Conversation
This pull request introduces 2 alerts when merging 780e50da8202e31f69d9c6ceacdd42ed4f14e57c into fe71fc4 - view on LGTM.com.
```java
    sketchField.setAccessible(true);
  }
  catch (NoSuchFieldException | ClassNotFoundException e) {
    LOG.error(e, "Could not initialize 'sketchField'");
```
This will only happen if someone happens to have loaded a new/different version of sketches than the one this code actually depends on. If that happens, this error puts something into the logs that will be ignored (people don't look at logs until something actually explodes), and the failure is then silently swallowed. Once it is silently swallowed, the estimation becomes incorrect and potentially starts causing OOMs where OOMs didn't exist previously. If this happens, it will be super hard to track down why.
I would recommend that we actually explode loudly throwing an error out of the static initializer (which should effectively kill the process from actually starting in the first place). If we want a way for someone to say "I know what I'm doing, ignore this please", we can add an extra config that the error message in the exception points to as a way to ignore things.
Okay. For now, we can just fail loudly. The additional config can be done as a follow up.
Force-pushed from 2eaeab1 to 348f17e.
This pull request introduces 3 alerts when merging 348f17e into eb0bae4 - view on LGTM.com.
```
@@ -478,6 +479,11 @@ public IncrementalIndexAddResult add(InputRow row) throws IndexSizeExceededException
     return add(row, false);
   }

+  public IncrementalIndexAddResult add(InputRow row, boolean skipMaxRowsInMemoryCheck) throws IndexSizeExceededException
+  {
+    return add(row, false, true);
```
This appears to be ignoring skipMaxRowsInMemoryCheck. Is that intentional? Probably worth a comment as to why, if it is.
Thanks a lot for catching this! I must have missed it.
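For clarity, a minimal sketch of the corrected delegation. The assumption (the diff doesn't show the three-argument overload's signature) is that its third argument is the one toggling the max-rows-in-memory check:

```java
// Hypothetical sketch: forward the caller's flag instead of hard-coding it.
public IncrementalIndexAddResult add(InputRow row, boolean skipMaxRowsInMemoryCheck)
    throws IndexSizeExceededException
{
  // Previously `return add(row, false, true);` silently dropped the flag.
  return add(row, false, skipMaxRowsInMemoryCheck);
}
```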
overall makes sense 👍
@kfaraz have you done any measurement of the performance impact before/after this change so we know what we are getting ourselves into? Not sure which benchmarks would be most appropriate off the top of my head
```
@@ -160,4 +173,16 @@ public int getIdForNull()
       lock.readLock().unlock();
     }
   }

+  private long getObjectSize(@Nonnull T object)
```
hmm, this method is presumptuous and breaks the contract of this class being generic. I think a size estimator function should be passed into this method, and it needs to be public so that callers can override it. Or maybe it should be abstract and some StringDimensionDictionary should be implemented to minimize function calls, since it's going to be a pretty hot method.
Thanks for pointing this out! I will see how we can make this cleaner.
+1.
Used the StringDimensionDictionary suggestion, although I have not made it abstract, so that implementations using the DimensionDictionary can continue to use it as the base concrete class.
```java
public class AggregatorAndSize
{
  // TODO: include default overhead for object sizes
```
nit: unresolved todo
Fixed. Addressed in the caller.
```
@@ -127,7 +127,7 @@
   * @return An array containing an encoded representation of the input row value.
   */
  @Nullable
- EncodedKeyComponentType processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions);
+ EncodedDimensionValue<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions);
```
nit: javadoc params and return type need updating.. but this isn't really new, it basically needed updating long ago, it is not always an array 😅
Fixed.
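As a sketch, the updated javadoc might read along these lines (the wording is an assumption; only the signature comes from the diff above):

```java
/**
 * Processes the given raw row value(s) into an unsorted encoded key component.
 *
 * @param dimValues             raw value(s) from the input row, may be null
 * @param reportParseExceptions whether parse failures should surface as exceptions
 * @return the encoded key component (not necessarily an array, e.g. a Long
 *         for a long-typed dimension) together with the incremental heap
 *         size its encoding added
 */
@Nullable
EncodedDimensionValue<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent(
    @Nullable Object dimValues,
    boolean reportParseExceptions
);
```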
```
@@ -271,6 +271,7 @@ private Sink getSink(long timestamp)
       config.getAppendableIndexSpec(),
       config.getMaxRowsInMemory(),
       config.getMaxBytesInMemoryOrDefault(),
+      true,
```
I've noticed a few places that aren't wired up to config are using true, so they use the new behavior with no way out. Is that intentional? This one in particular probably doesn't matter all that much these days (I hope), but I'm less sure about all of them, and it isn't consistent because I see some hard-coded false in there too.
The value of the flag useMaxMemoryEstimates = true represents the old behaviour. The hard-coding has been done only for the following classes:
- RealtimePlumber and related classes (hopefully not used anymore)
- OnHeapIncrementalIndexBenchmark (accidentally hardcoded to false, fixing this)

It should not be hard-coded anywhere else, except maybe in tests.
```
@@ -557,12 +573,17 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row)
       DimensionIndexer indexer = desc.getIndexer();
       Object dimsKey = null;
       try {
-        dimsKey = indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true);
+        final EncodedDimensionValue<?> encodedDimensionValue
+            = indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true);
```
I guess due to the way this refactor was done (compared to aggs) there is no real way to turn off calculating the estimates, even if we aren't using them. Maybe it doesn't matter if there is no/minimal performance impact.
Yes, I was not too sure about this either. I will take another look and see if we can separate the two flows without too much duplication.
Updated DimensionDictionary and StringDimensionIndexer to turn off estimations if not needed.
Thanks for the review, @clintropolis !
Looks good overall. You should also document the approach that was finalized in the proposal and that this implementation is based on. I think AggregatorFactory#factorizeWithSize is probably a good place for writing that doc and the overall approach.
```
@@ -42,6 +42,12 @@
  {
    void aggregate();

+   default long aggregateWithSize()
```
this method needs javadocs
Thanks for the reminder, @abhishekagarwal87. I am adding javadocs wherever missing.
What is the contract of this value? I don't see the word estimate in here, but think it probably should be... should implementors over-estimate if exact sizing is not possible, or is under-estimating fine? Should there be a warning that the default estimate is used? (I imagine this would be very noisy if it is done per aggregate call... so I don't really recommend doing it here or anything...)
Should there be a warning that the default estimate is used?
It would make sense to give this warning when factorizeWithSize is overridden but aggregateWithSize is not. In such a case, we might be significantly underestimating the memory usage.
As you said, doing it here might be noisy. A viable approach could be to have factorizeWithSize return a wrapper Aggregator which does not allow the regular aggregate (will be addressed in a subsequent PR).
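Per the PR description at the bottom of this page, the default implementation calls aggregate() and returns 0. A trimmed sketch of the interface (other Aggregator methods omitted), with javadoc along the lines the reviewers asked for:

```java
public interface Aggregator
{
  void aggregate();

  /**
   * Aggregates and returns an estimate, in bytes, of the heap memory this
   * call added on top of the initial size reported at factorization time.
   * The default delegates to aggregate() and reports zero growth, which
   * under-estimates any aggregator whose footprint actually grows.
   */
  default long aggregateWithSize()
  {
    aggregate();
    return 0;
  }
}
```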
```java
// TODO: include default overhead for object sizes

private final Aggregator aggregator;
private final long initialSizeBytes;
```
Can you add more info? E.g., is this the total on-heap footprint including JVM object overhead, or is that overhead not considered in this initial size?
Added. It should account for JVM object overhead too.
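A sketch of the resulting class: the fields appear in the diff above, while the constructor, accessors, and javadoc wording are assumptions:

```java
public class AggregatorAndSize
{
  private final Aggregator aggregator;

  /**
   * Initial heap footprint of the aggregator in bytes, including JVM object
   * overhead (per the resolution of the comment above).
   */
  private final long initialSizeBytes;

  public AggregatorAndSize(Aggregator aggregator, long initialSizeBytes)
  {
    this.aggregator = aggregator;
    this.initialSizeBytes = initialSizeBytes;
  }

  public Aggregator getAggregator()
  {
    return aggregator;
  }

  public long getInitialSizeBytes()
  {
    return initialSizeBytes;
  }
}
```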
```java
{
  @Nullable
  private final K value;
  private final long incrementalSizeBytes;
```
Can you add a comment on what this is the incremental size of?
Added.
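A sketch of what the added comment conveys: incrementalSizeBytes is the extra heap the encoding added (e.g. new dictionary entries), not the total size of the value itself. The constructor and comment wording are assumptions:

```java
import javax.annotation.Nullable;

public class EncodedDimensionValue<K>
{
  @Nullable
  private final K value;

  /** Estimated heap bytes newly allocated while encoding {@link #value}. */
  private final long incrementalSizeBytes;

  public EncodedDimensionValue(@Nullable K value, long incrementalSizeBytes)
  {
    this.value = value;
    this.incrementalSizeBytes = incrementalSizeBytes;
  }
}
```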
```
@@ -80,6 +81,14 @@ public Aggregator factorize(ColumnSelectorFactory metricFactory)
     return new SketchAggregator(selector, size);
   }

+  @Override
+  public AggregatorAndSize factorizeWithSize(ColumnSelectorFactory metricFactory)
```
this needs some documentation.
Added.
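A sketch of what such a documented override might look like, reusing the selector and size from factorize() above; getInitialSizeBytes() is an assumed helper on SketchAggregator, not a confirmed API:

```java
@Override
public AggregatorAndSize factorizeWithSize(ColumnSelectorFactory metricFactory)
{
  // Build the aggregator exactly as factorize() does (selector construction
  // elided in the diff above).
  final SketchAggregator aggregator = new SketchAggregator(selector, size);
  // Report the sketch's initial on-heap footprint, including JVM object
  // overhead, via an assumed helper.
  return new AggregatorAndSize(aggregator, aggregator.getInitialSizeBytes());
}
```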
@abhishekagarwal87 , I have added an overview of the approach in the javadoc of AggregatorFactory#factorizeWithSize. Please let me know if this is sufficient.
This pull request introduces 1 alert when merging 9908c0b into b55f7a2 - view on LGTM.com.
```java
 *
 * @return AggregatorAndSize which contains the actual aggregator and its initial size.
 */
public AggregatorAndSize factorizeWithSize(ColumnSelectorFactory metricFactory)
```
Same question about the contract of the returned sizes. Also, I wonder if there is anything we could do to make sure this method is overridden if aggregateWithSize is implemented, so that the initial size is not the max size...
Updated the javadoc to advise on the required estimation.
Also, I wonder if there is anything we could do to make sure this method is overridden if aggregateWithSize is implemented
I guess it is okay even if it isn't overridden because we would only be overestimating which would not cause failures, only somewhat poorer estimates.
The other way around is probably more of an issue, i.e. overriding factorizeWithSize but not aggregateWithSize. In such a case, factorizeWithSize would give a small initial size, which would never increase because aggregateWithSize would always return 0.
In either case, this issue is a problem only if we use the new behaviour, i.e. useMaxMemoryEstimates = false. I guess we could address this in a subsequent PR.
```java
{
  final int[] encodedDimensionValues;
  final int oldDictSize = dimLookup.size();
  final long oldDictSizeInBytes = useMaxMemoryEstimates ? 0 : dimLookup.sizeInBytes();
```
nit: this is only used inside of the else near the end of the method
We need the size of the dictionary before adding the dimension values.
At the end, we take the final size of the dictionary and check the diff.
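A sketch of the before/after pattern being defended, under the assumption that encoding happens between the two sizeInBytes() calls (names mirror the diff; the surrounding shape is illustrative):

```java
// Snapshot the dictionary size before encoding; charge only the delta after.
final long oldDictSizeInBytes = useMaxMemoryEstimates ? 0 : dimLookup.sizeInBytes();

// ... encode the row values, possibly adding new entries to dimLookup ...

long effectiveSizeBytes = 0;
if (!useMaxMemoryEstimates) {
  // Only entries newly added to the dictionary count as incremental memory.
  effectiveSizeBytes += dimLookup.sizeInBytes() - oldDictSizeInBytes;
}
```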
```java
{
  super(useMaxMemoryEstimates ? new DimensionDictionary<>() : new StringDimensionDictionary());
```
nit: this seems strange/confusing. Why wouldn't StringDimensionIndexer always use StringDimensionDictionary here? It seems like it could just pass in a value of false to control the value of computeOnHeapSize, instead of sometimes not using StringDimensionDictionary.
You are right, it is weird.
Fixed it.
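A sketch of the agreed fix (other constructor parameters omitted; the computeOnHeapSize parameter name comes from the review comment above):

```java
public StringDimensionIndexer(boolean useMaxMemoryEstimates)
{
  // Always use the String specialization; the flag only toggles whether
  // on-heap sizes are computed.
  super(new StringDimensionDictionary(/* computeOnHeapSize = */ !useMaxMemoryEstimates));
}
```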
```java
static {
  try {
    SKETCH_FIELD = Class.forName("org.apache.datasketches.theta.UnionImpl")
                        .getDeclaredField("gadget_");
    SKETCH_FIELD.setAccessible(true);
  }
  catch (NoSuchFieldException | ClassNotFoundException e) {
    throw new ISE(e, "Could not initialize SketchAggregator");
  }
```
this seems worth a comment, and maybe a link to the code?
Exceptions in static initialization blocks can surface as very weird errors: http://javaeesupportpatterns.blogspot.com/2012/07/javalangnoclassdeffounderror-how-to.html.
Maybe we can just move this initialization to the constructor? We need not worry too much about thread safety since it's ok even if SKETCH_FIELD gets constructed twice.
👍
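A sketch of the constructor-based initialization being agreed to here. The benign race is acceptable because initializing the field twice is harmless, and a failure now surfaces as a clear ISE at aggregator construction instead of an opaque NoClassDefFoundError from a failed static initializer. The helper name and the non-final field are assumptions:

```java
private static Field SKETCH_FIELD;  // java.lang.reflect.Field; no longer final

// Called from the SketchAggregator constructor instead of a static block.
static void initializeSketchField()
{
  if (SKETCH_FIELD != null) {
    return;  // already initialized; a duplicate initialization is harmless
  }
  try {
    final Field field = Class.forName("org.apache.datasketches.theta.UnionImpl")
                             .getDeclaredField("gadget_");
    field.setAccessible(true);
    SKETCH_FIELD = field;
  }
  catch (NoSuchFieldException | ClassNotFoundException e) {
    throw new ISE(e, "Could not initialize SketchAggregator");
  }
}
```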
@kfaraz CI failures might be legit. Can you fix those before merging?
Fixes #12022

Description

The current implementation of memory estimation in OnHeapIncrementalIndex:
- uses guessAggregatorHeapFootprint() to calculate the max memory a metric aggregator can use
- the current implementation of StringDimensionIndexer similarly estimates a maximum size for encoded dimension values

Because of the above, the memory usage tends to be over-estimated, which leads to more persistence cycles than necessary.

This PR replaces the max estimation mechanism with getting the actual incremental memory used by the aggregator or indexer at each invocation of aggregate or encode respectively.

Changes
- New flag useMaxMemoryEstimates in the task context. This overrides the same flag in DefaultTaskConfig, i.e. the druid.indexer.task.default.context map. useMaxMemoryEstimates = true (the default value) denotes the current method of estimation.
- New method AggregatorFactory.factorizeWithSize() that returns an AggregatorAndSize, which contains the Aggregator instance and its initial size.
- New method Aggregator.aggregateWithSize(), whose default implementation calls aggregate() and returns 0.
- Remove method DimensionIndexer.estimateEncodedKeyComponentSize().
- Update DimensionIndexer.processRowValsToKeyComponent() to return EncodedKeyComponent<EncodedType>, which contains:
  - EncodedType keyComponent: e.g. int[] for StringDimensionIndexer, Long for LongDimensionIndexer
  - long effectiveSizeBytes: effective size of the key component
- Update OnHeapIncrementalIndex to use the new estimations only if useMaxMemoryEstimates = false (end-to-end flow sketched below).
- Update Aggregator impls.

This PR has: