Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase #12443
Conversation
Overall LGTM. Minor nits.
// I don't think we can use the estimate in any way if it is negative; seven sounds like a nice prime number.
// It is ok if we end up not filling them all, the ingestion code handles that.
// Seven, on the other hand, will at least create some shards rather than potentially a single huge one.
estimatedNumShards = 7L;
Instead of setting 7 inline here, we should move it to a static final variable up top, something like DEFAULT_NUM_SHARDS.
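A minimal sketch of that suggestion; the class and constant names below are illustrative, not the actual code in this PR:

class HashPartitionSketch
{
  // Hypothetical constant per the review suggestion; a named value makes the intent of "7" explicit.
  private static final long DEFAULT_NUM_SHARDS = 7L;

  static long estimateNumShards(double estimatedCardinality, int maxRowsPerSegment)
  {
    if (estimatedCardinality < 0) {
      // A negative estimate cannot be used, so fall back to the named default.
      return DEFAULT_NUM_SHARDS;
    }
    // Clamp to at least one shard; the clamp is part of this sketch, not necessarily the PR.
    return Math.max(1L, Math.round(estimatedCardinality / maxRowsPerSegment));
  }
}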
Just a question: why did we go with 7 and not, say, 3? Any rationale behind it?
Could also make this configurable with a default of 7.
My thinking is that three is too small, but it is all arbitrary. This is not really a default; it only applies in the remote, never actually observed, case that the estimate is negative. So I would like to leave it as is.
But if we do get into this situation, should the fallback here be configurable, so that if 7 turns out to be a bad default, the job can be rerun with a different value and potentially produce better results?
If we fall into this case again we should collect evidence and fix it for good.
In order to enforce collect-and-fix we could also throw an ISE here so the failing context is preserved. How does that sound instead of the guesstimate of seven shards? Rather than guesstimating, just throw an ISE and halt. That may be too harsh, so I think the warning is better, but stop there and not try to be more clever. Let's think of this as a sort of fishing expedition for data, to see if this was the original problem. There is no evidence, and those of us who have tried have not been able to reproduce the scenario.
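A helper-style sketch of the fail-fast alternative floated here, for comparison only; the helper name is made up and the PR itself keeps the warning-plus-fallback behavior:

static void failOnNegativeEstimate(double estimatedCardinality)
{
  if (estimatedCardinality < 0) {
    // Throwing preserves the failing context for investigation instead of guessing a shard count.
    throw new IllegalStateException(
        "Negative cardinality estimate [" + estimatedCardinality + "]; cannot determine numShards");
  }
}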
Hmm, yeah, failing the ingestion completely may be extreme. I think the best option is to allow the user to set a fallback numShards value for when it can't be computed / estimated properly. Maybe a negative value for the config could indicate that the ingestion should fail?
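A minimal sketch of that combination of configurable fallback and fail-fast; the fallbackNumShards parameter is hypothetical and no such config exists in this PR:

// Sketch only: fallbackNumShards would come from a hypothetical tuning config.
// A positive value is used when the estimate is unusable; a negative value means fail the ingestion.
static long resolveNumShards(double estimatedCardinality, int maxRowsPerSegment, long fallbackNumShards)
{
  if (estimatedCardinality >= 0) {
    return Math.max(1L, Math.round(estimatedCardinality / maxRowsPerSegment));
  }
  if (fallbackNumShards < 0) {
    throw new IllegalStateException("Cardinality estimate was negative and no fallback numShards is configured");
  }
  return fallbackNumShards;
}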
I realize that complicates things, so it's your choice whether to add it or not. It would be good to allow some control here, but it's not a blocker.
Put the magic number behind a label.
// This is for potential debugging in case we suspect bad estimation of cardinalities etc.
LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
Apart from putting it in the log, do we need a metric around it too?
How often would this log message be hit? We rarely turn on debug logging until after an issue is seen. If the logging here won't be too much, could we move it to info level?
A similar log message was there before and it was explicitly removed by a previous change.
a metric here could be good, as @somu-imply suggested
I have instrumented the Tasks to make it easier to emit metrics... we can delay this PR until the other PR with the metrics instrumentation merges, and then I can easily add the metric.
// determine numShards based on maxRowsPerSegment and the cardinality
estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
}
LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",
This seems like a useful log; if it's not logged too often, maybe we can make it info level?
Mmm, the goal of this was to do something about the bad estimate, not better logging. Let me think...
How many times depends on how many shards were created per interval. This code will union all the independent shards created by the parallel sub-tasks in order to create the final shards. I am being cautious again because such logging was previously considered excessive.
I made it info based on code review...I agree it is useful to have.
If the logging was deemed too excessive before and explicitly removed, then maybe that's a good reason to keep it at debug. I'm not sure of the details though. It's still not clear to me how often this log would be written.
It is not easy to know how frequently it would be logged without coming up with some data models and retrieving some data to better understand the real distribution of hash buckets in a given time chunk for a given set of dimensions. This is one of those things where experience and/or experimentation teaches you best, IMO. So turning it on again and watching it seems like the right thing to do this time.
LGTM
LGTM
We have seen rare instances in the wild where, during hash partitioning, the determine-cardinality phase produces a single shard with a large segment without regard to the value of maxRowsPerSegment. Attempts to reproduce this have not been successful, so adding some defensive programming and logging in case it happens again would be helpful. This PR deals with the improbable case where the estimate from a Union HLLSketch used in the code comes back negative. It also adds some logging to report the values used to come up with the final determination.

This PR has: