Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply the fast filter optimization to composite aggregation #11505

Merged
merged 21 commits into from
Jan 17, 2024

Conversation

bowenlan-amzn
Copy link
Member

@bowenlan-amzn bowenlan-amzn commented Dec 7, 2023

Description

This PR adds the fast filter optimization to composite aggregation (comp-agg).
The optimization only applies when date histogram is the only source of comp-agg and no parent or sub aggregations.
The optimization only applies when the query is a range query on the same date field or is a MatchAllDocsQuery
The optimization acts on segment level, the rewritted range filters take in leaf context and give the counts of each range.

Refactor

  • Move the tryFastFilterAggregation from collect to getLeafCollector. Because if this try succeed, we don't need to return a collector to collect later. The results (doc counts of buckets) are saved at Aggregator or Shard level and will be processed by buildAggregations method later.

Glossary

  • slot: the bucket ordinal in comp-agg queue which refers to the aggregation result (key and doc count)

Related Issues

Resolves #11301

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Dec 7, 2023

Compatibility status:

Checks if related components are compatible with change ce81e3d

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git]

Copy link
Contributor

github-actions bot commented Dec 7, 2023

❌ Gradle check result for 36f98e7: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn bowenlan-amzn changed the title Comp agg optim Apply the fast filter optimization to composite aggregation Dec 7, 2023
Copy link
Contributor

github-actions bot commented Dec 7, 2023

❌ Gradle check result for 06a007e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn
Copy link
Member Author

PR Analysis

  • 🎯 Main theme: Optimization of composite aggregation in OpenSearch
  • 📝 PR summary: This PR introduces an optimization to the composite aggregation in OpenSearch. It applies a fast filter optimization to the composite aggregation, which can significantly improve the performance of certain queries. The PR also includes changes to several related classes and adds new tests to verify the functionality.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: Yes
  • ⏱️ Estimated effort to review [1-5]: 4, because the PR includes a significant amount of new logic and changes to existing classes. It requires a good understanding of the OpenSearch aggregation framework to review effectively.
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is generally well-structured and follows good coding practices. However, it would be beneficial to include more detailed comments explaining the logic behind the new optimization, especially in complex methods. This will make the code easier to understand for other developers. Also, consider adding more context to the PR description to help reviewers understand the changes better.

  • 🤖 Code feedback:
    • relevant file: server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
      suggestion: Consider refactoring the CompositeAggregator constructor to reduce its complexity and improve readability. The constructor is currently quite long and does a lot of different things, which can make it hard to understand. You could potentially extract some of the logic into separate methods. [important]
      relevant line: "+ CompositeAggregator("

    • relevant file: server/src/main/java/org/opensearch/search/aggregations/bucket/FilterRewriteHelper.java
      suggestion: It would be beneficial to add more detailed comments explaining the purpose and functionality of the FilterRewriteHelper class and its methods. This will make the code easier to understand for other developers. [medium]
      relevant line: "+public class FilterRewriteHelper {"

    • relevant file: server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
      suggestion: Consider adding null checks or assertions for the 'filters', 'bucketOrds', 'fieldType', and 'preparedRounding' fields. This can help prevent potential NullPointerExceptions. [important]
      relevant line: "+ private Weight[] filters = null;"

    • relevant file: server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
      suggestion: Consider refactoring the 'buildAggregations' method to reduce its complexity and improve readability. The method is currently quite long and does a lot of different things, which can make it hard to understand. You could potentially extract some of the logic into separate methods. [important]
      relevant line: "+ public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws IOException {"

Copy link
Contributor

github-actions bot commented Dec 7, 2023

❌ Gradle check result for d8f337b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 7, 2023

❌ Gradle check result for e7d8af9: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 589d03c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 3a621c4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <[email protected]>
Copy link
Contributor

❌ Gradle check result for c6874c1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for f2a220f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn
Copy link
Member Author

❌ Gradle check result for f2a220f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Confirm all the failures are already listed in flaky test issues
#10558
#9191
#10758
#8945
#11091

Signed-off-by: bowenlan-amzn <[email protected]>
Copy link
Contributor

❌ Gradle check result for ce81e3d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn
Copy link
Member Author

❌ Gradle check result for ce81e3d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Confirmed it's flaky test #9605

Copy link
Contributor

❕ Gradle check result for ce81e3d: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@msfroh msfroh merged commit d5a6f99 into opensearch-project:main Jan 17, 2024
30 checks passed
bowenlan-amzn added a commit to bowenlan-amzn/OpenSearch that referenced this pull request Jan 17, 2024
…ch-project#11505)

We can do efficient DateHistogram aggregation under composite 
aggregations.

---------

Signed-off-by: bowenlan-amzn <[email protected]>
@msfroh msfroh added the backport 2.x Backport to 2.x branch label Jan 18, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 18, 2024
We can do efficient DateHistogram aggregation under composite
aggregations.

---------

Signed-off-by: bowenlan-amzn <[email protected]>
(cherry picked from commit d5a6f99)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@bowenlan-amzn bowenlan-amzn deleted the comp-agg-optim branch January 18, 2024 06:29
msfroh pushed a commit that referenced this pull request Feb 7, 2024
…11914)

We can do efficient DateHistogram aggregation under composite
aggregations.

---------


(cherry picked from commit d5a6f99)

Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 7, 2024
…11914)

We can do efficient DateHistogram aggregation under composite
aggregations.

---------

(cherry picked from commit d5a6f99)

Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 6ed7565)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 10, 2024
…11914)

We can do efficient DateHistogram aggregation under composite
aggregations.

---------

(cherry picked from commit d5a6f99)

Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 6ed7565)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
…ch-project#11505)

We can do efficient DateHistogram aggregation under composite 
aggregations.

---------

Signed-off-by: bowenlan-amzn <[email protected]>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…ch-project#11505)

We can do efficient DateHistogram aggregation under composite 
aggregations.

---------

Signed-off-by: bowenlan-amzn <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ch-project#11505)

We can do efficient DateHistogram aggregation under composite
aggregations.

---------

Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch Search:Aggregations Search:Performance v2.12.0 Issues and PRs related to version 2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Date Histogram] Apply the optimization to Composite aggregation with source as DateHistogram
3 participants