Serialize top-level pipeline aggs as part of InternalAggregations #40177

javanna · 2019-03-18T20:20:33Z

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive at the coordinating node as part of QuerySearchResult
objects from the shards and, although we may incrementally reduce
aggs (hence we may have some non-final reductions and the final
one later), all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level InternalAggregations object.

Closes #40059
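
To illustrate the description above, here is a minimal sketch of a non-final reduce that carries top-level pipeline aggs along and a final reduce that applies them. It is not the Elasticsearch implementation: `PipelineAgg`, `PartialAggs` and `ReduceDemo` are hypothetical stand-ins, and aggregation values are simplified to named doubles that are summed on reduce.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a sibling pipeline aggregator: a name plus a
// function computed over the fully reduced top-level values.
final class PipelineAgg {
    final String name;
    final ToDoubleFunction<Map<String, Double>> fn;

    PipelineAgg(String name, ToDoubleFunction<Map<String, Double>> fn) {
        this.name = name;
        this.fn = fn;
    }
}

// Hypothetical stand-in for InternalAggregations: partially reduced values
// plus the top-level pipeline aggs that still have to be applied.
final class PartialAggs {
    final Map<String, Double> values;
    final List<PipelineAgg> pendingPipelineAggs;

    PartialAggs(Map<String, Double> values, List<PipelineAgg> pendingPipelineAggs) {
        this.values = values;
        this.pendingPipelineAggs = pendingPipelineAggs;
    }

    static PartialAggs reduce(List<PartialAggs> partials, boolean isFinal) {
        Map<String, Double> merged = new LinkedHashMap<>();
        List<PipelineAgg> pending = Collections.emptyList();
        for (PartialAggs partial : partials) {
            partial.values.forEach((key, value) -> merged.merge(key, value, Double::sum));
            if (pending.isEmpty()) {
                // Assumption: every partial result carries the same pipeline aggs.
                pending = partial.pendingPipelineAggs;
            }
        }
        if (!isFinal) {
            // Non-final reduce: keep the pipeline aggs so they travel with the result.
            return new PartialAggs(merged, pending);
        }
        // Final reduce: apply each pipeline agg to the reduced values exactly once.
        Map<String, Double> result = new LinkedHashMap<>(merged);
        for (PipelineAgg agg : pending) {
            result.put(agg.name, agg.fn.applyAsDouble(merged));
        }
        return new PartialAggs(result, Collections.emptyList());
    }
}

final class ReduceDemo {
    public static void main(String[] args) {
        PipelineAgg max = new PipelineAgg("max_value", v -> Collections.max(v.values()));
        List<PipelineAgg> pipelines = Collections.singletonList(max);
        // Each remote cluster performs its own non-final reduction.
        PartialAggs cluster1 = PartialAggs.reduce(Arrays.asList(
            new PartialAggs(Collections.singletonMap("a", 1.0), pipelines),
            new PartialAggs(Collections.singletonMap("b", 6.0), pipelines)), false);
        PartialAggs cluster2 = PartialAggs.reduce(Collections.singletonList(
            new PartialAggs(Collections.singletonMap("a", 10.0), pipelines)), false);
        // The CCS coordinating node performs the final reduction.
        PartialAggs result = PartialAggs.reduce(Arrays.asList(cluster1, cluster2), true);
        System.out.println(result.values);
    }
}
```

If the non-final reduce dropped `pendingPipelineAggs`, the final reduce in this sketch would have nothing to apply, which mirrors the "lost pipeline aggs" problem the PR addresses.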

elasticmachine · 2019-03-18T20:20:35Z

Pinging @elastic/es-search

jimczi

I left some minor comments but the change looks good to me. I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

jimczi · 2019-03-18T21:05:49Z

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java

+     * Constructs a new aggregation providing its {@link InternalAggregation}s and {@link SiblingPipelineAggregator}s
+     */
+    public InternalAggregations(List<InternalAggregation> aggregations, List<SiblingPipelineAggregator> topLevelPipelineAggregators) {
+        super(aggregations);


The other ctr could call this one with: this(aggregations, Collections.emptyList()) ensuring that the topLevelPipelineAggregators list is never null ?

I added to my TODO list to convert this class to Writeable.
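
The constructor delegation suggested above could look roughly like the following sketch. `AggregationsHolder` is a hypothetical, simplified stand-in for `InternalAggregations`, with plain strings in place of `InternalAggregation` and `SiblingPipelineAggregator`; the explicit null check is an assumption added for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for InternalAggregations.
final class AggregationsHolder {
    private final List<String> aggregations;
    private final List<String> topLevelPipelineAggregators;

    // The single-argument constructor delegates with an empty list,
    // so the pipeline aggregators field is never null.
    AggregationsHolder(List<String> aggregations) {
        this(aggregations, Collections.emptyList());
    }

    AggregationsHolder(List<String> aggregations, List<String> topLevelPipelineAggregators) {
        this.aggregations = aggregations;
        // Assumption for this sketch: reject null explicitly rather than tolerate it.
        this.topLevelPipelineAggregators =
            Objects.requireNonNull(topLevelPipelineAggregators, "topLevelPipelineAggregators");
    }

    List<String> getAggregations() {
        return aggregations;
    }

    List<String> getTopLevelPipelineAggregators() {
        return topLevelPipelineAggregators;
    }
}
```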

jimczi · 2019-03-18T21:09:02Z

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java

    }

    @Override
    @SuppressWarnings("unchecked")
    public void writeTo(StreamOutput out) throws IOException {
        out.writeNamedWriteableList((List<InternalAggregation>)aggregations);
+        //TODO update version after backport
+        if (out.getVersion().onOrAfter(Version.V_8_0_0)) {
+            if (topLevelPipelineAggregators == null) {


can we rely on an empty list if the aggregations are completely reduced ? This way we don't need the boolean and can call the write list directly.

yes, I initially did not do it but I just realized that I can. I was worried about cases with CCS where we receive from e.g. 6.6 and then write the same object to e.g. 7.x. I just need to set the list to an empty one in that case too which removes the need for the list to be nullable 100%.
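
A minimal sketch of the approach agreed on here, assuming a made-up wire format and integer wire versions: the list is written only to peers that understand it, and a reader talking to an older peer falls back to an empty list instead of null. `VersionedAggsWire`, `PIPELINE_AGGS_VERSION` and the string-based payload are hypothetical and do not reflect the real `StreamOutput`/`StreamInput` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class VersionedAggsWire {
    // Hypothetical wire version that introduced the pipeline aggs list.
    static final int PIPELINE_AGGS_VERSION = 8;

    static final class Decoded {
        final List<String> aggs;
        final List<String> pipelineAggs;

        Decoded(List<String> aggs, List<String> pipelineAggs) {
            this.aggs = aggs;
            this.pipelineAggs = pipelineAggs;
        }
    }

    static byte[] write(int wireVersion, List<String> aggs, List<String> pipelineAggs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeList(out, aggs);
            if (wireVersion >= PIPELINE_AGGS_VERSION) {
                // No "is present" boolean: an empty list is written as a zero count.
                writeList(out, pipelineAggs);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Decoded read(int wireVersion, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            List<String> aggs = readList(in);
            // Older peers never send the list, so default to empty rather than null.
            List<String> pipelineAggs = wireVersion >= PIPELINE_AGGS_VERSION
                ? readList(in)
                : Collections.<String>emptyList();
            return new Decoded(aggs, pipelineAggs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeList(DataOutputStream out, List<String> items) throws IOException {
        out.writeInt(items.size());
        for (String item : items) {
            out.writeUTF(item);
        }
    }

    private static List<String> readList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> items = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            items.add(in.readUTF());
        }
        return items;
    }
}
```

Because the value read from an older peer is an empty list, the same object can be written again to a newer peer without a null check, which is the CCS case mentioned in the reply above.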

jimczi · 2019-03-18T21:17:36Z

server/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationsTests.java

+    }
+
+    //TODO update version and rename after backport
+    public void testSerializationFromPre_8_0_0() throws IOException {


I understand the intent of this test and I know that we have similar tests elsewhere but I think it should be moved to a rest test or omitted if we are confident that the existing rest tests are enough to test the bwc serialization. This test checks an internal class that we are allowed to change in a minor release (even a patch release) so I don't think we should use a static representation of the serialization that we'll need to change every time we make a modification to the serialization.

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run. After the backport, this static version of the object is the binary representation of how we serialized the object prior to 6.7.0 (6.7.1 depending on what release the PR makes), which I am pretty sure we will not change. I can add a comment.

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run.

They are overkill if we write them only to test serialization, but since we already have some ccs rest tests it shouldn't be too costly to add one that checks the support for pipeline aggregations. I also agree that we will probably not change the serialization of this class in 6.7.x, but my point was more about the general idea of adding serialized bytes from a previous version in a unit test.

I plan to do integration tests for this scenario as part of #40038; I wanted to add coverage there for the field collapsing bug as well. I personally prefer the new java tests over the yaml ones. But our current CCS integration tests don't run against multiple versions, while this unit test makes sure that we can read something that was actually written by e.g. 6.6, as opposed to simulating that by calling readFrom on master and setting the version to 6.6. Do you see what I mean? Or am I missing something?

I understand the intent but I forgot that we don't run the bwc tests in every module, let's leave it like this for now and we can discuss further in #40038
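
For readers unfamiliar with the kind of test under discussion, the sketch below shows the general idea of decoding a hard-coded byte fixture, as opposed to round-tripping through the current writer. The byte layout, `OldFormatFixture` and the `hasPipelineAggs` flag (standing in for a version check) are all hypothetical; the real test reads bytes produced by an actual older release.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OldFormatFixture {
    // Hypothetical bytes as an older node might have written them:
    // an int count of 1 followed by the UTF-encoded name "sum".
    // There is no pipeline aggs list in this older layout.
    static final byte[] PRE_CHANGE_BYTES = {0, 0, 0, 1, 0, 3, 's', 'u', 'm'};

    static final class Decoded {
        final List<String> aggs;
        final List<String> pipelineAggs;

        Decoded(List<String> aggs, List<String> pipelineAggs) {
            this.aggs = aggs;
            this.pipelineAggs = pipelineAggs;
        }
    }

    // hasPipelineAggs stands in for "the sender's version is on or after the change".
    static Decoded read(byte[] data, boolean hasPipelineAggs) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            List<String> aggs = readList(in);
            List<String> pipelineAggs = hasPipelineAggs
                ? readList(in)
                : Collections.<String>emptyList();
            return new Decoded(aggs, pipelineAggs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> readList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> items = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            items.add(in.readUTF());
        }
        return items;
    }
}
```

The trade-off raised in the review applies to this sketch too: the fixture pins one historical layout, so it only stays valid as long as that layout is never changed retroactively.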

javanna · 2019-03-18T22:14:29Z

I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

yes that is also my goal, we might be able to do this in master and 7.x, indeed bwc is tricky especially for CCS which spans multiple versions. I will work on this as a followup.

Version conditionals are no longer needed once elastic#40177 is back-ported all the way to 6.7.

Relates to elastic#40177

* Remove version conditionals from InternalAggregations

Version conditionals are no longer needed once #40177 is back-ported all the way to 6.7.

* Disable bwc tests

Relates to #40177

* indentation

Relates to elastic#40177 which is now merged and backported to all branches.

As part of elastic#40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.

javanna added >bug, :Search/Search, v7.0.0, v6.7.0, v8.0.0, v7.2.0 labels Mar 18, 2019

javanna requested a review from jimczi March 18, 2019 20:20

javanna added 2 commits March 18, 2019 21:50

line length

876c6ca

license header

f1fe3a8

jimczi approved these changes Mar 18, 2019

address comments

88c4728

address test failures

5cbfedc

javanna merged commit 3c8970c into elastic:master Mar 19, 2019

javanna added the backport pending label Mar 19, 2019

javanna mentioned this pull request Mar 19, 2019

Cumulative 6.7 backport #40190

Closed

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Remove version conditionals from InternalAggregations

5994fa2

Version conditionals are no longer needed once elastic#40177 is back-ported all the way to 6.7.

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Disable bwc tests

e5948d9

Relates to elastic#40177

javanna mentioned this pull request Mar 19, 2019

Remove version conditionals from InternalAggregations #40193

Merged

This was referenced Mar 19, 2019

Cumulative 7.x backport #40195

Closed

Cumulative 7.0 backport #40196

Closed

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Disable bwc tests

2e9b47d

Relates to elastic#40177

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Disable bwc tests

9520a1d

Relates to elastic#40177

javanna added a commit that referenced this pull request Mar 19, 2019

Disable bwc tests

4c9d7df

Relates to #40177

javanna added a commit that referenced this pull request Mar 19, 2019

Disable bwc tests

9f2d7d1

Relates to #40177

javanna removed the backport pending label Mar 19, 2019

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests

8826f88

Relates to elastic#40177 which is now merged and backported to all branches.

javanna mentioned this pull request Mar 19, 2019

Re-enable bwc tests on master #40215

Merged

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests

aa5211c

Relates to elastic#40177 which is now merged and backported to all branches.

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests

ecda8f0

Relates to elastic#40177 which is now merged and backported to all branches.

This was referenced Mar 19, 2019

Re-enable bwc tests on 7.x #40217

Merged

Re-enable bwc tests on 7.0 #40218

Merged

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40215)

d22c4b8

Relates to #40177 which is now merged and backported to all branches.

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40218)

a5cb792

Relates to #40177 which is now merged and backported to all branches.

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40217)

1ec9fb3

Relates to #40177 which is now merged and backported to all branches.

javanna mentioned this pull request Mar 21, 2019

Move top-level pipeline aggs out of QuerySearchResult #40319

Merged

michaelbaamonde added v7.0.0-rc1 and removed v7.0.0 labels Mar 25, 2019

codebrain mentioned this pull request Aug 5, 2019

[meta] 7.2 Release elastic/elasticsearch-net#3980

Closed

37 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
