-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialize top-level pipeline aggs as part of InternalAggregations #40177
Conversation
We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With elastic#40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes elastic#40059
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments but the change looks good to me. I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult
, might be tricky to handle bwc though so def not in the scope of this pr.
* Constructs a new aggregation providing its {@link InternalAggregation}s and {@link SiblingPipelineAggregator}s | ||
*/ | ||
public InternalAggregations(List<InternalAggregation> aggregations, List<SiblingPipelineAggregator> topLevelPipelineAggregators) { | ||
super(aggregations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other ctr could call this one with: this(aggregations, Collections.emptyList())
ensuring that the topLevelPipelineAggregators
list is never null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added to my TODO list to convert this class to Writeable.
} | ||
|
||
@Override | ||
@SuppressWarnings("unchecked") | ||
public void writeTo(StreamOutput out) throws IOException { | ||
out.writeNamedWriteableList((List<InternalAggregation>)aggregations); | ||
//TODO update version after backport | ||
if (out.getVersion().onOrAfter(Version.V_8_0_0)) { | ||
if (topLevelPipelineAggregators == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rely on an empty list if the aggregations are completely reduced ? This way we don't need the boolean and can call the write list directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I initially did not do it but I just realized that I can. I was worried about cases with CCS where we receive from e.g. 6.6 and then write the same object to e.g. 7.x. I just need to set the list to an empty one in that case too which removes the need for the list to be nullable 100%.
} | ||
|
||
//TODO update version and rename after backport | ||
public void testSerializationFromPre_8_0_0() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intent of this test and I know that we have similar tests elsewhere but I think it should be moved to a rest test or omitted if we are confident that the existing rest tests are enough to test the bwc serialization. This test checks an internal class that we are allowed to change in a minor release (even a patch release) so I don't think we should use a static representation of the serialization that we'll need to change every time we make a modification to the serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run. After the backport, this static version of the object is the binary representation of how we serialized the object prior to 6.7.0 (6.7.1 depending on what release the PR makes), which I am pretty sure we will not change. I can add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run.
They are overkilled if we write them only to test serialization but since we have some ccs rest tests already it shouldn't be too costly to add one that checks the support for pipeline aggregations. I also agree that we will probably not change the serialization of this class in 6.7.x but my point was more about the general idea of adding serialized bytes from a previous version in a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to do integration tests for this scenario as part of #40038 , I wanted to add coverage there for the the field collapsing bug as well. I prefer the new java test over the yaml ones personally. But our current CCS integration test don't run against multiple versions, while this unit test makes sure that we can read something that was written from e.g. 6.6 compared to simulating that by calling readFrom on master and setting the version to 6.6. Do you see what I mean? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intent but I forgot that we don't run the bwc tests in every module, let's leave it like this for now and we can discuss further in #40038
yes that is also my goal, we might be able to do this in master and 7.x, indeed bwc is tricky especially for CCS which spans multiple versions. I will work on this as a followup. |
…astic#40177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With elastic#40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes elastic#40059
Version conditionals are no longer needed once elastic#40177 is back-ported all the way to 6.7.
Relates to elastic#40177
…astic#40177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With elastic#40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes elastic#40059
Relates to elastic#40177
…astic#40177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With elastic#40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes elastic#40059
Relates to elastic#40177
…0177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With #40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes #40059
…0177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With #40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes #40059
…0177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive to the coordinating node as part of QuerySearchResult objects fom the shards and, despite we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later) all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With #40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes #40059
Relates to elastic#40177 which is now merged and backported to all branches.
Relates to elastic#40177 which is now merged and backported to all branches.
Relates to elastic#40177 which is now merged and backported to all branches.
Relates to #40177 which is now merged and backported to all branches.
Relates to #40177 which is now merged and backported to all branches.
Relates to #40177 which is now merged and backported to all branches.
As part of elastic#40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.
As part of #40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.
As part of elastic#40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.
As part of #40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.
We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.
With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.
With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level
InternalAggregations
object.Closes #40059