ESQL: Implement a MetricsAware interface #120527

Merged (10 commits) on Jan 27, 2025

Contributor

@bpintea bpintea commented Jan 21, 2025

This implements an interface that exports the names of the plan nodes and functions that need to be counted in the metrics.

Also, the metrics are now counted from within the parser. This should allow correct accounting for the cases where a node can appear either standalone or as a child of another node (like Aggregate being a child of INLINESTATS, in which case no STATS counting should occur).

The function counting now also validates that there is actually a function registered behind a name.

Closes #115992.
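
A minimal sketch of the idea described above, not the code from this PR: plan nodes expose the name they are counted under, the parser reports only the command it is actually building, and function names are counted only when a registry knows them. `MetricsAware` and `metricName()` are taken from this PR; `PlanMetrics`, the stand-in `Aggregate`/`InlineStats` records, and the set-based registry are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// A plan node reports the label under which it should be counted.
interface MetricsAware {
    String metricName();
}

// Stand-in plan nodes for illustration only; not the real ES|QL classes.
record Aggregate() implements MetricsAware {
    public String metricName() { return "STATS"; }
}

record InlineStats(Aggregate child) implements MetricsAware {
    public String metricName() { return "INLINESTATS"; }
}

// Hypothetical collector that the parser feeds while building the plan.
class PlanMetrics {
    private final Set<String> registeredFunctions; // assumed: lower-case names
    private final Map<String, Integer> commands = new HashMap<>();
    private final Map<String, Integer> functions = new HashMap<>();

    PlanMetrics(Set<String> registeredFunctions) {
        this.registeredFunctions = registeredFunctions;
    }

    void command(MetricsAware node) {
        commands.merge(node.metricName(), 1, Integer::sum);
    }

    // Only names backed by a registered function are counted.
    void function(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        if (registeredFunctions.contains(key)) {
            functions.merge(key.toUpperCase(Locale.ROOT), 1, Integer::sum);
        }
    }

    Map<String, Integer> commands() { return commands; }
    Map<String, Integer> functions() { return functions; }
}

class MetricsAwareDemo {
    public static void main(String[] args) {
        PlanMetrics metrics = new PlanMetrics(Set.of("count"));
        // The INLINESTATS visit builds an Aggregate internally but reports
        // only the outer node, so STATS is not counted.
        metrics.command(new InlineStats(new Aggregate()));
        metrics.function("COUNT");
        metrics.function("no_such_function"); // ignored: not registered
        System.out.println(metrics.commands());
        System.out.println(metrics.functions());
    }
}
```

Counting at the point where the parser decides which command it is building avoids having to inspect the finished tree and guess whether an `Aggregate` came from STATS or from INLINESTATS.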

@bpintea bpintea added the >enhancement, auto-backport (automatically create backport pull requests when merged), :Analytics/ES|QL (AKA ESQL), v9.0.0, and v8.18.0 labels Jan 21, 2025
@elasticsearchmachine
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Hi @bpintea, I've updated the changelog YAML for you.

@bpintea bpintea marked this pull request as ready for review January 21, 2025 18:04
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (meta label for analytical engine team: ESQL/Aggs/Geo) Jan 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Member

@costin costin left a comment

LGTM. My main comments are about using the Telemetry name instead of metrics and some small cleanups here and there; nothing major.

@@ -7,28 +7,56 @@

package org.elasticsearch.xpack.esql.stats;
Member

o.e.x.esql.telemetry

Contributor Author

Created a new package and moved all telemetry-related classes there.

import java.util.Locale;
import java.util.Map;
import java.util.Set;

import static org.elasticsearch.common.Strings.format;

/**
* This class is responsible for collecting metrics related to ES|QL planning.
*/
public class PlanningMetrics {
Member

There's not much related to planning here, so I propose renaming it to PlanTelemetry and updating the javadoc accordingly.

Contributor Author

Updated (initial naming was taken from the proposed design doc).

var functionName = functionRegistry.resolveAlias(name);
if (functionRegistry.functionExists(functionName)) {
// The metrics have been collected initially with their uppercase spelling
add(functions, functionName.toUpperCase(Locale.ROOT));
Member

Is there a way to remove the toUpper() repetition in this class and EsqlFunctionRegistry?

Contributor Author

Updated.
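
One way such repetition could be avoided, shown as a sketch only (the thread does not say how the PR resolved it): keep case normalization in a single private helper of the registry and let telemetry ask the registry for the canonical key. `resolveAlias` and `functionExists` mirror the method names visible in the snippet above; `SketchFunctionRegistry`, `normalize`, `telemetryKey`, and the sample names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Stand-in for a function registry; not the real EsqlFunctionRegistry.
class SketchFunctionRegistry {
    private final Set<String> names;           // canonical names, upper case
    private final Map<String, String> aliases; // alias -> canonical, upper case

    SketchFunctionRegistry(Set<String> names, Map<String, String> aliases) {
        this.names = names;
        this.aliases = aliases;
    }

    // The only place where case normalization happens.
    private static String normalize(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    String resolveAlias(String name) {
        String normalized = normalize(name);
        return aliases.getOrDefault(normalized, normalized);
    }

    boolean functionExists(String name) {
        return names.contains(resolveAlias(name));
    }

    // Hypothetical helper for telemetry: the canonical name, if registered.
    Optional<String> telemetryKey(String name) {
        String resolved = resolveAlias(name);
        return names.contains(resolved) ? Optional.of(resolved) : Optional.empty();
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        SketchFunctionRegistry registry = new SketchFunctionRegistry(
            Set.of("TO_STRING"),
            Map.of("TO_STR", "TO_STRING") // illustrative alias
        );
        System.out.println(registry.telemetryKey("to_str"));  // alias resolved
        System.out.println(registry.telemetryKey("unknown")); // not registered
    }
}
```

With this shape the telemetry class never calls `toUpperCase` itself, so the spelling of the counted keys is decided in one place.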

@@ -218,7 +218,7 @@ private LogicalPlan resolveIndex(UnresolvedRelation plan, IndexResolution indexR
plan.metadataFields(),
plan.indexMode(),
indexResolutionMessage,
- plan.commandName()
+ plan.metricName()
Member

Since the string is provided internally, is there still a need to specify it in the constructor?

Contributor Author

Yes, because in some cases (like with METRICS) we don't have a dedicated container.
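
To illustrate the point in the reply above, here is a small sketch under assumptions: when one node class serves several commands, the label cannot be derived from the class and has to be supplied at construction time, whereas a node with a dedicated class can hard-code it. The `Relation` and `Limit` records, the `Labelled` interface, and the label strings are hypothetical stand-ins, not the real ES|QL classes.

```java
// Hypothetical interface mirroring the metricName() accessor from the diff.
interface Labelled {
    String metricName();
}

// One class shared by several source commands: the label is a constructor
// argument, and the record accessor metricName() implements the interface.
record Relation(String indexPattern, String metricName) implements Labelled {}

// A node with its own dedicated class can return a fixed label.
record Limit(int rows) implements Labelled {
    public String metricName() { return "LIMIT"; }
}

class LabelDemo {
    public static void main(String[] args) {
        Labelled from = new Relation("logs-*", "FROM");       // label chosen by the caller
        Labelled metrics = new Relation("k8s-*", "METRICS");  // same class, different label
        Labelled limit = new Limit(10);
        System.out.println(from.metricName());
        System.out.println(metrics.metricName());
        System.out.println(limit.metricName());
    }
}
```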

/**
* Interface for plan nodes that need to be accounted in the statistics
*/
public interface MetricsAware {
Member

TelemetryAware

Contributor Author

Updated.

public LogicalPlan createStatement(String query, QueryParams params) {
return createStatement(query, params, new PlanningMetrics(new EsqlFunctionRegistry()));
Member

Where is this used? I'd rather force consumers to change the constructor than hide this object creation underneath.

Contributor Author

It's only used in tests (I've added a comment).
This could be moved out, but in a follow-up, as there are plenty of tests that don't care about params or metrics.

@@ -116,9 +117,11 @@ public abstract class ExpressionBuilder extends IdentifierBuilder {
public static final int MAX_EXPRESSION_DEPTH = 400;

protected final QueryParams params;
private final PlanningMetrics metrics;
Member

Make it protected so subclasses can refer to it instead of having their own copies.
To avoid future disruption in the constructor signature, introduce a basic ParsingContext record (record ParsingContext(QueryParams params, PlanTelemetry)) and use that instead.

Contributor Author

Ah, right, this was silly. Thx.
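
A rough sketch of the suggestion above, assuming simplified stand-ins for the real classes: the builders receive a single `ParsingContext` record, hold it in a protected field, and read both the params and the telemetry from it. The record name comes from the review comment; the component name `telemetry`, the stub `QueryParams` and `PlanTelemetry` classes, and the `visitFunction` method are hypothetical.

```java
// Stand-ins for the real classes; only their role matters here.
class QueryParams {}

class PlanTelemetry {
    private int functionCount;

    void function(String name) { functionCount++; }

    int functionCount() { return functionCount; }
}

// Bundles what the builders need, so adding another dependency later does
// not change any constructor signature.
record ParsingContext(QueryParams params, PlanTelemetry telemetry) {}

abstract class ExpressionBuilder {
    // Protected so that subclasses reuse it instead of keeping their own copy.
    protected final ParsingContext context;

    ExpressionBuilder(ParsingContext context) {
        this.context = context;
    }

    void visitFunction(String name) {
        context.telemetry().function(name);
    }
}

class LogicalPlanBuilder extends ExpressionBuilder {
    LogicalPlanBuilder(ParsingContext context) {
        super(context);
    }
}

class ParsingContextDemo {
    public static void main(String[] args) {
        PlanTelemetry telemetry = new PlanTelemetry();
        LogicalPlanBuilder builder = new LogicalPlanBuilder(new ParsingContext(new QueryParams(), telemetry));
        builder.visitFunction("count");
        System.out.println(telemetry.functionCount());
    }
}
```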

@@ -87,19 +89,25 @@ public class LogicalPlanBuilder extends ExpressionBuilder {

interface PlanFactory extends Function<LogicalPlan, LogicalPlan> {}

private final PlanningMetrics metrics;
Member

See my comment on ExpressionBuilder.

Contributor Author

Fixed.

@@ -481,7 +489,7 @@ public LogicalPlan visitMetricsCommand(EsqlBaseParser.MetricsCommandContext ctx)
List.of(new MetadataAttribute(source, MetadataAttribute.TSID_FIELD, DataType.KEYWORD, false)),
IndexMode.TIME_SERIES,
null,
- "FROM TS"
+ null
Member

Remove the constructor if the label is not needed anymore.

Contributor Author

Added a new constructor that doesn't take that last parameter.

@@ -530,7 +538,7 @@ public PlanFactory visitJoinCommand(EsqlBaseParser.JoinCommandContext ctx) {
emptyList(),
IndexMode.LOOKUP,
null,
- "???"
+ null // should not end up being counted in the metrics
Member

Same as above.

@bpintea bpintea added the auto-merge-without-approval label (automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Jan 27, 2025
@elasticsearchmachine elasticsearchmachine merged commit a4482d4 into elastic:main Jan 27, 2025
16 checks passed
@bpintea bpintea deleted the enh/metrics_aware branch January 27, 2025 18:25
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 8.x: commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 120527`

bpintea added a commit to bpintea/elasticsearch that referenced this pull request Jan 28, 2025
(cherry picked from commit a4482d4)
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jan 28, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jan 28, 2025
This reverts commit a4482d4.

It turns out that `PlanTelemetry` can add quite a bit of memory usage,
at least on "rude" queries. In `HeapAttackIT.testHugeManyConcat`, this
was using 30MB.

I'd like to revert this to see if we can either reduce its memory
footprint or track its memory somehow.
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Jan 28, 2025
(cherry picked from commit a4482d4)
bpintea added a commit that referenced this pull request Jan 29, 2025
* ESQL: Implement a MetricsAware interface (#120527)

(cherry picked from commit a4482d4)

* Drop the HashSet gating when counting commands

The telemetry accounting is no longer done in just one place in the parser,
but split, so that no HashSet is required to discard duplicate accounting of
the same node. This lowers the memory requirements.
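
The trade-off described in this commit message can be sketched as follows; this is an assumed reconstruction of the two shapes, not the actual code. In the gated variant a single hook may see the same node more than once, so it keeps a set of already counted nodes, and that set grows with the size of the plan. In the direct variant each parser visit method counts its own command once, so only the counters are kept. `GatedCounter`, `DirectCounter`, and the `EVAL` label are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Variant A (assumed shape of the earlier approach): remember every node
// that has been counted so that repeated sightings are discarded.
class GatedCounter {
    private final Set<Object> seen = new HashSet<>();
    private final Map<String, Integer> counts = new HashMap<>();

    void count(Object node, String label) {
        if (seen.add(node)) { // add() returns false for an already seen node
            counts.merge(label, 1, Integer::sum);
        }
    }

    int trackedNodes() { return seen.size(); }

    Map<String, Integer> counts() { return counts; }
}

// Variant B: the caller guarantees one call per command, so no set is needed.
class DirectCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    void count(String label) {
        counts.merge(label, 1, Integer::sum);
    }

    Map<String, Integer> counts() { return counts; }
}

class GatingDemo {
    public static void main(String[] args) {
        GatedCounter gated = new GatedCounter();
        DirectCounter direct = new DirectCounter();
        for (int i = 0; i < 1000; i++) {
            Object node = new Object(); // stand-in for a plan node
            gated.count(node, "EVAL");
            gated.count(node, "EVAL");  // same node seen again: ignored
            direct.count("EVAL");       // counted once, where the node is built
        }
        System.out.println(gated.trackedNodes());                   // grows with the plan
        System.out.println(gated.counts().equals(direct.counts())); // same totals
    }
}
```

Both variants end up with the same totals; the difference is that the gated one holds a reference to every counted node for the duration of parsing, which is the kind of overhead a query with a very large number of nodes would amplify.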
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Jan 29, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jan 29, 2025
Labels
:Analytics/ES|QL (AKA ESQL), auto-backport (automatically create backport pull requests when merged), auto-merge-without-approval (automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), >enhancement, Team:Analytics (meta label for analytical engine team: ESQL/Aggs/Geo), v8.18.0, v9.0.0
Development

Successfully merging this pull request may close these issues.

ESQL: Improve command resolution in telemetry
3 participants