Track whether an optimization decision was cost-based #20990
         List<String> triggeredOptimizers = planOptimizerInfo.stream()
                 .filter(x -> x.getOptimizerTriggered())
-                .map(x -> x.getOptimizerName()).collect(toList());
+                .map(x -> x.getOptimizerName()).distinct().sorted().collect(toList());
Cleaned up the output so that it is sorted and de-duplicated.
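For reference, a minimal standalone sketch of what the `distinct().sorted()` change does to the list of triggered optimizers. The `Info` class here is a hypothetical stand-in that models only the two getters used in the diff; it is not the real `PlanOptimizerInformation`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TriggeredOptimizerNames
{
    // Hypothetical stand-in for PlanOptimizerInformation (assumption: only these two getters matter here).
    static final class Info
    {
        private final String optimizerName;
        private final boolean optimizerTriggered;

        Info(String optimizerName, boolean optimizerTriggered)
        {
            this.optimizerName = optimizerName;
            this.optimizerTriggered = optimizerTriggered;
        }

        String getOptimizerName()
        {
            return optimizerName;
        }

        boolean getOptimizerTriggered()
        {
            return optimizerTriggered;
        }
    }

    // Keep only triggered optimizers, drop repeated names, and sort alphabetically.
    static List<String> triggeredNames(List<Info> infos)
    {
        return infos.stream()
                .filter(Info::getOptimizerTriggered)
                .map(Info::getOptimizerName)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<Info> infos = Arrays.asList(
                new Info("ReorderJoins", true),
                new Info("AddLocalExchanges", true),
                new Info("ReorderJoins", true),      // same optimizer reported twice
                new Info("KeyBasedSampler", false)); // applicable but not triggered
        System.out.println(triggeredNames(infos));   // prints [AddLocalExchanges, ReorderJoins]
    }
}
```

Without `distinct()`, an optimizer that fires several times would appear once per firing; without `sorted()`, the order would follow the order in which the information was collected.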
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/TextRenderer.java
...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java
@@ -201,6 +211,8 @@ private void addJoinsWithDifferentDistributions(JoinNode joinNode, List<PlanNode

     private JoinNode getSyntacticOrderJoin(JoinNode joinNode, Context context, JoinDistributionType joinDistributionType)
     {
+        isCostBased = false;
I wonder if the caller could pass this information rather than setting it here?
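One way to read this suggestion, sketched under heavy assumptions: instead of flipping a mutable `isCostBased` field inside the helper, the selection logic could return the chosen distribution together with a flag saying whether costs drove the choice. Everything below (`JoinDistributionChoice`, `Decision`, `choose`, the cost parameters) is hypothetical and does not mirror the actual `DetermineJoinDistributionType` code.

```java
import java.util.OptionalDouble;

class JoinDistributionChoice
{
    enum DistributionType { PARTITIONED, REPLICATED }

    // Hypothetical result type: the decision plus how it was reached.
    static final class Decision
    {
        final DistributionType type;
        final boolean costBased;

        Decision(DistributionType type, boolean costBased)
        {
            this.type = type;
            this.costBased = costBased;
        }
    }

    // If either cost is unknown, fall back to the syntactic default and report costBased = false.
    // Otherwise pick the cheaper option and report costBased = true.
    static Decision choose(OptionalDouble partitionedCost, OptionalDouble replicatedCost, DistributionType syntacticDefault)
    {
        if (!partitionedCost.isPresent() || !replicatedCost.isPresent()) {
            return new Decision(syntacticDefault, false);
        }
        DistributionType cheaper = partitionedCost.getAsDouble() <= replicatedCost.getAsDouble()
                ? DistributionType.PARTITIONED
                : DistributionType.REPLICATED;
        return new Decision(cheaper, true);
    }
}
```

With this shape the rule instance holds no per-invocation state, so the flag cannot leak from one application of the rule to the next.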
-        if (isTriggered || isApplicable) {
-            session.getOptimizerInformationCollector().addInformation(new PlanOptimizerInformation(optimizerName, isTriggered, Optional.of(isApplicable), Optional.empty()));
+        boolean isCostBased = optimizer.isCostBased(session);
+        if (isTriggered || isApplicable || isCostBased) {
Wouldn't this disjunction cause all cost-based optimizers to be logged, whether or not they triggered? Currently it does not matter, since none of the PlanOptimizers are cost-based, but if that changes, this would no longer be correct.
I think you're right. I'll change it to log isCostBased only if the optimization was triggered.
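A minimal sketch of the agreed fix, assuming the cost-based flag is carried as an optional value on each recorded entry. `OptimizerInfoRecorder` and `Entry` are hypothetical stand-ins; the real collector and the `PlanOptimizerInformation` constructor may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class OptimizerInfoRecorder
{
    // Hypothetical stand-in for one recorded PlanOptimizerInformation entry.
    static final class Entry
    {
        final String name;
        final boolean triggered;
        final Optional<Boolean> applicable;
        final Optional<Boolean> costBased;

        Entry(String name, boolean triggered, Optional<Boolean> applicable, Optional<Boolean> costBased)
        {
            this.name = name;
            this.triggered = triggered;
            this.applicable = applicable;
            this.costBased = costBased;
        }
    }

    final List<Entry> entries = new ArrayList<>();

    void record(String optimizerName, boolean isTriggered, boolean isApplicable, boolean optimizerIsCostBased)
    {
        // Being cost-based is not, by itself, a reason to log an optimizer.
        if (isTriggered || isApplicable) {
            // Attach the cost-based flag only when the optimization actually triggered.
            Optional<Boolean> costBased = isTriggered ? Optional.of(optimizerIsCostBased) : Optional.empty();
            entries.add(new Entry(optimizerName, isTriggered, Optional.of(isApplicable), costBased));
        }
    }
}
```

Compared with `isTriggered || isApplicable || isCostBased`, a cost-based optimizer that neither triggered nor was applicable leaves no entry at all.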
@@ -63,12 +64,20 @@ public class DetermineJoinDistributionType
     private final CostComparator costComparator;
     private final TaskCountEstimator taskCountEstimator;

+    // records whether distribution decision was cost-based
+    private boolean isCostBased;
Add these as well? DetermineSemiJoinDistributionType
Looks good modulo the nits. Please fix the release notes as well.
...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java
2ba92de to 4eb770b
Some of Presto's optimizers are heuristic, while others are cost-based. This change allows tracking which optimizers were driven by a cost-based decision (independent of whether the cost was estimated or supplied by HBO). This information is added to PlanOptimizerInformation and can be seen in the explain plan when verbose_optimizer_info_enabled=true, for example:

presto:tpch> explain select lineitem.linenumber,count(*) from orders join lineitem on (lineitem.orderkey=orders.orderkey) group by linenumber;
                                   Query Plan
--------------------------------------------------------------------------------
 - Output[PlanNodeId 12][linenumber, _col1] => [linenumber:integer, count:bigint]
         _col1 := count (1:36)
     - RemoteStreamingExchange[PlanNodeId 297][GATHER] => [linenumber:integer, count:bigint]
         - Project[PlanNodeId 406][projectLocality = LOCAL] => [linenumber:integer, count:bigint]
             - Aggregate(FINAL)[linenumber][$hashvalue][PlanNodeId 7] => [linenumber:integer, $hashvalue:bigint, count:bigint]
                     count := "presto.default.count"((count_15)) (1:36)
                 - LocalExchange[PlanNodeId 355][HASH][$hashvalue] (linenumber) => [linenumber:integer, count_15:bigint, $hashvalue:bigint]
                     - RemoteStreamingExchange[PlanNodeId 361][REPARTITION][$hashvalue_16] => [linenumber:integer, count_15:bigint, $hashvalue_16:bigint]
                         - Aggregate(PARTIAL)[linenumber][$hashvalue_22][PlanNodeId 359] => [linenumber:integer, $hashvalue_22:bigint, count_15:bigint]
                                 count_15 := "presto.default.count"(*) (1:36)
                             - Project[PlanNodeId 405][projectLocality = LOCAL] => [linenumber:integer, $hashvalue_22:bigint]
                                     Estimates: {source: CostBasedSourceInfo, rows: 58490 (799.67kB), cpu: 7320844.01, memory: 270000.00, network: 1654025.00}
                                     $hashvalue_22 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(linenumber), BIGINT'0')) (1:119)
                                 - InnerJoin[PlanNodeId 271][("orderkey_0" = "orderkey")][$hashvalue_17, $hashvalue_19] => [linenumber:integer]
                                         Estimates: {source: CostBasedSourceInfo, rows: 58490 (799.67kB), cpu: 6501977.37, memory: 270000.00, network: 1654025.00}
                                         Distribution: PARTITIONED
                                     - RemoteStreamingExchange[PlanNodeId 294][REPARTITION][$hashvalue_17] => [orderkey_0:bigint, linenumber:integer, $hashvalue_17:bigint]
                                             Estimates: {source: CostBasedSourceInfo, rows: 60175 (822.71kB), cpu: 3610500.00, memory: 0.00, network: 1384025.00}
                                         - ScanProject[PlanNodeId 1,403][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch.lineitem{}]'}, projectLocality = LOCAL] => [orderkey_0:bigint, linenumber:integer, $hashvalue_18:bigint]
                                                 Estimates: {source: CostBasedSourceInfo, rows: 60175 (822.71kB), cpu: 842450.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 60175 (822.71kB), cpu: 2226475.00, memory: 0.00, network: 0.00}
                                                 $hashvalue_18 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey_0), BIGINT'0')) (1:62)
                                                 LAYOUT: tpch.lineitem{}
                                                 orderkey_0 := orderkey:bigint:0:REGULAR (1:62)
                                                 linenumber := linenumber:int:3:REGULAR (1:62)
                                     - LocalExchange[PlanNodeId 338][HASH][$hashvalue_19] (orderkey) => [orderkey:bigint, $hashvalue_19:bigint]
                                             Estimates: {source: CostBasedSourceInfo, rows: 15000 (205.08kB), cpu: 945000.00, memory: 0.00, network: 270000.00}
                                         - RemoteStreamingExchange[PlanNodeId 295][REPARTITION][$hashvalue_20] => [orderkey:bigint, $hashvalue_20:bigint]
                                                 Estimates: {source: CostBasedSourceInfo, rows: 15000 (205.08kB), cpu: 675000.00, memory: 0.00, network: 270000.00}
                                             - ScanProject[PlanNodeId 0,404][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch.orders{}]'}, projectLocality = LOCAL] => [orderkey:bigint, $hashvalue_21:bigint]
                                                     Estimates: {source: CostBasedSourceInfo, rows: 15000 (205.08kB), cpu: 135000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 15000 (205.08kB), cpu: 405000.00, memory: 0.00, network: 0.00}
                                                     $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey), BIGINT'0')) (1:50)
                                                     LAYOUT: tpch.orders{}
                                                     orderkey := orderkey:bigint:0:REGULAR (1:50)

 Triggered optimizers: [AddLocalExchanges, ApplyConnectorOptimization, HashGenerationOptimizer, PickTableLayoutWithoutPredicate, PruneJoinChildrenColumns, PruneJoinColumns, PruneProjectColumns, PruneTableScanColumns, PruneUnreferencedOutputs, PushPartialAggregationThroughExchange, RemoveRedundantDistinctAggregation, RemoveRedundantIdentityProjections, ReorderJoins, SetFlatteningOptimizer, SimplifyPlanWithEmptyInput, StatsRecordingPlanOptimizer, UnaliasSymbolReferences]
 Applicable optimizers: [AddNotNullFiltersToJoinNode, KeyBasedSampler, MergePartialAggregationsWithFilter, PushPartialAggregationThroughJoin]
 Cost-based optimizers: [PushPartialAggregationThroughExchange(CBO), ReorderJoins(CBO)]
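For anyone who wants to consume the new line from captured explain text, here is a small sketch that pulls the optimizer names out of the `Cost-based optimizers: [...]` line. It assumes the exact textual format shown in the example above (bracketed, comma-separated, each name suffixed with `(CBO)`); `CostBasedLineParser` is a hypothetical helper, not part of Presto.

```java
import java.util.ArrayList;
import java.util.List;

class CostBasedLineParser
{
    private static final String PREFIX = "Cost-based optimizers: [";
    private static final String SUFFIX = "(CBO)";

    // Returns the optimizer names listed on the "Cost-based optimizers" line,
    // or an empty list if the line is absent or malformed.
    static List<String> parse(String explainOutput)
    {
        List<String> names = new ArrayList<>();
        int start = explainOutput.indexOf(PREFIX);
        if (start < 0) {
            return names;
        }
        int open = start + PREFIX.length();
        int close = explainOutput.indexOf(']', open);
        if (close < 0) {
            return names;
        }
        for (String item : explainOutput.substring(open, close).split(",")) {
            String name = item.trim();
            if (name.endsWith(SUFFIX)) {
                name = name.substring(0, name.length() - SUFFIX.length());
            }
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

Since the text format is only what this example output shows, any tooling built this way should be treated as fragile.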
4eb770b to 1108b01
Note: currently we don't track whether the cost came from HBO or from cost estimation, only that the decision was cost-driven. A separate change already tracks the source of the cost estimate, so the two could potentially be intersected to recover this information. I could also reconsider and track and log it directly here.