Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Allow parallelism for final aggregation (#691)
Summary: X-link: facebookexternal/presto_cpp#691 Pull Request resolved: #1330 Final and single aggregations can run multi-threaded as long as data is partitioned on the grouping keys, but we used to artificially restrict these to run single-threaded. This change lifts the restriction. This allows queries like TPC-H q18 to run a lot faster: 2s vs. 5s for a single-node run over scale 10. To make this work we must not ignore any local exchange node in the query plan. Prestissimo used to silently drop gather and round-robin repartitioning local exchanges. We no longer do that. [1] Consider this global aggregation query. ``` presto> explain (type distributed) SELECT avg(discount), avg(quantity) FROM lineitem; Query Plan ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Fragment 0 [SINGLE] Output layout: [avg, avg_2] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - Output[_col0, _col1] => [avg:double, avg_2:double] _col0 := avg (1:35) _col1 := avg_2 (1:50) - Aggregate(FINAL) => [avg:double, avg_2:double] avg := "presto.default.avg"((avg_10)) (1:35) avg_2 := "presto.default.avg"((avg_11)) (1:50) - LocalExchange[SINGLE] () => [avg_10:row("field0" double, "field1" bigint), avg_11:row("field0" double, "field1" bigint)] - RemoteSource[1] => [avg_10:row("field0" double, "field1" bigint), avg_11:row("field0" double, "field1" bigint)] Fragment 1 [tpch:orders:15000] Output layout: [avg_10, avg_11] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - Aggregate(PARTIAL) => [avg_10:row("field0" double, "field1" bigint), avg_11:row("field0" double, "field1" bigint)] avg_10 := "presto.default.avg"((discount)) (1:35) avg_11 := "presto.default.avg"((quantity)) (1:50) - TableScan[TableHandle {connectorId='tpch', connectorHandle='lineitem:sf0.01', layout='Optional[lineitem:sf0.01]'}, grouped = false] => [quantity:double, discount:double] Estimates: {rows: 60175 (1.03MB), cpu: 1083150.00, memory: 0.00, network: 0.00} quantity := tpch:quantity (1:69) discount := tpch:discount (1:69) ``` Fragment 0 includes a gather local exchange: LocalExchange[SINGLE]. Before this change, the exchange was ignored and Velox ran a plan with Aggregate(FINAL) directly over RemoteSource in the same pipeline. The pipeline was restricted to single-threaded because it includes final aggregation. Now, Velox will run a plan with 2 pipelines. One pipeline will run RemoteSource, another - Aggregate(FINAL) and there will be a gather local exchange in between. [2] Consider this LIMIT query: ``` presto> explain (type distributed) SELECT orderkey FROM lineitem GROUP BY 1 LIMIT 17; Query Plan -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Fragment 0 [SINGLE] Output layout: [orderkey] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - Output[orderkey] => [orderkey:bigint] - DistinctLimit[17] => [orderkey:bigint] - LocalExchange[SINGLE] () => [orderkey:bigint] - RemoteSource[1] => [orderkey:bigint] Fragment 1 [tpch:orders:15000] Output layout: [orderkey] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - DistinctLimitPartial[17] => [orderkey:bigint] - TableScan[TableHandle {connectorId='tpch', connectorHandle='lineitem:sf0.01', layout='Optional[lineitem:sf0.01]'}, grouped = false] => [orderkey:bigint] Estimates: {rows: 60175 (528.88kB), cpu: 541575.00, memory: 0.00, network: 0.00} orderkey := tpch:orderkey (1:49) ``` Fragment 0 includes a gather local exchange: LocalExchange[SINGLE]. Before this change, the exchange was ignored and Velox ran a plan with DistinctLimit node directly over RemoteSource in the same pipeline. The pipeline was restricted to single-threaded because it includes final aggregation and final limit. Now, Velox will run a plan with 2 pipelines. One pipeline will run RemoteSource, another - DistinctLimit and there will be a gather local exchange in between. Reviewed By: Yuhta Differential Revision: D35304854 fbshipit-source-id: a37f8c09f6bb481edcfa29c033c4e4b092eeb173
- Loading branch information