[improvement](mtmv) Support to partition prune when query rewrite by sync materialized view #38527


seawinde
Contributor

Proposed changes

Support partition pruning when a query is rewritten by a sync materialized view.
For example, given the following table definition:

```sql
CREATE TABLE IF NOT EXISTS test_duplicate (
    `time` DATETIME NULL COMMENT 'query time',
    `app_name` VARCHAR(64) NULL COMMENT 'identifier',
    `event_id` VARCHAR(128) NULL COMMENT 'identifier',
    `decision` VARCHAR(32) NULL COMMENT 'enum value',
    `id` VARCHAR(35) NOT NULL COMMENT 'id',
    `code` VARCHAR(64) NULL COMMENT 'identifier',
    `event_type` VARCHAR(32) NULL COMMENT 'event type'
)
DUPLICATE KEY(time)
PARTITION BY RANGE(time)
(
    FROM ("2024-07-01 00:00:00") TO ("2024-07-15 00:00:00") INTERVAL 1 HOUR
)
DISTRIBUTED BY HASH(time)
BUCKETS 3 PROPERTIES ("replication_num" = "1");
```

The sync materialized view is defined as:

```sql
create materialized view <mv_name> as
select
    app_name,
    event_id,
    time,
    count(*)
from
    test_duplicate
group by
    app_name,
    event_id,
    time;
```

If the following query is successfully rewritten by the sync materialized view, it should now also be partition-pruned:

```sql
select
    app_name,
    event_id,
    time,
    count(*)
from
    test_duplicate
where time < '2024-07-05 01:00:00'
group by
    app_name,
    time,
    event_id;
```
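To make the expected effect concrete: the DDL creates one range partition per hour between 2024-07-01 00:00:00 (inclusive) and 2024-07-15 00:00:00 (exclusive), and the predicate `time < '2024-07-05 01:00:00'` can only match rows in partitions whose lower bound is before that instant. The Java sketch below counts the partitions that would survive pruning. It models only half-open hourly ranges and a single `<` predicate; it illustrates range pruning in general, not Doris's partition pruner, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class HourlyRangePruneSketch {
    // One range partition covering [lower, upper), as produced by
    // FROM ... TO ... INTERVAL 1 HOUR.
    record Partition(LocalDateTime lower, LocalDateTime upper) {}

    // Expands the FROM/TO/INTERVAL clause into consecutive half-open hourly ranges.
    static List<Partition> hourlyPartitions(LocalDateTime from, LocalDateTime to) {
        List<Partition> parts = new ArrayList<>();
        for (LocalDateTime lo = from; lo.isBefore(to); lo = lo.plusHours(1)) {
            parts.add(new Partition(lo, lo.plusHours(1)));
        }
        return parts;
    }

    // For `time < bound`, a partition may hold matching rows only if its
    // lower bound is strictly before `bound`; every other partition is pruned.
    static List<Partition> pruneLessThan(List<Partition> parts, LocalDateTime bound) {
        List<Partition> kept = new ArrayList<>();
        for (Partition p : parts) {
            if (p.lower().isBefore(bound)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Partition> all = hourlyPartitions(
                LocalDateTime.of(2024, 7, 1, 0, 0),
                LocalDateTime.of(2024, 7, 15, 0, 0));
        List<Partition> kept = pruneLessThan(all, LocalDateTime.of(2024, 7, 5, 1, 0));
        System.out.println(all.size() + " partitions, " + kept.size() + " scanned");
    }
}
```

Running it prints `336 partitions, 97 scanned`: the partitions from 2024-07-01 00:00 through 2024-07-05 00:00 remain, so pruning would skip 239 of the 336 hourly partitions for this query.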

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41617 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 047d12725d69bedd9a4b629abdc2aaccca6a857a, data reload: false

------ Round 1 ----------------------------------
q1	18422	4213	4169	4169
q2	2471	211	214	211
q3	12645	1275	1353	1275
q4	10663	850	917	850
q5	8234	2965	2951	2951
q6	221	138	137	137
q7	1026	610	630	610
q8	9443	1878	1935	1878
q9	8416	6580	6619	6580
q10	8746	3813	3848	3813
q11	429	252	248	248
q12	413	226	222	222
q13	17762	2924	2915	2915
q14	274	243	240	240
q15	516	488	485	485
q16	479	408	401	401
q17	957	921	904	904
q18	7997	7484	7171	7171
q19	1478	1222	1207	1207
q20	566	317	323	317
q21	5259	4815	4750	4750
q22	350	287	283	283
Total cold run time: 116767 ms
Total hot run time: 41617 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4114	3985	4004	3985
q2	326	224	219	219
q3	2992	2974	2945	2945
q4	1872	1870	1865	1865
q5	5284	5209	5221	5209
q6	222	130	127	127
q7	2077	1690	1715	1690
q8	3220	3289	3263	3263
q9	8305	8278	8282	8278
q10	3768	3809	3810	3809
q11	547	449	453	449
q12	736	580	546	546
q13	12705	2922	2957	2922
q14	279	254	254	254
q15	521	478	470	470
q16	451	414	390	390
q17	1711	1641	1660	1641
q18	7784	7335	7193	7193
q19	1714	1652	1652	1652
q20	1979	1726	1738	1726
q21	5422	5146	5168	5146
q22	520	470	472	470
Total cold run time: 66549 ms
Total hot run time: 54249 ms

@doris-robot

TPC-DS: Total hot run time: 169620 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 047d12725d69bedd9a4b629abdc2aaccca6a857a, data reload: false

query1	929	378	375	375
query2	6471	1804	1728	1728
query3	6684	217	223	217
query4	20004	17382	17551	17382
query5	4270	502	512	502
query6	288	167	169	167
query7	4602	295	292	292
query8	254	200	193	193
query9	8533	2372	2366	2366
query10	438	276	273	273
query11	10571	10045	10173	10045
query12	140	88	85	85
query13	1632	371	373	371
query14	9201	7736	7811	7736
query15	216	161	167	161
query16	7083	445	411	411
query17	931	548	525	525
query18	1909	279	274	274
query19	197	139	137	137
query20	89	88	83	83
query21	205	110	99	99
query22	4245	4063	3949	3949
query23	33528	32913	32915	32913
query24	10372	3072	3032	3032
query25	672	379	392	379
query26	1756	153	143	143
query27	2979	276	272	272
query28	6967	1990	1942	1942
query29	1290	414	409	409
query30	291	150	149	149
query31	910	756	745	745
query32	99	55	61	55
query33	705	306	316	306
query34	909	474	484	474
query35	842	702	735	702
query36	999	882	855	855
query37	295	80	83	80
query38	2871	2753	2770	2753
query39	868	823	801	801
query40	281	121	114	114
query41	49	46	45	45
query42	124	99	104	99
query43	493	453	435	435
query44	1194	721	739	721
query45	208	178	181	178
query46	1074	797	797	797
query47	1793	1700	1695	1695
query48	360	290	307	290
query49	1178	422	425	422
query50	894	427	446	427
query51	6802	6708	6734	6708
query52	108	97	90	90
query53	257	178	181	178
query54	660	450	450	450
query55	86	78	76	76
query56	283	264	268	264
query57	1163	1033	1044	1033
query58	277	267	268	267
query59	2819	2452	2594	2452
query60	305	269	278	269
query61	104	98	101	98
query62	918	656	672	656
query63	246	183	186	183
query64	5923	1926	1902	1902
query65	3174	3082	3073	3073
query66	1442	338	343	338
query67	15495	14691	14776	14691
query68	4346	565	586	565
query69	443	303	320	303
query70	1105	1071	1095	1071
query71	371	279	290	279
query72	7150	2688	2573	2573
query73	769	327	326	326
query74	6070	5719	5634	5634
query75	3389	2733	2747	2733
query76	2378	1342	1421	1342
query77	509	319	315	315
query78	9375	8849	8919	8849
query79	1946	534	535	534
query80	1116	530	505	505
query81	527	226	225	225
query82	1168	132	125	125
query83	253	175	181	175
query84	272	81	80	80
query85	1356	322	318	318
query86	408	283	333	283
query87	3244	3094	3094	3094
query88	3004	2423	2406	2406
query89	391	299	291	291
query90	1838	193	194	193
query91	130	104	105	104
query92	65	52	52	52
query93	1568	608	601	601
query94	906	302	304	302
query95	389	271	269	269
query96	595	281	279	279
query97	3184	2997	3055	2997
query98	210	205	199	199
query99	1647	1299	1266	1266
Total cold run time: 263077 ms
Total hot run time: 169620 ms

@doris-robot

ClickBench: Total hot run time: 30.18 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 047d12725d69bedd9a4b629abdc2aaccca6a857a, data reload: false

query1	0.05	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.07
query5	0.49	0.49	0.49
query6	1.16	0.72	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.56	0.54	0.51
query10	0.56	0.56	0.57
query11	0.16	0.12	0.11
query12	0.14	0.12	0.12
query13	0.61	0.59	0.60
query14	0.77	0.80	0.78
query15	0.92	0.87	0.87
query16	0.36	0.36	0.35
query17	1.00	1.01	0.98
query18	0.22	0.22	0.21
query19	1.86	1.74	1.74
query20	0.02	0.01	0.01
query21	15.40	0.75	0.65
query22	5.08	7.28	1.43
query23	17.90	1.40	1.30
query24	2.28	0.22	0.22
query25	0.17	0.08	0.08
query26	0.32	0.22	0.21
query27	0.46	0.23	0.23
query28	13.18	1.00	0.97
query29	12.58	3.34	3.33
query30	0.25	0.06	0.05
query31	2.87	0.41	0.40
query32	3.24	0.48	0.48
query33	2.89	2.97	2.94
query34	15.43	4.25	4.25
query35	4.27	4.29	4.32
query36	0.68	0.47	0.49
query37	0.19	0.16	0.15
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.16	0.13	0.13
query41	0.11	0.05	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 108.68 s
Total hot run time: 30.18 s

```diff
     indexId,
-    PreAggStatus.unset(),
+    preAggStatus,
```
Contributor

should be unset

Contributor Author

@seawinde seawinde Aug 1, 2024

`preAggStatus` is a variable: it is unset in the sync materialization context and on in the async materialization context.
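The author's point can be sketched as follows: the scan is built with whatever status the caller passes in, so the sync path still ends up with an unset status while the async path gets `on`, instead of every caller being forced to `PreAggStatus.unset()`. All types and names here (`ContextKind`, `PreAgg`, `ScanSketch`, `preAggFor`, `buildScan`) are hypothetical stand-ins, not Doris classes.

```java
import java.util.Objects;

public class PreAggSketch {
    // Hypothetical stand-ins for the two materialization contexts.
    enum ContextKind { SYNC, ASYNC }

    // Hypothetical stand-in for PreAggStatus, limited to the two states discussed.
    enum PreAgg { UNSET, ON }

    // A scan node recording which index it reads and its pre-aggregation status.
    record ScanSketch(long indexId, PreAgg preAggStatus) {}

    // Derives the status from the context, as described in the reply above.
    static PreAgg preAggFor(ContextKind kind) {
        return kind == ContextKind.SYNC ? PreAgg.UNSET : PreAgg.ON;
    }

    // Takes the status as a parameter instead of hard-coding UNSET.
    static ScanSketch buildScan(long indexId, PreAgg preAggStatus) {
        return new ScanSketch(indexId, Objects.requireNonNull(preAggStatus));
    }

    public static void main(String[] args) {
        for (ContextKind kind : ContextKind.values()) {
            System.out.println(kind + " -> " + buildScan(1L, preAggFor(kind)).preAggStatus());
        }
    }
}
```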

```java
    return scanPlan;
}
if (queryStructInfo.getRelations().size() == 1
        && queryStructInfo.getRelations().get(0) instanceof LogicalOlapScan
```
Contributor

Referring to `org.apache.doris.nereids.rules.rewrite.DeferMaterializeTopNResult`: could the relation here be a `LogicalDeferMaterializeOlapScan`? If so, can we still collect the correct struct info?

Contributor Author

Yes, `LogicalDeferMaterializeOlapScan` is collected when the struct info is collected, because `RelationCollector` adds every `CatalogRelation` it visits:

```java
// Adds every CatalogRelation (of any concrete subclass) found in the plan tree;
// super.visit(...) continues the default traversal.
private static class RelationCollector extends DefaultPlanVisitor<Void, List<CatalogRelation>> {
    @Override
    public Void visit(Plan plan, List<CatalogRelation> collectedRelations) {
        if (plan instanceof CatalogRelation) {
            collectedRelations.add((CatalogRelation) plan);
        }
        return super.visit(plan, collectedRelations);
    }
}
```
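Because the collector tests `instanceof CatalogRelation` rather than a concrete scan class, any relation type implementing that interface is picked up. The self-contained sketch below mirrors that pattern with hypothetical minimal types (`Plan`, `CatalogRelation`, `OlapScan`, `DeferMaterializeOlapScan`, `TopN`, `Join`). It assumes, as the reply implies, that the defer-materialize scan implements `CatalogRelation`; it is not the Nereids visitor API.

```java
import java.util.ArrayList;
import java.util.List;

public class RelationCollectorSketch {
    // Minimal plan tree: every node exposes its children.
    interface Plan { List<Plan> children(); }

    // Leaf nodes that read a table; they have no children.
    interface CatalogRelation extends Plan {
        String table();
        default List<Plan> children() { return List.of(); }
    }

    record OlapScan(String table) implements CatalogRelation {}

    // Hypothetical: assumed to implement CatalogRelation, like the Doris scan it stands in for.
    record DeferMaterializeOlapScan(String table) implements CatalogRelation {}

    record TopN(Plan child) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    record Join(Plan left, Plan right) implements Plan {
        public List<Plan> children() { return List.of(left, right); }
    }

    // Pre-order walk: record the node if it is a relation, then visit its children.
    static void collect(Plan plan, List<CatalogRelation> out) {
        if (plan instanceof CatalogRelation relation) {
            out.add(relation);
        }
        for (Plan child : plan.children()) {
            collect(child, out);
        }
    }

    static List<CatalogRelation> collect(Plan root) {
        List<CatalogRelation> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    public static void main(String[] args) {
        Plan plan = new TopN(new DeferMaterializeOlapScan("test_duplicate"));
        System.out.println(collect(plan));
    }
}
```

The design choice worth noting is the interface-based check: adding a new scan variant requires no change to the collector as long as the variant implements `CatalogRelation`.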

@github-actions github-actions bot added the doing label Aug 2, 2024
@seawinde
Contributor Author

seawinde commented Aug 2, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41922 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c02b7c618b52d091022577a83d1148ed14e2641a, data reload: false

------ Round 1 ----------------------------------
q1	17720	4207	4085	4085
q2	2021	202	197	197
q3	10464	1290	1332	1290
q4	10154	865	942	865
q5	7654	3004	2991	2991
q6	224	140	141	140
q7	1075	621	642	621
q8	9446	1897	1976	1897
q9	8567	6648	6649	6648
q10	8796	3878	3855	3855
q11	433	259	249	249
q12	414	236	237	236
q13	17757	2918	2979	2918
q14	284	246	252	246
q15	532	487	501	487
q16	536	391	386	386
q17	984	935	945	935
q18	8082	7397	7270	7270
q19	1389	1224	1238	1224
q20	555	332	345	332
q21	5348	4821	4763	4763
q22	357	287	289	287
Total cold run time: 112792 ms
Total hot run time: 41922 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4136	4051	4081	4051
q2	325	227	222	222
q3	3031	3038	3161	3038
q4	2003	2073	2011	2011
q5	5632	5512	5496	5496
q6	219	133	133	133
q7	2162	1803	1850	1803
q8	3319	3390	3382	3382
q9	8776	8698	8911	8698
q10	3948	4067	3939	3939
q11	582	474	455	455
q12	776	572	568	568
q13	16368	3106	3125	3106
q14	312	277	279	277
q15	537	508	481	481
q16	461	406	419	406
q17	1795	1744	1737	1737
q18	8385	7812	7622	7622
q19	1723	1748	1728	1728
q20	2083	1864	1834	1834
q21	5764	5593	5302	5302
q22	528	492	470	470
Total cold run time: 72865 ms
Total hot run time: 56759 ms

@doris-robot

TPC-DS: Total hot run time: 170203 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c02b7c618b52d091022577a83d1148ed14e2641a, data reload: false

query1	924	389	367	367
query2	6477	1705	1698	1698
query3	6651	216	221	216
query4	20073	17532	17441	17441
query5	3678	526	518	518
query6	282	176	174	174
query7	4847	310	307	307
query8	267	210	194	194
query9	8536	2388	2372	2372
query10	427	281	273	273
query11	10463	10037	10262	10037
query12	125	91	88	88
query13	1648	405	385	385
query14	9318	6829	6858	6829
query15	206	169	170	169
query16	6931	499	475	475
query17	930	565	551	551
query18	1908	284	277	277
query19	191	144	145	144
query20	92	90	88	88
query21	222	100	95	95
query22	4232	4215	4020	4020
query23	33674	33916	33482	33482
query24	10223	3176	3106	3106
query25	701	411	428	411
query26	1711	164	164	164
query27	2971	284	287	284
query28	7401	2043	2009	2009
query29	1200	447	440	440
query30	232	153	159	153
query31	972	771	791	771
query32	100	61	59	59
query33	685	330	348	330
query34	956	502	524	502
query35	897	778	764	764
query36	1043	892	905	892
query37	206	82	83	82
query38	2916	2807	2780	2780
query39	875	836	814	814
query40	292	113	111	111
query41	48	46	43	43
query42	123	99	106	99
query43	470	420	417	417
query44	1184	742	769	742
query45	207	178	181	178
query46	1095	826	786	786
query47	1823	1760	1760	1760
query48	368	301	299	299
query49	1020	434	424	424
query50	909	440	449	440
query51	6836	6636	6630	6630
query52	102	89	92	89
query53	257	179	179	179
query54	631	465	459	459
query55	80	77	77	77
query56	277	258	254	254
query57	1128	1055	1041	1041
query58	265	277	259	259
query59	2569	2409	2254	2254
query60	290	274	269	269
query61	128	93	94	93
query62	885	654	658	654
query63	216	184	185	184
query64	5714	1901	1869	1869
query65	3182	3094	3128	3094
query66	1115	331	329	329
query67	15578	15029	14929	14929
query68	4367	580	585	580
query69	662	372	325	325
query70	1109	1066	1075	1066
query71	382	287	286	286
query72	7206	2703	2488	2488
query73	772	333	337	333
query74	6002	5706	5642	5642
query75	3398	2755	2738	2738
query76	2378	1221	1272	1221
query77	586	318	314	314
query78	9528	8951	8943	8943
query79	2417	548	541	541
query80	982	523	515	515
query81	563	225	226	225
query82	1008	132	128	128
query83	254	169	182	169
query84	261	80	84	80
query85	1268	339	302	302
query86	460	295	307	295
query87	3268	3108	3092	3092
query88	3357	2530	2518	2518
query89	394	295	288	288
query90	1698	195	199	195
query91	128	100	104	100
query92	60	50	52	50
query93	1867	614	620	614
query94	832	305	297	297
query95	377	270	275	270
query96	624	292	291	291
query97	3262	3091	3078	3078
query98	297	204	202	202
query99	1625	1301	1294	1294
Total cold run time: 263375 ms
Total hot run time: 170203 ms

@doris-robot

ClickBench: Total hot run time: 29.95 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c02b7c618b52d091022577a83d1148ed14e2641a, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.03	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.06
query5	0.49	0.49	0.48
query6	1.19	0.72	0.73
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.58	0.51	0.50
query10	0.57	0.57	0.56
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.62	0.62	0.60
query14	0.78	0.81	0.78
query15	0.91	0.86	0.87
query16	0.36	0.36	0.35
query17	0.99	1.00	1.02
query18	0.22	0.22	0.21
query19	1.87	1.78	1.71
query20	0.02	0.01	0.01
query21	15.40	0.77	0.67
query22	3.83	8.60	1.11
query23	17.91	1.36	1.34
query24	2.29	0.22	0.22
query25	0.18	0.08	0.08
query26	0.32	0.22	0.22
query27	0.46	0.24	0.24
query28	13.17	1.01	0.98
query29	12.51	3.32	3.33
query30	0.26	0.06	0.05
query31	2.87	0.40	0.41
query32	3.25	0.49	0.48
query33	2.95	2.96	2.95
query34	15.42	4.31	4.26
query35	4.31	4.31	4.30
query36	0.67	0.47	0.48
query37	0.19	0.16	0.16
query38	0.18	0.15	0.15
query39	0.04	0.03	0.04
query40	0.16	0.14	0.14
query41	0.10	0.04	0.05
query42	0.05	0.06	0.05
query43	0.05	0.04	0.04
Total cold run time: 107.55 s
Total hot run time: 29.95 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 6, 2024
Contributor

github-actions bot commented Aug 6, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Aug 6, 2024

PR approved by anyone and no changes requested.

@starocean999 starocean999 merged commit e1c2da1 into apache:master Aug 6, 2024
29 of 30 checks passed
seawinde added a commit to seawinde/doris that referenced this pull request Aug 6, 2024
…sync materialized view (apache#38527)

Support to partition prune when query rewrite by sync materialized view
such as table def is as following:
```sql
        CREATE TABLE IF NOT EXISTS test_unique (
        `time` DATETIME NULL COMMENT '查询时间',
        `app_name` VARCHAR(64) NULL COMMENT '标识',
        `event_id` VARCHAR(128) NULL COMMENT '标识',
        `decision` VARCHAR(32) NULL COMMENT '枚举值',
        `id` VARCHAR(35) NOT NULL COMMENT 'od',
        `code` VARCHAR(64) NULL COMMENT '标识',
        `event_type` VARCHAR(32) NULL COMMENT '事件类型'
        )
        UNIQUE KEY(time)
        PARTITION BY RANGE(time)
        (
         FROM ("2024-07-01 00:00:00") TO ("2024-07-15 00:00:00") INTERVAL 1 HOUR
        )
        DISTRIBUTED BY HASH(time)
        BUCKETS 3 PROPERTIES ("replication_num" = "1");
```
sync materialized view def is

```sql
create materialized view as
    select
    app_name,
    event_id,
    time,
    count(*)
    from
    test_duplicate
    group by
    app_name,
    event_id,
    time;
```

if your query is following, if rewritten by sync materialized view
successfully, should partition prune
```sql
    select
    app_name,
    event_id,
    time,
    count(*)
    from
    test_duplicate
    where time < '2024-07-05 01:00:00'
    group by
    app_name,
    time,
    event_id;

```
gavinchou pushed a commit that referenced this pull request Aug 7, 2024
…sync materialized view (#38527)

Labels
approved Indicates a PR has been approved by one committer. dev/3.0.1-merged doing not-merge/2.1 reviewed

6 participants