
[Performance](opt) opt the order by performance in permutation #38985

Merged
merged 1 commit into from
Aug 8, 2024


HappenLee
Contributor

Proposed changes

Before:

```
select l_quantity from lineitem order by l_quantity limit 10000020;
+--------------+
| ReturnedRows |
+--------------+
| 10000020     |
+--------------+
1 row in set (2 min 24.42 sec)
```

After:

```
mysql [tpch]>select l_quantity from lineitem order by l_quantity limit 10000020;
+--------------+
| ReturnedRows |
+--------------+
| 10000020     |
+--------------+
1 row in set (28.42 sec)
```
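Converting the "before" time to seconds gives the speedup for this single measurement:

$$
\frac{2\,\text{min}\;24.42\,\text{s}}{28.42\,\text{s}} = \frac{144.42\,\text{s}}{28.42\,\text{s}} \approx 5.08
$$

That is, the same `ORDER BY ... LIMIT 10000020` query ran about 5.1 times faster, roughly an 80% reduction in wall-clock time.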

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

@HappenLee
Contributor Author

run buildall

@github-actions bot left a comment

clang-tidy made some suggestions

```cpp
        std::partial_sort(res.begin(), sort_end, res.end(),
                          [this](size_t a, size_t b) { return data[a] < data[b]; });
    } else {
        if (reverse)
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- if (reverse)
+ if (reverse) {
```

be/src/vec/columns/column_decimal.h:283:

```diff
-             else
+             } else
```

```cpp
        if (reverse)
            pdqsort(res.begin(), res.end(),
                    [this](size_t a, size_t b) { return data[a] > data[b]; });
        else
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- else
+ else {
```

be/src/vec/columns/column_decimal.h:285:

```diff
-                         [this](size_t a, size_t b) { return data[a] < data[b]; });
+                         [this](size_t a, size_t b) { return data[a] < data[b]; });
+ }
```
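The snippets under review sort a permutation vector `res` of row indices by comparing `data[a]` with `data[b]`, using `>` when `reverse` is set; clang-tidy only asks for braces around each branch. Below is a minimal, self-contained sketch of that pattern with the suggested braces applied. It is not the Doris code: `sorted_permutation` is a hypothetical name, the column is modeled as a `std::vector<T>`, and `std::sort` stands in for the third-party `pdqsort`, which takes the same (first, last, comparator) arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a column's permutation sort: returns the row
// indices of `data` in sorted order without moving the values themselves.
// std::sort replaces pdqsort here; both take (first, last, comparator).
template <typename T>
std::vector<std::size_t> sorted_permutation(const std::vector<T>& data, bool reverse) {
    std::vector<std::size_t> res(data.size());
    std::iota(res.begin(), res.end(), std::size_t{0});  // identity permutation 0..n-1
    if (reverse) {  // braces on both branches, as clang-tidy suggests
        std::sort(res.begin(), res.end(),
                  [&data](std::size_t a, std::size_t b) { return data[a] > data[b]; });
    } else {
        std::sort(res.begin(), res.end(),
                  [&data](std::size_t a, std::size_t b) { return data[a] < data[b]; });
    }
    return res;
}
```

For `data = {30, 10, 20}` the ascending permutation is `{1, 2, 0}` and the reverse one is `{0, 2, 1}`; the column values themselves never move.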

```cpp
        limit = 0;
    }
    // std::partial_sort need limit << s can get performance benefit
    if (limit > (s / 8.0)) limit = 0;
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- if (limit > (s / 8.0)) limit = 0;
+ if (limit > (s / 8.0)) { limit = 0;
+ }
```

@@ -236,7 +236,8 @@ void ColumnVector<T>::get_permutation(bool reverse, size_t limit, int nan_direct

```cpp
    if (s == 0) return;

    if (limit >= s) limit = 0;
    // std::partial_sort need limit << s can get performance benefit
    if (limit > (s / 8.0)) limit = 0;
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- if (limit > (s / 8.0)) limit = 0;
+ if (limit > (s / 8.0)) { limit = 0;
+ }
```
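The hunk header (`-236,7 +236,8`) shows one added line, and the context suggests it is the `if (limit > (s / 8.0)) limit = 0;` check: when the requested limit exceeds one eighth of the column size, the limit is dropped and the code falls back to a full sort. A plausible reason, stated as an interpretation rather than the author's analysis: `std::partial_sort` is specified to do about N·log(M) comparisons and is commonly heap-based, so its advantage shrinks as M approaches N, while a full pdqsort has better constants. The sketch below models only that decision. `use_partial_sort` and `permutation_with_limit` are hypothetical names, the 1/8 threshold comes from the hunk, and `std::sort` stands in for `pdqsort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mirrors the two checks in the hunk; a limit of 0 means "sort everything".
// Returns true when a partial sort of the first `limit` rows would be used.
bool use_partial_sort(std::size_t s, std::size_t limit) {
    if (limit >= s) {          // limit covers the whole column
        limit = 0;
    }
    if (limit > (s / 8.0)) {   // limit too large for partial_sort to pay off
        limit = 0;
    }
    return limit != 0;
}

// Hypothetical ascending-only permutation with an optional limit.
template <typename T>
std::vector<std::size_t> permutation_with_limit(const std::vector<T>& data, std::size_t limit) {
    const std::size_t s = data.size();
    std::vector<std::size_t> res(s);
    std::iota(res.begin(), res.end(), std::size_t{0});
    auto less = [&data](std::size_t a, std::size_t b) { return data[a] < data[b]; };
    if (use_partial_sort(s, limit)) {
        // Only res[0, limit) is guaranteed sorted; the tail is in unspecified order.
        auto sort_end = res.begin() + static_cast<std::ptrdiff_t>(limit);
        std::partial_sort(res.begin(), sort_end, res.end(), less);
    } else {
        std::sort(res.begin(), res.end(), less);  // stand-in for pdqsort
    }
    return res;
}
```

With a column of 1000 rows, a limit of 125 still takes the partial-sort path, while 126 or more falls back to the full sort.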

@HappenLee
Contributor Author

run buildall

@github-actions bot left a comment

clang-tidy made some suggestions

```cpp
        std::partial_sort(res.begin(), sort_end, res.end(),
                          [this](size_t a, size_t b) { return data[a] < data[b]; });
    } else {
        if (reverse)
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- if (reverse)
+ if (reverse) {
```

be/src/vec/columns/column_decimal.h:284:

```diff
-             else
+             } else
```

```cpp
        if (reverse)
            pdqsort(res.begin(), res.end(),
                    [this](size_t a, size_t b) { return data[a] > data[b]; });
        else
```

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
```diff
- else
+ else {
```

be/src/vec/columns/column_decimal.h:286:

```diff
-                         [this](size_t a, size_t b) { return data[a] < data[b]; });
+                         [this](size_t a, size_t b) { return data[a] < data[b]; });
+ }
```

@HappenLee
Contributor Author

run buildall

@github-actions bot left a comment

clang-tidy made some suggestions

@@ -21,6 +21,7 @@
```cpp
#pragma once

#include <glog/logging.h>
```

warning: 'glog/logging.h' file not found [clang-diagnostic-error]

```
#include <glog/logging.h>
         ^
```

@HappenLee
Contributor Author

run performance

@doris-robot

TPC-H: Total hot run time: 42442 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2cce2f167edf3db41b5925bcfa147ebf9891a387, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	19047	4247	4198	4198
q2	2516	208	207	207
q3	10728	1417	1413	1413
q4	11176	866	989	866
q5	8181	3087	3029	3029
q6	221	140	140	140
q7	1086	631	629	629
q8	9454	1969	2015	1969
q9	8478	6603	6661	6603
q10	8777	3868	3864	3864
q11	427	259	257	257
q12	435	237	239	237
q13	17768	2982	2957	2957
q14	276	248	249	248
q15	548	486	497	486
q16	499	413	418	413
q17	995	916	899	899
q18	8196	7305	7381	7305
q19	1381	1223	1227	1223
q20	577	326	337	326
q21	5414	4887	4952	4887
q22	356	294	286	286
Total cold run time: 116536 ms
Total hot run time: 42442 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4119	4026	4018	4018
q2	334	237	218	218
q3	3038	3031	3003	3003
q4	1938	2033	1922	1922
q5	5291	5310	5277	5277
q6	221	134	135	134
q7	2098	1681	1696	1681
q8	3207	3312	3297	3297
q9	8359	8383	8365	8365
q10	3782	3881	3873	3873
q11	557	455	470	455
q12	757	607	547	547
q13	12822	3007	2978	2978
q14	290	274	255	255
q15	514	475	489	475
q16	448	408	404	404
q17	1762	1724	1698	1698
q18	7780	7292	7258	7258
q19	1739	1676	1679	1676
q20	1969	1769	1774	1769
q21	5640	5325	5421	5325
q22	515	469	485	469
Total cold run time: 67180 ms
Total hot run time: 55097 ms
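Each TPC-H row lists a query, its cold run, two hot runs, and the best hot run, all in milliseconds. The totals can be reproduced by summing the cold column and, for the hot total, the faster of the two hot runs per query; doing so matches both rounds above (116536 / 42442 ms and 67180 / 55097 ms). The sketch below shows that computation; `QueryTiming`, `RunTotals`, and `summarize` are hypothetical names, not part of the Doris tpch-tools scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical representation of one report row (all values in ms).
struct QueryTiming {
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

struct RunTotals {
    long cold_ms;
    long hot_ms;
};

// Cold total: sum of cold runs. Hot total: sum of the faster hot run per
// query, which is the "best hot" column of the report.
RunTotals summarize(const std::vector<QueryTiming>& rows) {
    RunTotals t{0, 0};
    for (const auto& r : rows) {
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}
```

Using the minimum rather than the last hot run matters for rows such as q4 in Round 1, where the second hot run (989 ms) is slower than the first (866 ms).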

@doris-robot

TPC-DS: Total hot run time: 169462 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2cce2f167edf3db41b5925bcfa147ebf9891a387, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	925	384	374	374
query2	6501	1669	1726	1669
query3	6669	213	225	213
query4	20064	17578	17427	17427
query5	4295	521	546	521
query6	293	163	169	163
query7	4617	293	292	292
query8	251	199	195	195
query9	8503	2354	2336	2336
query10	457	270	264	264
query11	10549	10035	10064	10035
query12	140	87	85	85
query13	1620	379	368	368
query14	9172	7634	6880	6880
query15	198	163	159	159
query16	7102	476	446	446
query17	952	586	572	572
query18	1914	284	284	284
query19	211	142	145	142
query20	94	86	85	85
query21	209	103	97	97
query22	4288	4216	4073	4073
query23	33541	33161	32983	32983
query24	10389	3093	3092	3092
query25	717	377	386	377
query26	1779	163	148	148
query27	2872	280	279	279
query28	6921	1962	1949	1949
query29	1325	419	420	419
query30	290	150	155	150
query31	937	749	768	749
query32	102	54	54	54
query33	731	335	329	329
query34	920	503	478	478
query35	847	737	751	737
query36	981	854	889	854
query37	208	81	77	77
query38	2849	2723	2745	2723
query39	866	810	794	794
query40	276	113	111	111
query41	47	44	45	44
query42	128	106	116	106
query43	482	427	427	427
query44	1162	720	721	720
query45	210	180	179	179
query46	1100	809	774	774
query47	1801	1694	1734	1694
query48	355	293	292	292
query49	1212	445	445	445
query50	895	450	437	437
query51	6773	6711	6731	6711
query52	110	89	94	89
query53	261	186	187	186
query54	619	491	453	453
query55	78	76	77	76
query56	288	263	263	263
query57	1177	1051	1050	1050
query58	278	284	293	284
query59	2435	2288	2389	2288
query60	287	268	276	268
query61	98	95	99	95
query62	919	668	665	665
query63	226	184	186	184
query64	5890	2017	1970	1970
query65	3160	3101	3098	3098
query66	1448	345	349	345
query67	15339	14786	14790	14786
query68	4399	596	598	596
query69	453	334	302	302
query70	1089	1069	1066	1066
query71	435	291	283	283
query72	7256	2842	2635	2635
query73	784	336	334	334
query74	6090	5639	5689	5639
query75	3365	2747	2738	2738
query76	2576	1214	1277	1214
query77	534	338	340	338
query78	9656	8874	8892	8874
query79	2418	553	544	544
query80	1085	529	560	529
query81	586	232	231	231
query82	722	138	132	132
query83	247	192	180	180
query84	275	90	88	88
query85	1944	368	362	362
query86	501	292	308	292
query87	3260	3102	3119	3102
query88	3844	2599	2437	2437
query89	395	292	287	287
query90	1858	197	200	197
query91	128	101	100	100
query92	58	48	51	48
query93	2464	620	611	611
query94	902	300	305	300
query95	385	274	277	274
query96	605	277	279	277
query97	3220	3068	3057	3057
query98	220	209	197	197
query99	1636	1311	1286	1286
Total cold run time: 265743 ms
Total hot run time: 169462 ms

@doris-robot

ClickBench: Total hot run time: 29.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2cce2f167edf3db41b5925bcfa147ebf9891a387, data reload: false

query	run 1, cold (s)	run 2 (s)	run 3 (s)
query1	0.05	0.04	0.04
query2	0.07	0.03	0.04
query3	0.22	0.04	0.04
query4	1.69	0.06	0.06
query5	0.49	0.47	0.48
query6	1.15	0.73	0.72
query7	0.02	0.01	0.01
query8	0.05	0.05	0.05
query9	0.57	0.53	0.50
query10	0.55	0.57	0.55
query11	0.16	0.11	0.11
query12	0.15	0.12	0.13
query13	0.60	0.62	0.60
query14	0.78	0.80	0.80
query15	0.94	0.87	0.86
query16	0.36	0.35	0.36
query17	1.01	1.00	1.04
query18	0.23	0.21	0.21
query19	1.83	1.75	1.69
query20	0.02	0.01	0.03
query21	15.40	0.76	0.67
query22	4.65	7.75	1.09
query23	17.94	1.40	1.40
query24	2.28	0.22	0.22
query25	0.19	0.08	0.08
query26	0.31	0.21	0.22
query27	0.47	0.23	0.23
query28	13.16	1.00	0.98
query29	12.64	3.35	3.30
query30	0.26	0.05	0.05
query31	2.88	0.40	0.40
query32	3.22	0.48	0.49
query33	2.94	2.94	2.94
query34	15.45	4.31	4.29
query35	4.29	4.29	4.32
query36	0.67	0.48	0.48
query37	0.18	0.15	0.16
query38	0.17	0.15	0.14
query39	0.04	0.03	0.03
query40	0.16	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.03	0.05
Total cold run time: 108.44 s
Total hot run time: 29.91 s
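The ClickBench table has no best-run column: each row shows three runs in seconds, the first being cold. The reported totals correspond to the sum of the first run (108.44 s) and the sum of the faster of runs 2 and 3 (29.91 s); the minimum matters for rows such as query22, where run 2 (7.75 s) is even slower than the cold run. The hypothetical sketch below keeps timings in hundredths of a second so that summing two-decimal values involves no floating-point rounding; the type and function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical representation of one ClickBench row, in hundredths of a second.
struct ClickBenchRow {
    long run1_cs;  // first (cold) run
    long run2_cs;  // second run
    long run3_cs;  // third run
};

// Returns {cold total, hot total} in hundredths of a second, where the hot
// total sums min(run2, run3) for each query.
std::pair<long, long> clickbench_totals(const std::vector<ClickBenchRow>& rows) {
    long cold = 0;
    long hot = 0;
    for (const auto& r : rows) {
        cold += r.run1_cs;
        hot += std::min(r.run2_cs, r.run3_cs);
    }
    return {cold, hot};
}
```

Applied to all 43 rows above, this yields 10844 and 2991 hundredths, i.e. 108.44 s and 29.91 s.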

HappenLee added a commit that referenced this pull request Aug 7, 2024
## Proposed changes

cherry pick #38985

Contributor

github-actions bot commented Aug 7, 2024

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Aug 7, 2024
Contributor

github-actions bot commented Aug 7, 2024

PR approved by anyone and no changes requested.

@HappenLee HappenLee merged commit df55639 into apache:master Aug 8, 2024
29 of 30 checks passed
HappenLee added a commit that referenced this pull request Aug 10, 2024
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024
…e#38985)

@wm1581066 added the usercase label (Important user case type label) on Aug 21, 2024
yiguolei pushed a commit that referenced this pull request Aug 24, 2024
## Proposed changes

Issue Number: cherry pick #38985

GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Aug 27, 2024
Labels: approved, dev/2.0.15-merged, dev/2.1.6-merged, dev/3.0.1-merged, doing, reviewed, usercase
8 participants