Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite array_append/array_prepend to remove deplicate codes #8108

Merged
merged 3 commits into from
Nov 10, 2023

Conversation

Veeupup
Copy link
Contributor

@Veeupup Veeupup commented Nov 9, 2023

Which issue does this PR close?

One issue from #7988 array_append && array_prepend.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Nov 9, 2023
@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 9, 2023

@alamb @jayzhan211 PTAL ^ . ^

@Veeupup Veeupup changed the title rewrite array_append to remove deplicate codes rewrite array_append/array_prepend to remove deplicate codes Nov 9, 2023
@alamb
Copy link
Contributor

alamb commented Nov 9, 2023

Thanks @Veeupup -- I'll check it out shortly

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Veeupup ! Could you also move the related tests to sqllogictest?

@jayzhan211
Copy link
Contributor

My expected way for array_append/array_prepend is building array via MutableArrayData, so I need to think about whether row converter is the better choice or not.

@jayzhan211
Copy link
Contributor

I wrote another function with MutableArrayData and it is a lot faster. 30000 micro second vs 300 micro second. @Veeupup Can you also try with MutableArrayData and compare it yourself, if you also got the same result, we can go with MutableArrayData approach.

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 10, 2023

Thank you @Veeupup ! Could you also move the related tests to sqllogictest?

Sure! I'll make it later

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 10, 2023

I wrote another function with MutableArrayData and it is a lot faster. 30000 micro second vs 300 micro second. @Veeupup Can you also try with MutableArrayData and compare it yourself, if you also got the same result, we can go with MutableArrayData approach.

Appreciate your advice! I'll try this way and benchmark it then

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 10, 2023

Thank you @Veeupup ! Could you also move the related tests to sqllogictest?

hi @Weijun-H ! it seems that I do not need to do anything? (Or just remove ut fn test_array_prepend and fn test_array_append)

I have checked the sqllogictests with array_append/prepend, and it seems that sqllogictests already have these cases.

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 10, 2023

@jayzhan211 I have implemented the array_append in MutableArray and RowConverter and it shows that Mutable performs better too.

Here are the benchmarks for sqllogictests:

MutableArray RowConverter
29.297µs 93.433µs
20.555µs 32.201µs
19.251µs 86.561µs
102.495µs 143.772µs
58.49µs 35.456µs
56.035µs 33.316µs
45.607µs 79.701µs
23.508µs 33.592µs
22.87µs 37.584µs
39.27µs 56.071µs
23.973µs 35.368µs
24.97µs 37.753µs
44.644µs 59.515µs
26.602µs 35.498µs
22.272µs 37.957µs
97.846µs 169.643µs
107.068µs 123.551µs
55.058µs 178.546µs
45.853µs 109.962µs
82.722µs 90.556µs
43.62µs 44.157µs

@Veeupup Veeupup requested a review from Weijun-H November 10, 2023 16:41
@alamb
Copy link
Contributor

alamb commented Nov 10, 2023

Very nice!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Veeupup and @jayzhan211 -- this looks a lot nicer to me. I think there is even more performance to be gained here by avoiding concat (and using ArrayData to build up the final output) but this is a lot better than what we have currently.

Thanks a lot 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants