-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] VariableWidthViewVectorBenchmarks #41
Comments
Thanks for digging into this @ViggoC! I think we can take out the setZero since unreferenced data is unspecified |
@lidavidm So we can just remove it without side effort? |
What do you mean side effort? |
Is it side effect? |
Oh, good point. Technically there would be a side effect in that data could be left over in the buffer, and then make it into IPC, but that is the tradeoff. (I believe it's already possible if say you create a string vector, add data, and later set the row count to something lower than before - the extra data won't be cleared.) |
Yeah that make sense. And @ViggoC thanks for looking into this, I am looking into the PR as well. This is much needed to improve the initial version. |
@vibhatha And I also see that VariableWidthViewVector allocated much more memory than VariableWidthVector in some cases. It runs out of memory(8G) before I changed the |
Hmm, I think we still allocate power-of-two buffers for each of the data buffers, probably we should instead round to some multiple of say 64K or 1 MB or something instead. The implementation is still new so we are working out how best to treat it. |
@ViggoC technically the answer is yes, I can do it, but in a practical sense my phase would be significantly slower than earlier because the time I could allocate would be significantly smaller. If this needs fixing sooner, the best I can do is help to figure out things to do and definitely review the PR if you could work on this feature. |
IMHO, The only side effect of removing
|
ah...are the requirements for variable width vector different? if they must be 0-padded then we can't do anything here. |
No, that diagram only applies to short strings. It's not specified what happens to long strings. |
It would be better to write some test cases and evaluate this. |
"Any remaining bytes after the string itself are padded with 0." |
yeah, when i mentioned padding, i mean short string, there is no padding data for long string. for long string, every bytes in viewbuffer will be overwritten. so at least we can skip viewbuffer.setZero for long string. |
commit 30e595411e37b7ca2739c7005442ed7b329eeaea. the benchmark only involves long string, so it shows a big improvement.
|
The performance of VariableWidthViewVector is much slower than VariableWidthVector in some cases.
setZero
for overwriting is the bottleneck.Environment: 2.6 GHz Intel Core i7, MacOS 13.6, openjdk version "13.0.2"
See code of benchmark in apache/arrow#44634.
Component(s)
Java
The text was updated successfully, but these errors were encountered: