-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add HashStringAllocator::InputStream #12364
Conversation
This pull request was exported from Phabricator. Differential Revision: D69750088 |
✅ Deploy Preview for meta-velox canceled.
|
2be69b0
to
d8f1814
Compare
Summary: When we get `ByteInputStream` from `HashStringAllocator`, we used to have to materialize all the byte ranges in a vector, which is not efficient. This change improves the efficiency by creating a `ByteInputStream` directly over the linked list of a multi-part allocation. Differential Revision: D69750088
This pull request was exported from Phabricator. Differential Revision: D69750088 |
return range_.position == range_.size && !header_->isContinued(); | ||
} | ||
|
||
std::streampos tellp() const final { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we end up calling this often? It would probably be cheaper to just maintain a counter instead of counting it on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't even see this got called in real query. It's doing the same thing as BufferedOutputStream::tellp
so at least there will be no regression.
} | ||
|
||
void seekp(std::streampos pos) final { | ||
header_ = begin_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updates to header_ and range_ are tightly bound, everywhere you call resetRange(), the line above sets header_ to some new value.
it looks like it'd be easier/safer to make resetRange into resetHeader (or something like that)
void resetHeader(const Header* header) {
VELOX_DCHECK_GT(header->usableSize(), 0);
header_ = header;
range_.buffer = reinterpret_cast<uint8_t*>(header_->begin());
range_.size = header_->usableSize();
range_.position = 0;
}
HashStringAllocator::headerOf(value.data())); | ||
stream->readBytes(dest, copySize); | ||
stream.ByteInputStream::readBytes(dest, copySize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to add this prefix because the InputStream class was forward declared inside the HashStringAllocator, right?
Could you inline the definition in side of HashStringAllocator, or define it outside the class in the same header? HashStringAllocatorInputStream would be just as clear as HashStringAllocator::InputStream, slightly shorter, and then we don't need all these ByteInputStream:: prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we need to prefix this because we are calling readBytes<T>
(a templated helper) instead of the virtual readBytes
, they are 2 different methods, and because they have same name, the virtual one is shadowing the templated one if called from subclass. Moving the subclass to different location cannot fix this.
Summary: When we get `ByteInputStream` from `HashStringAllocator`, we used to have to materialize all the byte ranges in a vector, which is not efficient. This change improves the efficiency by creating a `ByteInputStream` directly over the linked list of a multi-part allocation. Reviewed By: kevinwilfong Differential Revision: D69750088
d8f1814
to
6b23ab6
Compare
This pull request was exported from Phabricator. Differential Revision: D69750088 |
This pull request has been merged in c560aaf. |
Summary:
When we get
ByteInputStream
fromHashStringAllocator
, we used tohave to materialize all the byte ranges in a vector, which is not efficient.
This change improve the efficiency by creating a
ByteInputStream
directly overthe linked list of a multi-part allocation.
Differential Revision: D69750088