-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DUCKDB] Native duckdb lance reader #347
Conversation
Filed #348 to track the optimization for zero-copy arrow=>ducdkb conversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to add some unit tests?
perf optimization can happen separately
// TODO: dynamic_pointer_cast does not work here, IDK why. | ||
auto array = std::static_pointer_cast<typename ::arrow::TypeTraits<T>::ArrayType>(arr); | ||
assert(array != nullptr); | ||
// TODO: How to use zero copy to move data from arrow to duckdb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duckdb-pyarrow integration is zero-copy right? i forgot how but vaguely remember seeing code that uses the Arrow C structs to dress up arrow memory buffer as duckdb vector. not a blocker for this tho i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked a few places, i.e.
pythonpkg
https://github.com/duckdb/duckdb/blob/master/tools/pythonpkg/src/include/duckdb_python/array_wrapper.hpp- https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow/arrow_appender.hpp
Seems no code that convert from arrow => duckdb, it only has the reversed conversion (i..e, duckdb => arrow).
I can look into it more later. Filed #348 to track the work.
out->SetVectorType(::duckdb::VectorType::FLAT_VECTOR); | ||
} | ||
|
||
/// Convert a String array into duckdb vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binary array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. fixed.
auto indices_arr = std::static_pointer_cast<::arrow::Int8Array>(array->indices()); | ||
for (int i = 0; i < indices_arr->length(); ++i) { | ||
auto idx = indices_arr->Value(i); | ||
out->SetValue(i, std::string(dict_arr->Value(idx))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder whether this fixes the duckdb-arrow issue where list of dictionary querying doesn't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea probally, we can actually make (a hardcoded set of ) user defined types work via our plugins tho.
Support
SELECT * from lance_scan('s3://URI')
directly viaduckdb
extension.