-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge a pre-computed column / array. #824
Conversation
dac6795
to
ce1a476
Compare
ce1a476
to
949dbab
Compare
rust/src/datatypes.rs
Outdated
pub(crate) fn lance_supports_nulls(datatype: &DataType) -> bool { | ||
match datatype { | ||
DataType::Utf8 | ||
| DataType::LargeUtf8 | ||
| DataType::Binary | ||
| DataType::FixedSizeBinary(_) | ||
| DataType::FixedSizeList(_, _) => true, | ||
_ => false, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there types I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct
and list
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes list
. struct
I am not sure. How do we encode structs? They don't have offsets so I don't think we support nulls there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm . good point, we think we don't really encode struct itself, but for each field, it encodes separately.
Let's get it going for now.
aebd0e0
to
6b2e770
Compare
arrow-schema = "37.0" | ||
arrow-select = "37.0" | ||
async-recursion = "1.0" | ||
async-trait = "0.1.60" | ||
byteorder = "1.4.3" | ||
chrono = "0.4.23" | ||
clap = { version = "4.1.1", features = ["derive"], optional = true } | ||
# This is already used by datafusion | ||
dashmap = "5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what is benefit of this lib over std::HashMap
in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dashmap allows multiple threads to insert concurrently, whereas std::HashMap doesn't allow concurrent inserts. This is just useful for building up the hashmap very fast, given we have the RHS data in memory already for most cases. And as mentioned in the comment, this is already used by DataFusion, so there's no incremental compilation cost from adding it to our dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i am fine with that.
Minor question: we need to keep dashmap
version sync with datafusion, right? Will cargo pull two different packages if we used the different versions? Not blocker for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do. They use 5 right now. We should check all of our dependencies for duplicates periodically, including tokio.
.duration_since(SystemTime::UNIX_EPOCH) | ||
.unwrap() | ||
.as_nanos(); | ||
self.timestamp_nanos = nanos as u128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this persisted in protobuf?
do we need nano
resolution ?
Also might want to have a timezone, so that this is reliable for either "system clock" or cross-datacenter writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this persisted in protobuf?
Yes. This is just the existing behavior moved over to this file so it's easier to work with.
But I'll note this in our doc on OCC, since we might want to update when considering concurrent writers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved per Lei's review
Closes #587