-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support RecordBatch
with zero columns but non zero row count, add field to RecordBatchOptions
(#1536)
#1552
Conversation
schema: &SchemaRef, | ||
columns: &[ArrayRef], | ||
/// if any validation check fails, otherwise returns the created [`RecordBatch`] | ||
fn try_new_impl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure why we need a separate method for this, but I avoided changing it to keep the diff down
@@ -402,15 +402,20 @@ impl RecordBatch { | |||
|
|||
/// Options that control the behaviour used when creating a [`RecordBatch`]. | |||
#[derive(Debug)] | |||
#[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new option imo shouldn't be a breaking change, this should give us that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it is time to make these fields non pub
and add with_match_field_name(&mut self, val: bool)
type accessors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel particularly strongly either way, although have to write and maintain setters and accessors is a bit of a PIA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although have to write and maintain setters and accessors is a bit of a PIA
The price for backwards compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non_exhaustive should make additive changes non-breaking, which should be good enough hopefully 😀
Happy to change if you feel strongly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good 👍
arrow/src/record_batch.rs
Outdated
"must either specify a row count or at least one column".to_string(), | ||
) | ||
})?; | ||
|
||
if columns.iter().any(|c| c.len() != row_count) { | ||
return Err(ArrowError::InvalidArgumentError( | ||
"all columns in a record batch must have the same length".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error will now also happen when the specified row count doesn't match the array counts (so the wording might be good to update?)
@@ -243,7 +248,7 @@ impl RecordBatch { | |||
/// # } | |||
/// ``` | |||
pub fn num_rows(&self) -> usize { | |||
self.columns[0].data().len() | |||
self.row_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could avoid adding row_count
with something like
self
.columns
.get(0)
.map(|col| col.data.len())
.unwrap_or(0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would always return 0 if there are no columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see -- I think I was a little confused -- is the idea that the schema also has 0 fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it's basically a way to support projecting no columns and just getting the row count. This is needed for #1537
@@ -402,15 +402,20 @@ impl RecordBatch { | |||
|
|||
/// Options that control the behaviour used when creating a [`RecordBatch`]. | |||
#[derive(Debug)] | |||
#[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it is time to make these fields non pub
and add with_match_field_name(&mut self, val: bool)
type accessors?
@@ -402,15 +402,20 @@ impl RecordBatch { | |||
|
|||
/// Options that control the behaviour used when creating a [`RecordBatch`]. | |||
#[derive(Debug)] | |||
#[non_exhaustive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although have to write and maintain setters and accessors is a bit of a PIA
The price for backwards compatibility?
RecordBatch
with No Columns but Non-Zero Row Count (#1536)
RecordBatch
with No Columns but Non-Zero Row Count (#1536)RecordBatch
with No Columns but Non-Zero Row Count, add field to RecordBatchOptions
(#1536)
RecordBatch
with No Columns but Non-Zero Row Count, add field to RecordBatchOptions
(#1536)RecordBatch
with zero columns but non zero row count, add field to RecordBatchOptions
(#1536)
Which issue does this PR close?
Closes #1536
Rationale for this change
See ticket
What changes are included in this PR?
Makes it possible to construct a RecordBatch with no columns, but a non-zero row count. I had a brief search in the codebase and couldn't find anything obvious that this would break, but I guess there is only one way to find out 😄
Are there any user-facing changes?
This technically makes a breaking change to
RecordBatchOptions
as it wasn't marked as non-exhaustive.