-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support sending schemas for empty streams #3594
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor nits
/// is not specified, an encoded Schema message will be sent when | ||
/// the first [`RecordBatch`], if any, is encoded. Some clients | ||
/// expect a Schema message even if there is no data sent. | ||
pub fn with_schema(mut self, schema: SchemaRef) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it will be more consistent to always require specifying schema instead of two mixed approaches (specifying it by this, getting from first RecordBatch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that the Flight protocol did not require sending a schema , however the format docs don't seem to specify one way or the other; https://arrow.apache.org/docs/format/Flight.html
Requiring a schema might make the API easier to use correctly with other Flight clients (I am pretty sure the issue we hit upstream in IOx was interoperability with the golang client).
Downstream in DataFusion we almost always have a https://docs.rs/datafusion/16.1.0/datafusion/physical_plan/type.SendableRecordBatchStream.html which has a schema attached.
So in summary, I could go either way. Does anyone else have any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I hit an issue downstream https://github.com/influxdata/influxdb_iox/pull/6711 that makes me think @viirya 's idea to always require a schema is the best (otherwise it is easy to use this interface incorrectly). I'll make another PR shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what requiring a schema to encode the stream would look like: #3609
Co-authored-by: Raphael Taylor-Davies <[email protected]>
Benchmark runs are scheduled for baseline = 1afefbb and contender = 902a17d. 902a17d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3591
Rationale for this change
Some Flight clients expect to see Schema messages even if there is no data (e.g. an empty result set)
What changes are included in this PR?
with_schema
toFlightDataEncoderBuilder
to specify schema when creating streamsFlightDataDecoder
Are there any user-facing changes?
FlightDataDecoder
now hasschema()
that returns the actual schema rather thangot_schema()
which returns a booleanFlightDataEncoderBuilder::with_schema