docs: add example for custom file format with COPY TO
#11174
@@ -29,5 +29,6 @@ do
 # Skip tests that rely on external storage and flight
 if [ ! -d $filename ]; then
 cargo run --example $example_name
+cargo clean -p datafusion-examples
I ran into consistent build issues where the runner would run out of disk space. This is the best solution I could come up with. It seems to add about 5-10 seconds to the example action, which takes about 12 minutes overall.
Certainly open to alternatives.
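One possible alternative, in the spirit of the earlier "clean examples if over 10GB" commit, is to clean only when the build directory grows past a size limit. The following is a minimal sketch under stated assumptions, not what this PR ships: the `over_limit` helper, the `target` path, and the 1 GB limit are all hypothetical.

```shell
# Hypothetical helper: exits 0 when directory $1 uses more than $2 kilobytes.
over_limit() {
  dir="$1"
  limit_kb="$2"
  # du -sk prints "<size-in-KB><TAB><path>"; keep only the size field.
  used_kb=$(du -sk "$dir" | cut -f1)
  [ "$used_kb" -gt "$limit_kb" ]
}

# Assumed usage inside the example loop (path and limit are illustrative):
# if over_limit target $((1024 * 1024)); then
#   cargo clean -p datafusion-examples
# fi
```

This trades the fixed per-example cost of an unconditional clean for one `du` call per iteration, which may or may not be cheaper on a large target directory.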
I am hitting similar issues while trying to add new examples in #11089.
This seems like a good idea to me.
My best hope is to move some of the example binaries into inline examples in the docs instead: #11178
Hopefully that will free up some additional space and also make the examples easier to navigate.
Er, sorry this came up. There are a few other tricks here (actions/runner-images#2840 (comment)) for clearing up disk space that may also help.
No worries! I think this is a good excuse / reason to take another pass through the examples directory (and the library guide that you started however long ago).
Thank you @tshauck for putting this example together! I think that wrapping an existing FileFormat was a clever way to demonstrate this without requiring tons of boilerplate code for a working example. It ran locally for me with no issues.
I left just one small suggestion.
Thank you so much @tshauck -- I also think this looks really nice.
Thank you for the review @devinjdangelo
I agree the idea of wrapping an existing format to show the API is clever. I was thinking it would be awesome to somehow cook up a simple-to-implement custom format that would handle things entirely end to end, but I can't think of any format that is suitably simple.
Yeah, I started down the road of something custom, but it turned out to be a non-trivial amount of code. Perhaps adding a link to something like https://github.com/datafusion-contrib/datafusion-orc would show a more complex example, though it doesn't implement |
Thanks again @tshauck and @devinjdangelo
* feat: add example for copy to
* better docs plus tempdir
* build: clean examples if over 10GB
* only 1GB
* build: try clearing some disk space before running
* build: remove sudo
* build: try clean
* build: run clean
* build: only clean examples
* docs: better output for example
Which issue does this PR close?
Closes #11079
Rationale for this change
Adds an example of how to COPY table TO a custom file format.
What changes are included in this PR?
Created a mock file format factory and a mock file format that simply wrap the CSV ones, but for TSV.
Are these changes tested?
Yes, I've run the example.
Are there any user-facing changes?
No.