feat: add to_parquet_dataset function #2898
@jpivarski Having a strange problem here: the fsspec implementation worked with S3, but now two tests are failing on this line: `schema = pyarrow_parquet.ParquetFile(filepath, filesystem=fs).schema_arrow`. It only happens in two of the four tests (with no obvious similarity between them), and `filesystem` is definitely a keyword argument, as shown in the [docs](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#:~:text=most%20Parquet%20files.-,filesystem,-FileSystem%2C%20default%20None).
@zbilodea, the "minimal" CI run has pyarrow 7.0.0 (two years old), so I'm not too surprised if the API has changed since then.
Here is the v7 doc page: https://arrow.apache.org/docs/7.0/python/generated/pyarrow.parquet.ParquetFile.html, and indeed I don't see `filesystem=`. The previous method would be to open the file and pass that instead.
What's the status of this PR, @zbilodea?
I made all the changes, tested it with S3, and it worked. All set as far as I can tell!
I think this is done, but we lost track of it before merging. If it's done, @zbilodea, please merge it!