feat: add to_parquet_dataset function #2898

Merged
merged 51 commits into from
Mar 20, 2024


zbilodea
Collaborator

No description provided.

@zbilodea zbilodea marked this pull request as draft December 13, 2023 15:47

codecov bot commented Dec 13, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (b749e49) 81.90% compared to head (f023205) 81.93%.
Report is 3 commits behind head on main.

Additional details and impacted files
| Files | Coverage Δ |
| --- | --- |
| src/awkward/operations/__init__.py | 100.00% <100.00%> (ø) |
| src/awkward/operations/ak_to_parquet_dataset.py | 90.38% <90.38%> (ø) |

... and 2 files with indirect coverage changes

@zbilodea zbilodea marked this pull request as ready for review January 26, 2024 11:01
@zbilodea zbilodea requested a review from jpivarski January 26, 2024 11:01
@codecov-commenter

codecov-commenter commented Feb 7, 2024

Codecov Report

Attention: Patch coverage is 90.56604%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (b749e49) to head (f023205).
Report is 37 commits behind head on main.

❗ Current head f023205 differs from pull request most recent head ba0101d. Consider uploading reports for the commit ba0101d to get more accurate results

Additional details and impacted files
| Files | Coverage Δ |
| --- | --- |
| src/awkward/operations/__init__.py | 100.00% <100.00%> (ø) |
| src/awkward/operations/ak_to_parquet_dataset.py | 90.38% <90.38%> (ø) |

... and 2 files with indirect coverage changes

@zbilodea
Collaborator Author

zbilodea commented Feb 7, 2024

@jpivarski I'm having a strange problem here: the fsspec implementation worked with S3, but now two tests are failing because of this line:

schema = pyarrow_parquet.ParquetFile(filepath, filesystem=fs).schema_arrow

It throws: TypeError: __init__() got an unexpected keyword argument 'filesystem'

It's only happening in two out of four tests (with no obvious similarities between them), and 'filesystem' is definitely a keyword argument, as shown in the [docs](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#:~:text=most%20Parquet%20files.-,filesystem,-FileSystem%2C%20default%20None).
I'll keep trying to figure it out.

@martindurant
Contributor

@zbilodea, the "minimal" CI run has pyarrow 7.0.0 (2 years old), so I'm not too surprised if the API has changed in that time.

@martindurant
Contributor

Here is the v7 doc page: https://arrow.apache.org/docs/7.0/python/generated/pyarrow.parquet.ParquetFile.html, and indeed I don't see filesystem= there. The previous method would be to open the file and pass that:

with fsspec.open(...) as f:
    pyarrow_parquet.ParquetFile(f, ...)
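
A slightly fuller sketch of that workaround, assuming an fsspec-style filesystem object `fs` whose `open(path, "rb")` returns a file-like object. The helper name `read_arrow_schema` and its `parquet_file_cls` parameter are hypothetical, introduced only so the example can be exercised without pyarrow installed; this is not necessarily what the PR ended up doing.

```python
def read_arrow_schema(path, fs, parquet_file_cls=None):
    """Read the Arrow schema of a Parquet file through an fsspec-style filesystem.

    `parquet_file_cls` is a hypothetical hook for substituting a stand-in
    class; by default pyarrow.parquet.ParquetFile is used.
    """
    if parquet_file_cls is None:
        # Imported lazily so the helper can be tested with a stand-in class.
        import pyarrow.parquet as pyarrow_parquet

        parquet_file_cls = pyarrow_parquet.ParquetFile

    # Passing an already-open file object avoids the `filesystem=` keyword,
    # which the pyarrow 7.0 ParquetFile constructor does not accept.
    with fs.open(path, "rb") as f:
        return parquet_file_cls(f).schema_arrow
```

With a real filesystem this might be called as `read_arrow_schema("s3://bucket/file.parquet", fsspec.filesystem("s3"))`, where the bucket path is a placeholder. Opening the file first should behave the same on old and new pyarrow, at the cost of not using pyarrow's own filesystem handling.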

@zbilodea zbilodea marked this pull request as ready for review February 12, 2024 13:00
@jpivarski
Member

What's the status of this PR, @zbilodea?

@zbilodea
Collaborator Author

zbilodea commented Mar 6, 2024

I made all the changes, tested it with S3 and it worked. All set as far as I can tell!

@jpivarski
Member

I think this is done but we lost track of it before merging.

If it's done, @zbilodea, please merge it!

@zbilodea zbilodea enabled auto-merge (squash) March 20, 2024 20:48
@zbilodea zbilodea merged commit 3270642 into main Mar 20, 2024
38 checks passed
@zbilodea zbilodea deleted the feat-add-to-parquet-dataset branch March 20, 2024 21:36
5 participants