feat: add to_parquet_dataset function #2898

zbilodea · 2023-12-13T15:46:58Z

No description provided.

codecov · 2023-12-13T15:54:34Z

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (b749e49) 81.90% compared to head (f023205) 81.93%.
Report is 3 commits behind head on main.

Additional details and impacted files

| Files | Coverage Δ |
|---|---|
| `src/awkward/operations/__init__.py` | `100.00% <100.00%> (ø)` |
| `src/awkward/operations/ak_to_parquet_dataset.py` | `90.38% <90.38%> (ø)` |

... and 2 files with indirect coverage changes

codecov-commenter · 2024-02-07T09:40:00Z

Codecov Report

Attention: Patch coverage is 90.56604%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (b749e49) to head (f023205).
Report is 37 commits behind head on main.

❗ Current head f023205 differs from pull request most recent head ba0101d. Consider uploading reports for the commit ba0101d to get more accurate results

Additional details and impacted files

| Files | Coverage Δ |
|---|---|
| `src/awkward/operations/__init__.py` | `100.00% <100.00%> (ø)` |
| `src/awkward/operations/ak_to_parquet_dataset.py` | `90.38% <90.38%> (ø)` |

... and 2 files with indirect coverage changes

zbilodea · 2024-02-07T15:53:34Z

@jpivarski I'm having a strange problem here: the fsspec implementation worked with S3, but now two tests are failing because of this line:

`schema = pyarrow_parquet.ParquetFile(filepath, filesystem=fs).schema_arrow`

which throws `TypeError: __init__() got an unexpected keyword argument 'filesystem'`.

It's only happening in two out of four tests (with no obvious similarities between them), and `filesystem` is definitely a keyword argument, as shown in the [docs...](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#:~:text=most%20Parquet%20files.-,filesystem,-FileSystem%2C%20default%20None)
I'll keep trying to figure it out.

martindurant · 2024-02-08T16:12:50Z

@zbilodea, the "minimal" CI run has pyarrow 7.0.0 (2 years old), so I'm not too surprised that the API has changed in that time.

martindurant · 2024-02-08T16:14:29Z

The v7 doc page is here: https://arrow.apache.org/docs/7.0/python/generated/pyarrow.parquet.ParquetFile.html
Indeed, I don't see `filesystem=` there. The previous method would be to open the file and pass that:

import fsspec
import pyarrow.parquet as pq  # ParquetFile lives in pyarrow.parquet
with fsspec.open(...) as f:
    pq.ParquetFile(f, ...)

jpivarski · 2024-03-04T23:36:34Z

What's the status of this PR, @zbilodea?

zbilodea · 2024-03-06T14:57:45Z

I made all the changes, tested it with S3 and it worked. All set as far as I can tell!

jpivarski · 2024-03-20T19:24:07Z

I think this is done but we lost track of it before merging.

If it's done, @zbilodea, please merge it!

added to_parquet_dataset, passes simple test

3441004

zbilodea marked this pull request as draft December 13, 2023 15:47

style: pre-commit fixes

5f48c02

pre-commit-ci bot temporarily deployed to docs December 13, 2023 16:03 Inactive

agoose77 mentioned this pull request Dec 19, 2023

ci: simplify test workflow #2869

Merged

agoose77 force-pushed the main branch from ed5c6df to f6d6f5c Compare December 19, 2023 21:21

Merge branch 'main' into feat-add-to-parquet-dataset

28c5a1a

agoose77 temporarily deployed to docs December 19, 2023 22:45 — with GitHub Actions Inactive

added more complicated test

7de6817

zbilodea temporarily deployed to docs January 22, 2024 13:55 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

8dd30eb

zbilodea temporarily deployed to docs January 22, 2024 14:36 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

b3640b5

zbilodea temporarily deployed to docs January 25, 2024 10:29 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

2e5c7bc

zbilodea temporarily deployed to docs January 25, 2024 16:34 — with GitHub Actions Inactive

zbilodea and others added 6 commits January 26, 2024 10:11

Merge branch 'main' into feat-add-to-parquet-dataset

2056f30

fixed test

8172c51

style: pre-commit fixes

bf4b66c

formatting for tests

cadc4d2

Merge branch 'feat-add-to-parquet-dataset' of https://github.com/scikit-hep/awkward into feat-add-to-parquet-dataset

e264d83

removed unnecessary check from test

838f156

zbilodea temporarily deployed to docs January 26, 2024 09:43 — with GitHub Actions Inactive

added to docstrings

56df512

zbilodea temporarily deployed to docs January 26, 2024 09:58 — with GitHub Actions Inactive

one additional check

ec3a1d7

zbilodea temporarily deployed to docs January 26, 2024 10:35 — with GitHub Actions Inactive

zbilodea marked this pull request as ready for review January 26, 2024 11:01

zbilodea requested a review from jpivarski January 26, 2024 11:01

zbilodea temporarily deployed to docs February 5, 2024 14:55 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

dff39ad

zbilodea temporarily deployed to docs February 7, 2024 09:46 — with GitHub Actions Inactive

changed tests again to try and solve pytest adjacent error...

5215564

zbilodea temporarily deployed to docs February 7, 2024 13:59 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

9135501

zbilodea temporarily deployed to docs February 7, 2024 14:47 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

98dfc98

zbilodea temporarily deployed to docs February 12, 2024 12:31 — with GitHub Actions Inactive

Changed arguments for pyarrow 7.0

1ed0b7c

zbilodea temporarily deployed to docs February 12, 2024 12:51 — with GitHub Actions Inactive

zbilodea marked this pull request as ready for review February 12, 2024 13:00

Merge branch 'main' into feat-add-to-parquet-dataset

89ee769

zbilodea temporarily deployed to docs March 6, 2024 07:54 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

632338d

zbilodea temporarily deployed to docs March 6, 2024 15:06 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

f945366

jpivarski temporarily deployed to docs March 20, 2024 20:26 — with GitHub Actions Inactive

Merge branch 'main' into feat-add-to-parquet-dataset

ba0101d

zbilodea enabled auto-merge (squash) March 20, 2024 20:48

zbilodea deployed to docs March 20, 2024 20:58 — with GitHub Actions View deployment

zbilodea merged commit 3270642 into main Mar 20, 2024
38 checks passed

zbilodea deleted the feat-add-to-parquet-dataset branch March 20, 2024 21:36
