Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneeded partitioning #1417

Merged
merged 1 commit into from
Dec 20, 2024
Merged

Remove unneeded partitioning #1417

merged 1 commit into from
Dec 20, 2024

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Dec 9, 2024

Speeds up the integration tests

@kevinjqliu
Copy link
Contributor

Looks like some tests are failing due to number of partitions being hardcoded in tests

=========================== short test summary info ============================
FAILED tests/integration/test_reads.py::test_pyarrow_deletes[session_catalog_hive] - assert [4, 5, 6] == [1, 2, 3]
  At index 0 diff: 4 != 1
  Full diff:
  - [1, 2, 3]
  + [4, 5, 6]
FAILED tests/integration/test_reads.py::test_pyarrow_deletes[session_catalog] - assert [4, 5, 6] == [1, 2, 3]
  At index 0 diff: 4 != 1
  Full diff:
  - [1, 2, 3]
  + [4, 5, 6]
FAILED tests/integration/test_reads.py::test_pyarrow_deletes_double[session_catalog_hive] - assert [10] == [5]
  At index 0 diff: 10 != 5
  Full diff:
  - [5]
  + [10]
FAILED tests/integration/test_reads.py::test_pyarrow_deletes_double[session_catalog] - assert [10] == [5]
  At index 0 diff: 10 != 5
  Full diff:
  - [5]
  + [10]
= 4 failed, 812 passed, 8 skipped, 2859 deselected, 2 warnings in 286.99s (0:04:46) =

@kevinjqliu
Copy link
Contributor

Looks like modifying provision.py to include this part is what we need. I double checked by running integration tests locally.

spark = (
    SparkSession
        .builder
        .config("spark.sql.shuffle.partitions", "1")
        .config("spark.default.parallelism", "1")
        .getOrCreate()
)

from https://github.com/apache/iceberg-rust/pull/826/files#r1892947712

@Fokko Fokko changed the title Remove unneeded partitoning Remove unneeded partitioning Dec 20, 2024
@Fokko Fokko force-pushed the fd-fix-partitioning branch from 2e0fcd9 to e988dd6 Compare December 20, 2024 07:51
@Fokko
Copy link
Contributor Author

Fokko commented Dec 20, 2024

@kevinjqliu Nice one! The partitioning would shuffle all the records based on year, so they would fall into the same bucket. I think the config is much cleaner, thanks @kevinjqliu! 🙌

@Fokko Fokko requested review from kevinjqliu and sungwy December 20, 2024 14:23
Copy link
Collaborator

@sungwy sungwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for cleaning this up, and thank you @kevinjqliu for the review!

@kevinjqliu kevinjqliu merged commit ab6b190 into main Dec 20, 2024
7 checks passed
@kevinjqliu kevinjqliu deleted the fd-fix-partitioning branch December 20, 2024 15:17
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants