Flink: add unit tests for range distribution on bucket partition column #11033
// It takes 2 checkpoint cycle for statistics collection and application
// of the globally aggregated statistics in the range partitioner.
// The last two checkpoints should have range shuffle applied
How stable is this test?
Do I understand correctly that you relaxed the conditions so that the test will never fail if the feature is correct?
Would this test fail on a slow machine (like the CI) with the feature turned off?
Yes, the relaxed condition comes from setting `maxAddedDataFilesPerCheckpoint` to `NUM_BUCKETS + parallelism`, an upper bound that range partitioning guarantees. In some cases the actual count can be smaller, namely `NUM_BUCKETS` or `parallelism` for divisible scenarios.
This test is guaranteed to fail without range partitioning, because each writer subtask can write up to `NUM_BUCKETS` files, so the total number of data files per commit can reach `NUM_BUCKETS * parallelism`.
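The arithmetic behind these bounds can be sketched with a small model. This is only an illustration, not the test code from the PR: it assumes traffic is spread evenly across buckets and that the range partitioner cuts the sorted bucket space into `parallelism` equal contiguous slices. The class and method names (`FileCountBound`, `filesWithRangeDistribution`, `maxFilesWithoutRangeDistribution`) are hypothetical.

```java
public class FileCountBound {

  // Worst case without range distribution: every writer subtask may receive
  // rows from every bucket, so each subtask can write one file per bucket.
  static int maxFilesWithoutRangeDistribution(int numBuckets, int parallelism) {
    return numBuckets * parallelism;
  }

  // Idealized range distribution (an assumption for illustration): buckets carry
  // equal weight and each subtask owns one equal, contiguous slice of the sorted
  // bucket space. A (bucket, subtask) pair yields one data file if they overlap.
  // Positions are scaled by numBuckets * parallelism to stay in integer math:
  //   bucket b covers  [b * parallelism, (b + 1) * parallelism)
  //   subtask s covers [s * numBuckets,  (s + 1) * numBuckets)
  static int filesWithRangeDistribution(int numBuckets, int parallelism) {
    int files = 0;
    for (int b = 0; b < numBuckets; b++) {
      for (int s = 0; s < parallelism; s++) {
        long start = Math.max((long) b * parallelism, (long) s * numBuckets);
        long end = Math.min((long) (b + 1) * parallelism, (long) (s + 1) * numBuckets);
        if (start < end) {
          files++;
        }
      }
    }
    return files;
  }

  public static void main(String[] args) {
    int[][] cases = {{4, 2}, {4, 3}, {2, 4}};
    for (int[] c : cases) {
      int numBuckets = c[0];
      int parallelism = c[1];
      System.out.printf(
          "buckets=%d parallelism=%d range=%d relaxedBound=%d noRange=%d%n",
          numBuckets,
          parallelism,
          filesWithRangeDistribution(numBuckets, parallelism),
          numBuckets + parallelism,
          maxFilesWithoutRangeDistribution(numBuckets, parallelism));
    }
  }
}
```

Under this model the range-distributed count never exceeds `NUM_BUCKETS + parallelism`, and it drops to `NUM_BUCKETS` or `parallelism` when one value divides the other, while the count without range distribution can grow to the full product.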
It looks like the new UT is flaky: https://github.com/apache/iceberg/actions/runs/10825717894/job/30035219384
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
Also started to use the new `DataGeneratorSource`, which is only available in 1.19 and later. Hence, I didn't add the unit test to 1.18.