-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit job ids via side output in TriggerFileLoads process to keep beam.Flatten() happy for Spark and Flink runners #23954
Conversation
Run Python Examples_Spark |
Run Python Examples_Flink |
Run Python 3.8 PostCommit |
R: @chamikaramj Relevant tests are passing now. |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov Report
@@ Coverage Diff @@
## master #23954 +/- ##
=======================================
Coverage 73.53% 73.53%
=======================================
Files 707 707
Lines 95856 95859 +3
=======================================
+ Hits 70486 70494 +8
+ Misses 24053 24048 -5
Partials 1317 1317
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Thanks. Also, pls make sure that we execute relavent BQ post-commit test suites to make sure that we do not break BQ sink. |
Run Python PreCommit |
1 similar comment
Run Python PreCommit |
Run Python 3.8 PostCommit |
Run BigQueryIO Read Performance Test Python |
Run BigQueryIO Write Performance Test Python Batch |
Run Python 3.7 PostCommit |
…gerFileLoads process to keep beam.Flatten() happy for Spark and Flink runners
…a side output in TriggerFileLoads process to keep beam.Flatten() happy for Spark and Flink runners
Fixes some tests for #23907
Changes in #23012 included waiting until jobs in a given step are finished before emitting the job IDs to the next step, see code block. This introduced a problem with
FILE_LOADS
in Flink and Spark runners:RuntimeError: Pipeline BeamApp-jenkins-1102122925-c4be4910_cbaf26e4-874a-42dc-adf1-f3a63242d10c failed in state FAILED: java.lang.IllegalArgumentException: PCollectionNodes [PCollectionNode{id=ref_PCollection_PCollection_52, PCollection=unique_name: "61Write/BigQueryBatchFileLoads/TriggerLoadJobsWithoutTempTables.None" coder_id: "ref_Coder_FastPrimitivesCoder_3" is_bounded: BOUNDED windowing_strategy_id: "ref_Windowing_Windowing_1" }] were consumed but never produced
See #23907 for more details.
These changes add a side output to TriggerLoadJobs and yield job IDs to this output in process (as was done previously) to keep the beam.Flatten() that combines job IDs here happy. The main output still waits on jobs to finish before emitting them collectively to ensure atomicity between steps in the workflow.