Simplify and speed up CDC delete support [DestinationsV2] #28029
Conversation
Before Merging a Connector Pull Request
Wow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated: […]
If the checklist is complete, but the CI check is failing, […]
I need some help running the new […]. I've: […]
Why do I need […]? I also probably need help getting and installing credentials... where do I do that?
my local docker env is pretty borked so can't try out the new build stuff, but it seems reasonable 🤷
Review threads on:
.../java/io/airbyte/integrations/destination/bigquery/typing_deduping/BigQuerySqlGenerator.java
...e/integrations/destination/bigquery/typing_deduping/BigQuerySqlGeneratorIntegrationTest.java
_airbyte_loaded_at IS NULL
OR (
  _airbyte_loaded_at IS NOT NULL
  AND JSON_VALUE(`_airbyte_data`, '$._ab_cdc_deleted_at') IS NOT NULL
)
Here's the main logical change (from airbytehq/typing-and-deduping-sql#21): we now include both the new records and the previously CDC-deleted records from the raw table for the cursor comparison.
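Purely as an illustration of where that condition sits (the `dataset.users_raw` table name, the `$.updated_at` cursor path, and the projection are assumptions for this sketch, not SQL taken from this PR), the re-include condition could appear in the query that selects raw records for typing:

```sql
-- Hedged sketch only; column names follow the DestinationsV2 raw-table layout,
-- everything else is hypothetical.
-- Select raw records that still need typing, plus already-loaded records that
-- represent CDC deletes, so the cursor comparison can see the deletes.
SELECT
  _airbyte_raw_id,
  _airbyte_extracted_at,
  JSON_VALUE(`_airbyte_data`, '$.updated_at')         AS updated_at,
  JSON_VALUE(`_airbyte_data`, '$._ab_cdc_deleted_at') AS _ab_cdc_deleted_at
FROM dataset.users_raw
WHERE
  _airbyte_loaded_at IS NULL
  OR (
    _airbyte_loaded_at IS NOT NULL
    AND JSON_VALUE(`_airbyte_data`, '$._ab_cdc_deleted_at') IS NOT NULL
  );
```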
// TODO replace `id`, `$.id` with PK
// TODO replace `INT64` with PK's type
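For context, a minimal sketch of the hardcoded shape those TODOs point at (assumptions: `dataset.users_raw` is a hypothetical raw table, and `id`/INT64 stand in for the stream's primary key; the follow-up commit "stop hardcoding pk (#28092)" derives them from the stream instead):

```sql
-- Sketch only: `id`, '$.id', and INT64 are the hardcoded placeholders the
-- TODOs call out; they should come from the stream's primary-key column name
-- and declared type rather than being fixed in the generator.
SELECT
  CAST(JSON_VALUE(`_airbyte_data`, '$.id') AS INT64) AS id
FROM dataset.users_raw;
```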
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-azure-blob-storage/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-azure-blob-storage docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-azure-blob-storage:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-azure-blob-storage test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-dynamodb/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-dynamodb docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-dynamodb:integrationTest | ✅ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-dynamodb test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-bigquery:integrationTest | ✅ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-bigquery test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-mysql-strict-encrypt/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-mysql-strict-encrypt docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization-mysql:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-mysql-strict-encrypt:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-mysql-strict-encrypt test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-teradata/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-teradata docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-teradata:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-teradata test
destination-bigquery is passing tests #28029 (comment). Merging.
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-bigquery-denormalized/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-bigquery-denormalized docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-bigquery-denormalized:integrationTest | ✅ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-bigquery-denormalized test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-databricks/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-databricks docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-databricks:integrationTest | ✅ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-databricks test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-mssql-strict-encrypt/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-mssql-strict-encrypt docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization-mssql:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-mssql-strict-encrypt:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-mssql-strict-encrypt test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-postgres-strict-encrypt/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-postgres-strict-encrypt docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-postgres-strict-encrypt:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-postgres-strict-encrypt test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-gcs/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-gcs docker image for platform linux/x86_64 | ✅ |
./gradlew :airbyte-integrations:connectors:destination-gcs:integrationTest | ✅ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-gcs test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-mysql/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-mysql docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization-mysql:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-mysql:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-mysql test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-oracle-strict-encrypt/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-oracle-strict-encrypt docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization-oracle:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-oracle-strict-encrypt:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-oracle-strict-encrypt test
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization-snowflake:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-snowflake:integrationTest | ❌ |
Please note that tests are only run on PRs that are marked ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:
airbyte-ci connectors --name=destination-snowflake test
* Revert "Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)"" This reverts commit 348c577. * version bumps+changelog * Speed up BQ by having 2 queries, and not an OR (#27981) * 🐛 Destination Bigquery: fix bug in standard inserts for syncs >10K records (#27856) * only run t+d code if it's enabled * dockerfile+changelog * remove changelog entry * Destinations V2: handle optional fields for `object` and `array` types (#27898) * catch null schema * fix null properties * clean up * consolidate + add more tests * try catch * empty json test * Automated Commit - Formatting Changes * remove todo * destination bigquery: misc updates to 1s1t code (#28057) * switch to checkedconsumer * add unit test for buildColumnId * use flag * restructure prefix check * fix build * more type-parsing fixes (#28100) * more type-parsing fixes * handle duplicates * Automated Commit - Format and Process Resources Changes * add tests for asColumns * Automated Commit - Format and Process Resources Changes * log warnings instead of throwing exception * better log message * error level --------- Co-authored-by: edgao <[email protected]> * Automated Commit - Formatting Changes * Improve protocol type parsing (#28126) * Automated Commit - Formatting Changes * Change from T&D every 10k records to an increasing time based interval (#28130) * fifteen minute t&d * add typing and deduping operation valve for increased intervals of typing and deduping * Automated Commit - Format and Process Resources Changes * resolve bizarre merge conflict * Automated Commit - Format and Process Resources Changes --------- Co-authored-by: jbfbell <[email protected]> * Simplify and speed up CDC delete support [DestinationsV2] (#28029) * Simplify and speed up CDC delete support [DestinationsV2] * better QUOTE * spotbugs? * recompile dbt image for local arch and use that when building images * things compile, but tests fail * tests working-ish * comment * fix logic to re-insert deleted records for cursor comparison. tests pass! 
* remove comment * Skip CDC re-include logic if there are no CDC columns * stop hardcoding pk (#28092) * wip * remove TODOs --------- Co-authored-by: Edward Gao <[email protected]> * update method name * Automated Commit - Formatting Changes * depend on pinned normalization version * implement 1s1t DATs for destination-bigquery (#27852) * intiial implementation * Automated Commit - Formatting Changes * add second sync to test * do concurrent things * Automated Commit - Formatting Changes * clarify comment * minor tweaks * more stuff * Automated Commit - Formatting Changes * minor cleanup * lots of fixes * handle sql vs json null better * verify extra columns * only check deleted_at if in DEDUP mode and the column exists * add full refresh append test case * Automated Commit - Formatting Changes * add tests for the remaining sync modes * Automated Commit - Formatting Changes * readability stuff * Automated Commit - Formatting Changes * add test for gcs mode * remove static fields * Automated Commit - Formatting Changes * add more test cases, tweak test scaffold * cleanup * Automated Commit - Formatting Changes * extract recorddiffer * and use it in the sql generator test * fix * comment * naming+comment * one more comment * better assert * remove unnecessary thing * one last thing * Automated Commit - Formatting Changes * enable concurrent execution on all java integration tests * add test for default namespace * Automated Commit - Formatting Changes * implement a 2-stream test * Automated Commit - Formatting Changes * extract methods * invert jsonNodesNotEquivalent * Automated Commit - Formatting Changes * fix conditional * pull out diffSingleRecord * Automated Commit - Formatting Changes * handle nulls correctly * remove raw-specific handling; break up methods * Automated Commit - Formatting Changes --------- Co-authored-by: edgao <[email protected]> Co-authored-by: octavia-approvington <[email protected]> * Destinations V2: move create raw tables earlier (#28255) * move create raw tables * better log message * stop building normalization (#28256) * fix ability to run tests * disable incremental t+d for now * Automated Commit - Formatting Changes --------- Co-authored-by: Evan Tahler <[email protected]> Co-authored-by: Cynthia Yin <[email protected]> Co-authored-by: cynthiaxyin <[email protected]> Co-authored-by: edgao <[email protected]> Co-authored-by: Joe Bell <[email protected]> Co-authored-by: jbfbell <[email protected]> Co-authored-by: octavia-approvington <[email protected]>
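One line item above, "Speed up BQ by having 2 queries, and not an OR (#27981)", names the optimization of splitting an OR'd predicate into two statements. As a hedged illustration only (not the PR's generated SQL; `dataset.users_raw` is hypothetical), the idea looks roughly like this:

```sql
-- Instead of one scan filtering on
--   _airbyte_loaded_at IS NULL OR (... _ab_cdc_deleted_at IS NOT NULL),
-- run two narrower statements; per that commit's title, this was faster in practice.
SELECT _airbyte_raw_id
FROM dataset.users_raw
WHERE _airbyte_loaded_at IS NULL;

SELECT _airbyte_raw_id
FROM dataset.users_raw
WHERE _airbyte_loaded_at IS NOT NULL
  AND JSON_VALUE(`_airbyte_data`, '$._ab_cdc_deleted_at') IS NOT NULL;
```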
* Revert "Revert "Destination Bigquery: Scaffolding for destinations v2 (airbytehq#27268)"" This reverts commit 348c577. * version bumps+changelog * Speed up BQ by having 2 queries, and not an OR (airbytehq#27981) * 🐛 Destination Bigquery: fix bug in standard inserts for syncs >10K records (airbytehq#27856) * only run t+d code if it's enabled * dockerfile+changelog * remove changelog entry * Destinations V2: handle optional fields for `object` and `array` types (airbytehq#27898) * catch null schema * fix null properties * clean up * consolidate + add more tests * try catch * empty json test * Automated Commit - Formatting Changes * remove todo * destination bigquery: misc updates to 1s1t code (airbytehq#28057) * switch to checkedconsumer * add unit test for buildColumnId * use flag * restructure prefix check * fix build * more type-parsing fixes (airbytehq#28100) * more type-parsing fixes * handle duplicates * Automated Commit - Format and Process Resources Changes * add tests for asColumns * Automated Commit - Format and Process Resources Changes * log warnings instead of throwing exception * better log message * error level --------- Co-authored-by: edgao <[email protected]> * Automated Commit - Formatting Changes * Improve protocol type parsing (airbytehq#28126) * Automated Commit - Formatting Changes * Change from T&D every 10k records to an increasing time based interval (airbytehq#28130) * fifteen minute t&d * add typing and deduping operation valve for increased intervals of typing and deduping * Automated Commit - Format and Process Resources Changes * resolve bizarre merge conflict * Automated Commit - Format and Process Resources Changes --------- Co-authored-by: jbfbell <[email protected]> * Simplify and speed up CDC delete support [DestinationsV2] (airbytehq#28029) * Simplify and speed up CDC delete support [DestinationsV2] * better QUOTE * spotbugs? * recompile dbt image for local arch and use that when building images * things compile, but tests fail * tests working-ish * comment * fix logic to re-insert deleted records for cursor comparison. tests pass! 
* remove comment * Skip CDC re-include logic if there are no CDC columns * stop hardcoding pk (airbytehq#28092) * wip * remove TODOs --------- Co-authored-by: Edward Gao <[email protected]> * update method name * Automated Commit - Formatting Changes * depend on pinned normalization version * implement 1s1t DATs for destination-bigquery (airbytehq#27852) * intiial implementation * Automated Commit - Formatting Changes * add second sync to test * do concurrent things * Automated Commit - Formatting Changes * clarify comment * minor tweaks * more stuff * Automated Commit - Formatting Changes * minor cleanup * lots of fixes * handle sql vs json null better * verify extra columns * only check deleted_at if in DEDUP mode and the column exists * add full refresh append test case * Automated Commit - Formatting Changes * add tests for the remaining sync modes * Automated Commit - Formatting Changes * readability stuff * Automated Commit - Formatting Changes * add test for gcs mode * remove static fields * Automated Commit - Formatting Changes * add more test cases, tweak test scaffold * cleanup * Automated Commit - Formatting Changes * extract recorddiffer * and use it in the sql generator test * fix * comment * naming+comment * one more comment * better assert * remove unnecessary thing * one last thing * Automated Commit - Formatting Changes * enable concurrent execution on all java integration tests * add test for default namespace * Automated Commit - Formatting Changes * implement a 2-stream test * Automated Commit - Formatting Changes * extract methods * invert jsonNodesNotEquivalent * Automated Commit - Formatting Changes * fix conditional * pull out diffSingleRecord * Automated Commit - Formatting Changes * handle nulls correctly * remove raw-specific handling; break up methods * Automated Commit - Formatting Changes --------- Co-authored-by: edgao <[email protected]> Co-authored-by: octavia-approvington <[email protected]> * Destinations V2: move create raw tables earlier (airbytehq#28255) * move create raw tables * better log message * stop building normalization (airbytehq#28256) * fix ability to run tests * disable incremental t+d for now * Automated Commit - Formatting Changes --------- Co-authored-by: Evan Tahler <[email protected]> Co-authored-by: Cynthia Yin <[email protected]> Co-authored-by: cynthiaxyin <[email protected]> Co-authored-by: edgao <[email protected]> Co-authored-by: Joe Bell <[email protected]> Co-authored-by: jbfbell <[email protected]> Co-authored-by: octavia-approvington <[email protected]>
Move over the updated logic from airbytehq/typing-and-deduping-sql#20 and airbytehq/typing-and-deduping-sql#21. Closes #27923