Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination-S3: Correctly generate int64 values #23466

Merged
merged 23 commits into from
Mar 10, 2023

Conversation

etsybaev
Copy link
Contributor

@etsybaev etsybaev commented Feb 26, 2023

What

As noted in #14362 (comment), dest-s3's avro and parquet output formats currently use int32. This causes problems for users with int64 values. We should have dest-s3 produce int64 values for integer inputs.

The reason we use int32 right now is because we want to support union types, which behave kind of weirdly in edge cases.

How

Updated S3 to use Avro.LONG type instead of Avro.INT

🚨 User Impact 🚨

At the first glance, no breaking changes are expected. In the edge case user might need to reset connection and re-sync data.
Important note: If we would get oneOf ("long" and "long, specific type: timestamp" - the value would be treated as a long

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

@etsybaev etsybaev linked an issue Feb 26, 2023 that may be closed by this pull request
@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/4275787287
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/4275787287
No Python unittests run

Build Passed

Test summary info:

All Passed

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/4275787683
✅ connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/4275787683
No Python unittests run

Build Passed

Test summary info:

All Passed

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/4275788448
✅ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/4275788448
No Python unittests run

Build Passed

Test summary info:

All Passed

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/4275789361
✅ connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/4275789361
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4275790781
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/4275790781
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4275791186
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4275791186
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@github-actions
Copy link
Contributor

github-actions bot commented Feb 26, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (10)

Connector Version Changelog Publish
destination-bigquery 1.2.15
destination-bigquery-denormalized 1.2.15
destination-databricks 0.3.1
destination-gcs 0.2.15
destination-jdbc 0.3.14 🔵
(ignored)
🔵
(ignored)
destination-r2 0.1.0
destination-redshift 0.4.1
destination-s3 0.3.21
destination-s3-glue 0.1.2
destination-snowflake 0.4.51
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@etsybaev
Copy link
Contributor Author

etsybaev commented Feb 26, 2023

/test connector=connectors/destination-bigquery-denormalized

🕑 connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/4276176483
✅ connectors/destination-bigquery-denormalized https://github.com/airbytehq/airbyte/actions/runs/4276176483
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 15      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    18      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     171     10    94%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_config/transform.py                         195     48    75%
normalization/transform_catalog/utils.py                             51     14    73%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                147     80    46%
normalization/transform_catalog/transform.py                         65     39    40%
normalization/transform_catalog/stream_processor.py                 595    400    33%
-------------------------------------------------------------------------------------
TOTAL                                                              1461    632    57%

Build Passed

Test summary info:

All Passed

@etsybaev etsybaev changed the title Etsybaev/17564 updated avro int to long Destination S3: Correctly generate int64 values Mar 1, 2023
@etsybaev etsybaev marked this pull request as ready for review March 1, 2023 16:17
@etsybaev etsybaev requested a review from a team as a code owner March 1, 2023 16:17
@etsybaev etsybaev changed the title Destination S3: Correctly generate int64 values Destination-S3: Correctly generate int64 values Mar 1, 2023
@grishick grishick requested a review from a team March 4, 2023 00:44
return Schema.createUnion(newUnionTypes);
} else {
return Schema.createUnion(unionTypes);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic is fine, I got a little carried away, so heres an alternate approach

private Schema createUnionAndCheckLongTypesDuplications(List<Schema> unionTypes) {
    Predicate<Schema> isALong = type -> type.getType() == Schema.Type.LONG;
    Predicate<Schema> isPlainLong = isALong.and(type -> Objects.isNull(type.getLogicalType()));
    Predicate<Schema> isTimestampMicrosLong = isALong.and(type ->
        Objects.nonNull(type.getLogicalType()) && "timestamp-micros".equals(type.getLogicalType().getName()));

    boolean hasPlainLong = unionTypes.stream().anyMatch(isPlainLong);
    boolean hasTimestampMicrosLong = unionTypes.stream().anyMatch(isTimestampMicrosLong);
    Predicate<Schema> removeTimestampType = type -> !(hasPlainLong && hasTimestampMicrosLong && isTimestampMicrosLong.test(type));
    return Schema.createUnion(unionTypes.stream().filter(removeTimestampType).collect(Collectors.toList()));
 }
    ```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use your approach. Thanks!

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-s3

🕑 Publishing the following connectors:
connectors/destination-s3
https://github.com/airbytehq/airbyte/actions/runs/4384198657


Connector Did it publish? Were definitions generated?
connectors/destination-s3

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-gcs

🕑 Publishing the following connectors:
connectors/destination-gcs
https://github.com/airbytehq/airbyte/actions/runs/4384582611


Connector Did it publish? Were definitions generated?
connectors/destination-gcs

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-bigquery

🕑 Publishing the following connectors:
connectors/destination-bigquery
https://github.com/airbytehq/airbyte/actions/runs/4384755604


Connector Did it publish? Were definitions generated?
connectors/destination-bigquery

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-bigquery-denormalized

🕑 Publishing the following connectors:
connectors/destination-bigquery-denormalized
https://github.com/airbytehq/airbyte/actions/runs/4385023358


Connector Did it publish? Were definitions generated?
connectors/destination-bigquery-denormalized

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-snowflake

🕑 Publishing the following connectors:
connectors/destination-snowflake
https://github.com/airbytehq/airbyte/actions/runs/4385234732


Connector Did it publish? Were definitions generated?
connectors/destination-snowflake

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-redshift

🕑 Publishing the following connectors:
connectors/destination-redshift
https://github.com/airbytehq/airbyte/actions/runs/4385293988


Connector Did it publish? Were definitions generated?
connectors/destination-redshift

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-snowflake

🕑 Publishing the following connectors:
connectors/destination-snowflake
https://github.com/airbytehq/airbyte/actions/runs/4385724329


Connector Did it publish? Were definitions generated?
connectors/destination-snowflake

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev
Copy link
Contributor Author

etsybaev commented Mar 10, 2023

/publish connector=connectors/destination-snowflake

🕑 Publishing the following connectors:
connectors/destination-snowflake
https://github.com/airbytehq/airbyte/actions/runs/4385821891


Connector Did it publish? Were definitions generated?
connectors/destination-snowflake

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@etsybaev etsybaev enabled auto-merge (squash) March 10, 2023 16:02
@etsybaev etsybaev merged commit 6672b9c into master Mar 10, 2023
@etsybaev etsybaev deleted the etsybaev/17564-updated-avro-int-to-long branch March 10, 2023 16:17
danielduckworth pushed a commit to danielduckworth/airbyte that referenced this pull request Mar 13, 2023
* [17564] Updated s3 avro to use long instead of int


---------

Co-authored-by: etsybaev <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
adriennevermorel pushed a commit to adriennevermorel/airbyte that referenced this pull request Mar 17, 2023
* [17564] Updated s3 avro to use long instead of int


---------

Co-authored-by: etsybaev <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* [17564] Updated s3 avro to use long instead of int


---------

Co-authored-by: etsybaev <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* [17564] Updated s3 avro to use long instead of int


---------

Co-authored-by: etsybaev <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Destination S3: Correctly generate int64 values
4 participants