Docs: update schema change capability #28200

Merged
merged 5 commits into from
Jul 21, 2023
Changes from 1 commit
34 changes: 24 additions & 10 deletions docs/cloud/managing-airbyte-cloud/manage-schema-changes.md
@@ -1,12 +1,27 @@
# Manage schema changes

Once every 24 hours, Airbyte checks for changes in your source schema and allows you to review the changes and fix breaking changes. This process helps ensure accurate and efficient data syncs, minimizing errors and saving you time and effort in managing your data pipelines.
For each connection, you can specify how Airbyte should handle schema changes in the source. This process helps ensure accurate and efficient data syncs, minimizing errors and saving you time and effort in managing your data pipelines.

:::note
Airbyte checks for any changes in your source schema before every sync or once every 24 hours, whichever is more frequent.

Contributor Author

@mfsiega-airbyte I think I recall this being the case, but am not sure now if this is true. Is it still once every 24 hours or does it check on sync start as well?

Contributor Author

Updated to say every 24 hours

Contributor

I would phrase it as: Airbyte checks for any changes in your source schema before syncing, at most once every 24 hours.

Contributor Author

Great, updated 👍


Schema changes are flagged in your connection but are not propagated to your destination.

:::
Based on your configured settings for "Detect and propagate schema changes", Airbyte can automatically sync those changes or ignore them:
* **Propagate all changes** automatically propagates stream changes (additions or deletions) and column changes (additions or deletions) detected in the source
* **Propagate column changes only** automatically propagates column changes detected in the source
* **Ignore** leaves the schema you’ve set up unchanged when the source schema changes, until you approve the changes manually
* **Pause connection** stops the connection from syncing further once a change is detected
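As a rough illustration, the four settings can be modeled as a decision made for each detected non-breaking change. This is a hypothetical sketch, not Airbyte's implementation or API: the enum names, the `handle_change` function, and the `"hold"` outcome (a change that is flagged but not applied until you approve it) are assumptions made for this example. In particular, treating a stream change under **Propagate column changes only** as held for manual approval is an assumption.

```python
from enum import Enum


class PropagationSetting(Enum):
    # Hypothetical names for the four "Detect and propagate schema changes" options.
    PROPAGATE_ALL = "propagate_all"
    PROPAGATE_COLUMNS = "propagate_columns"
    IGNORE = "ignore"
    PAUSE = "pause"


class ChangeKind(Enum):
    # Non-breaking changes that the setting applies to.
    STREAM_ADDED = "stream_added"
    STREAM_REMOVED = "stream_removed"
    COLUMN_ADDED = "column_added"
    COLUMN_REMOVED = "column_removed"


COLUMN_CHANGES = {ChangeKind.COLUMN_ADDED, ChangeKind.COLUMN_REMOVED}


def handle_change(setting: PropagationSetting, kind: ChangeKind) -> str:
    """Return 'propagate', 'hold', or 'pause' for one non-breaking change."""
    if setting is PropagationSetting.PAUSE:
        # Stop syncing as soon as any change is detected.
        return "pause"
    if setting is PropagationSetting.PROPAGATE_ALL:
        return "propagate"
    if setting is PropagationSetting.PROPAGATE_COLUMNS and kind in COLUMN_CHANGES:
        return "propagate"
    # "Ignore", or a stream change under "column changes only" (assumed):
    # keep the configured schema until the change is approved manually.
    return "hold"


print(handle_change(PropagationSetting.PROPAGATE_COLUMNS, ChangeKind.STREAM_ADDED))
# hold
```

Breaking changes are left out of this sketch on purpose, since they pause the connection in all cases.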

When a new column is detected and propagated, values for that column are filled in only for rows that are updated after the change. To backfill values for rows that were not updated, run a full refresh.
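The backfill behavior can be sketched with a toy destination table. This is a simplified model, not how Airbyte actually writes data: it assumes rows are keyed by a hypothetical `id` primary key, that an incremental sync upserts only the rows changed since the last sync, and that a full refresh rewrites every row. The helper names are illustrative only.

```python
def incremental_sync(destination, updated_rows):
    """Upsert only the rows changed since the last sync (keyed by 'id')."""
    for row in updated_rows:
        destination[row["id"]] = dict(row)


def full_refresh(source_rows):
    """Rebuild the destination from every source row."""
    return {row["id"]: dict(row) for row in source_rows}


def column_values(destination, column):
    """Map each row id to its value for `column` (None if missing)."""
    return {rid: row.get(column) for rid, row in destination.items()}


# Destination rows synced before the source added an "email" column.
destination = {
    1: {"id": 1, "name": "Ada"},
    2: {"id": 2, "name": "Grace"},
}
source_rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
]

# Only row 2 changed since the last sync, so only it carries the new column.
incremental_sync(destination, [source_rows[1]])
print(column_values(destination, "email"))
# {1: None, 2: 'grace@example.com'}

# A full refresh backfills the new column for every row.
destination = full_refresh(source_rows)
print(column_values(destination, "email"))
# {1: 'ada@example.com', 2: 'grace@example.com'}
```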

When a column is deleted, its values stop updating, and rows updated after the deletion have null values in that column.

Contributor

I think that we will actually now delete these columns immediately, based on the related convo in Normalization.

Contributor Author

@alex-gron Was thinking about this more today after our chat - I agree that's the behavior we decided on. Does that mean the proposed changes will come with Destinations V2, which I believe is coming at the end of Q3? On the one hand, we could publish these with the foresight those changes will be coming, or re-publish with the updates when V2 is officially released. I'm leaning towards the latter but let me know if you think it's better to just update now!

Contributor

Sorry for the late reply! I'm good with that option :)


When a new stream is detected and propagated, its first sync loads all of its data, as in a historical sync. When a stream is deleted from the source, the stream stops updating, and any existing data is left in the destination. The rest of the enabled streams continue syncing.
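With **Propagate all changes** selected, the stream behavior can be approximated as follows. This is a hypothetical sketch, not Airbyte code: `sync_streams`, the `fetch_all`/`fetch_new` callables, and the in-memory `destination` dict are stand-ins for a historical read, an incremental read, and the destination tables.

```python
from typing import Callable, Dict, Iterable, List, Set

Record = dict


def sync_streams(
    enabled: Set[str],
    source_streams: Set[str],
    destination: Dict[str, List[Record]],
    fetch_all: Callable[[str], Iterable[Record]],
    fetch_new: Callable[[str], Iterable[Record]],
) -> Set[str]:
    """Run one sync and return the streams that stay enabled afterwards."""
    added = source_streams - enabled
    kept = enabled & source_streams
    for name in sorted(added):
        # A newly propagated stream's first sync reads everything, like a historical sync.
        destination[name] = list(fetch_all(name))
    for name in sorted(kept):
        # The remaining enabled streams keep syncing incrementally.
        destination.setdefault(name, []).extend(fetch_new(name))
    # Streams deleted from the source are no longer synced; their existing
    # destination data is left untouched.
    return kept | added


source = {"users": [{"id": 1}, {"id": 2}], "orders": [{"id": 10}, {"id": 11}]}
changed = {"users": [{"id": 2}]}
destination = {"users": [{"id": 1}], "invoices": [{"id": 99}]}

enabled = sync_streams(
    enabled={"users", "invoices"},
    source_streams=set(source),
    destination=destination,
    fetch_all=lambda name: source[name],
    fetch_new=lambda name: changed.get(name, []),
)
```

After this call, `orders` holds its full history, `invoices` keeps its old rows but is no longer synced, and `users` receives only the changed row.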

Regardless of these settings, if a breaking change is detected, the connection is paused for manual review to prevent future syncs from failing. Breaking schema changes occur when:
* The data type of a field from the source changes
* An existing primary key is removed from the source
* An existing cursor is removed from the source
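The three conditions above can be expressed as a comparison between the previously detected schema and the new one. This is a hypothetical sketch: `StreamSchema` and `breaking_changes` are invented for illustration, and the sketch assumes a primary key or cursor counts as "removed" when its column no longer appears in the new schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StreamSchema:
    # Column name -> type name, plus the stream's primary key and cursor.
    columns: Dict[str, str]
    primary_key: List[str] = field(default_factory=list)
    cursor: Optional[str] = None


def breaking_changes(old: StreamSchema, new: StreamSchema) -> List[str]:
    """List the reasons a schema change is breaking (empty if it is not)."""
    reasons = []
    for column, old_type in old.columns.items():
        new_type = new.columns.get(column)
        if new_type is not None and new_type != old_type:
            reasons.append(f"type of {column!r} changed: {old_type} -> {new_type}")
    for column in old.primary_key:
        if column not in new.columns:
            reasons.append(f"primary key {column!r} removed")
    if old.cursor is not None and old.cursor not in new.columns:
        reasons.append(f"cursor {old.cursor!r} removed")
    return reasons


old = StreamSchema(
    columns={"id": "integer", "updated_at": "timestamp", "amount": "number"},
    primary_key=["id"],
    cursor="updated_at",
)
new = StreamSchema(columns={"id": "string", "amount": "number"}, primary_key=["id"])

print(breaking_changes(old, new))
# ["type of 'id' changed: integer -> string", "cursor 'updated_at' removed"]
```

In this model, any non-empty result would correspond to pausing the connection for manual review. Dropping a column that is neither the primary key nor the cursor is not flagged, because it is a non-breaking column change.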

See "Fix breaking schema changes" to understand how to resolve these types of changes.

## Review non-breaking schema changes

@@ -29,11 +44,10 @@ To review non-breaking schema changes:

## Fix breaking schema changes

:::note

Breaking changes can only occur in the **Cursor** or **Primary key** fields.

:::
Breaking schema changes occur when:
* The data type of a field from the source changes
* An existing primary key is removed from the source
* An existing cursor is removed from the source

To review and fix breaking schema changes:

Contributor

What is the fix is the data type changes?

Contributor Author

From our chat I believe we decided to make any data type changes breaking changes. Not sure if that answers your question though - lmk if I missed something!

Contributor

Ah, I had a typo in my original question.

I meant to ask - what should a user do when they encounter a data type change? Is the only option in that case to run a reset?

1. On the [Airbyte Cloud](http://cloud.airbyte.com/) dashboard, click **Connections** and select the connection with breaking changes (indicated by a **red exclamation mark** icon).