-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have destination schema update with source schema update #1967
Comments
Thanks for creating this issue. This is definitely on our roadmap, we will update you as we have a better sense of when we will tackle this.
Would be curious to get your opinion on which behavior you would prefer. Airbyte detects that a column that used to have the type int now has the type string. Would you prefer:
One of my concerns with 100% automatic schema changes, is that if they are ever done in error they can be hard to fix. For example if someone is trying to push data of the wrong type into a column, that sounds like that could be a mistake to me. Would you prefer the pure automatic approach (option 1) or (option 2) having the sanity check that the schema migration is what you expect. |
Thanks for the answer ! Having these sanity checks would be great. I believe every column type change / new column should be validated by the ingestion owner. As you said, option 1 sounds like option 2 with every change automatically validated. The ability to summarise the changes in a concise way would be great, it could look like a list :
Let me know if you want to further develop the reasoning ! |
Thanks, this answer is super helpful. We will be in touch as wee start work on this and if we want to run anything else by you. |
I would also be super interested in this sollution. Very critical for us, as our internally developed software changes very frequently. |
Very excited for this feature! This will definitely make our lives easier. I have a few comments/questions in addition to what is already mentioned:
In general I agree that 100% automated solution may have unintended consequences. |
@cgardens, is there a public roadmap somewhere that outlines when this functionality might be prioritized? |
As April 2022 is fast approaching, may I have an update on this thread about the roadmap of this feature? |
Any news on this feature? |
It's surprising to see Airbyte being so late on this feature, while openly comparing itself to all alternatives as the best tool in the market. In a distributed team dealing with large data (with engineers and data working separately), you can't expect us to manually refresh the changed tables:
At the very least, Airbyte should add new columns when new fields are added - there's no debate between automatic and semi-automatic here. For column data type changes or columns being dropped, I agree we could debate about the best implementation, but I would at least expect an alert. Seeing this issue opened for 2+ years while being "definitely on the roadmap" is worrying and doesn't convince us to move to Airbyte, despite being seemingly a great product otherwise. |
I’m contemplating moving our Postgres to Snowflake pipeline from Meltano to Airbyte, but this is a deal breaker as the source schema changes almost daily (new columns and tables are added). Are there alternatives / ways to prioritize this? |
This has been a supported feature for a few quarters now, but we did not come back and update this issue. You can read more about it in our blog post, read more details in our docs or enable it via our API. I'll close this issue now, but if you have any further questions, feel free to tag me and ask! For more about what we are currently working on, you can visit our public roadmap here: https://go.airbyte.com/roadmap. |
@nataliekwong thanks for you comment! I guess I'm a bit confused about the documentation of the Postgres source (https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements) which still states that schema changes are not propagated. Are schema changes not supported for this particular extractor? Or is the documentation simply out-of-date? 🤔 |
Hey @mvgijssel , good catch! It looks like the docs are not up to date, thanks for helping to identify it! I'll look internally for help to update the docs. Schema propagation with Postgres CDC is supported. |
Tell us about the problem you're trying to solve
We're planning a migration from a MySQL Azure instance to BigQuery. There are about 50 tables in the source datasets. Our client is a growing company, and will regularly add fields in the source tables. I understand Airbyte does not update the target table when the source table changes schema.
Describe the solution you’d like
I would like Airbyte to handle changes in the source table and update the target tables accordinly. Ideally this could be done automatically. Another solution closer to what we have today could be : when running a load step (with airflow for example), Airbyte would return a specific response when the source schema has changed. It leaves the user with the possibility to trigger a full reload through an API call when he receives such response.
Describe the alternative you’ve considered or used
Fivetran handles these changes, otherwise we designed a custom setup that runs a full reload whenever there is a change in the schema. This is a "brute force" solution, we know there are smarter ways to do this, like being able to understand what are the changes.
Additional context
No other context, really looking forward to seeing where Airbyte will go in the upcoming months !
┆Issue is synchronized with this Asana task by Unito
The text was updated successfully, but these errors were encountered: