Suggested improvements on sync mode: Incremental - Deduped + History #2683

Closed
leecheeaun opened this issue Mar 31, 2021 · 2 comments
Labels
priority/high High priority team/triage type/bug Something isn't working type/enhancement New feature or request

Comments

leecheeaun commented Mar 31, 2021

Tell us about the problem you're trying to solve

Running a query against the source (SQL Server) and running the same query against the destination (BigQuery) should yield the same results, i.e. the data in the destination should match the data in the source, even after the source data has been modified.

I am trying out the Incremental - Deduped + History sync mode, which seems to fit my use case of having the freshest data in the warehouse.

Issue 1: However, I have tables with a primary key but no cursor column (such as a last_updated timestamp). Syncs for these tables fail because this sync mode requires a cursor to be specified.

Issue 2: Also, there are too many tables in the destination:

  • _airbyte_raw_table (JSON blob)
  • table_scd (historical data, behavior like append)
  • table (the latest data)

Describe the solution you’d like

Issue 1: Use the emitted_at column that Airbyte generates as the cursor.
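To illustrate the request in Issue 1, here is a minimal sketch of deduplicating on a primary key with emitted_at as the cursor. It is not Airbyte's normalization code; the function name `dedupe_latest` and the field names `id`, `name`, and `emitted_at` are hypothetical and chosen only for this example.

```python
from operator import itemgetter


def dedupe_latest(records, primary_key="id", cursor="emitted_at"):
    """Keep, for each primary key, the record with the greatest cursor value.

    Assumption: every record carries both the primary key and the cursor
    field. On a cursor tie, the record seen first is kept.
    """
    latest = {}
    for record in records:
        key = record[primary_key]
        if key not in latest or record[cursor] > latest[key][cursor]:
            latest[key] = record
    # Sort by primary key so the result is deterministic.
    return sorted(latest.values(), key=itemgetter(primary_key))


# Hypothetical history: row 1 was emitted twice, the second time after an update.
history = [
    {"id": 1, "name": "alice", "emitted_at": 100},
    {"id": 2, "name": "bob", "emitted_at": 100},
    {"id": 1, "name": "alicia", "emitted_at": 200},
]

deduped = dedupe_latest(history)
for row in deduped:
    print(row)
```

In this sketch, `history` plays the role of the table_scd data and `deduped` the role of the final table: the source needs only a primary key, because the cursor comes from the sync itself.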

Issue 2: Give users the option to choose which of these tables appear in the destination.

┆Issue is synchronized with this Asana task by Unito

@leecheeaun leecheeaun added the type/enhancement New feature or request label Mar 31, 2021
@jrhizor jrhizor added priority/high High priority type/bug Something isn't working labels Apr 5, 2021
Contributor

jrhizor commented Apr 5, 2021

Thanks for reporting this!

Your first issue is definitely something we want to improve. I added some labels to make sure this gets prioritized.

For the second, views of the different tables are part of the implementation; it's how we generate the deduped version. Is the concern the size/presence of the tables or just the experience consuming it? Would making it more obvious that table is the table to look at for Incremental - Deduped + History or better documentation help here? Or different naming schemes?

@davinchia
Contributor

We've implemented issue 1. Issue 2 is tracked in #3487.


4 participants