-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚨 🚨 ✨ Source Tik Tok Marketing: Migration to Low-Code #38316
🚨 🚨 ✨ Source Tik Tok Marketing: Migration to Low-Code #38316
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
…pe for dimensions transformations
Regression test results: TestDataIntegrity.test_record_schema_match_without_state [failed] - Value of root['properties']['budget']['type'] changed from "integer" to "number". Value of root['properties']['roas_bid']['type'] changed from "integer" to "number". (same error for all fields with type number in schema but actual type is integer). TestDataIntegrity.test_all_records_are_the_same_without_state [failed] - Same differences with integer/number as above. Read URLs: some requests in py version due to HttpAvailabilityStrategy PS: Reviewer can ask me to send the full html report in slack dm. Regression tests were running locally as I needed to change start date in config and chose testing without state due to breaking changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the manifest and the schemas overall look good and given the size of the manifest and number of streams, I am going to trust that we've carefully run live tests to verify that the changes are working and the breaking changes are expected. I didn't see anything glaring.
I did however have some questions to clarify my understanding for the custom components and some suggestions on the code itself. Especially around why exactly we need for two types of advertiser id (+ids) partition routers.
airbyte-integrations/connectors/source-tiktok-marketing/metadata.yaml
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml
Outdated
Show resolved
Hide resolved
...ource-tiktok-marketing/source_tiktok_marketing/components/advertiser_ids_partition_router.py
Outdated
Show resolved
Hide resolved
...ource-tiktok-marketing/source_tiktok_marketing/components/advertiser_ids_partition_router.py
Outdated
Show resolved
Hide resolved
...ource-tiktok-marketing/source_tiktok_marketing/components/advertiser_ids_partition_router.py
Outdated
Show resolved
Hide resolved
...source-tiktok-marketing/source_tiktok_marketing/components/semi_incremental_record_filter.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml
Outdated
Show resolved
Hide resolved
...s/source-tiktok-marketing/source_tiktok_marketing/components/hourly_datetime_based_cursor.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note on naming and Just one last discussion point on the need for the MultipleAdvertiserIdsPartitionRouter
. given it's only used on one stream, depending on how drastically it reduces requests, I think we might want to get rid of it even if that deviates from the original behavior.
What I want to figure out is how much the separate partition router benefits us. Basically, does combining the advertiser ids into a single slice results in them all getting bundled up and we only have to go through a single full iteration? Versus, if we separate them into individual slices and that means we have to perform one full iteration per advertiser_id slice. For example, if we have 5 advertiser_ids, then we end up making 5x the requests. If thats the case we can leave as is.
After we clear that up this is good to go. nice work!
...ource-tiktok-marketing/source_tiktok_marketing/components/advertiser_ids_partition_router.py
Outdated
Show resolved
Hide resolved
...ource-tiktok-marketing/source_tiktok_marketing/components/advertiser_ids_partition_router.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Differences identified
I tried to run regression testing using ad_groups
and there is a lot of red. Should we be worried about that? The output didn't allow me to validate if there were actually errors so for the rest, I tried to check a bit manually and noted my observations below. Note that I haven't checked a couple of things like if format of the states were compatible because we require a reset anyway. Also, I haven't done all the streams, only those documented below.
For all streams I've validated
- [Accepted] The first request isn't performed, probably because of the availability strategy so I'm fine with this
- [Accepted] state for full refresh now supports RFR as it emits a state like
{'__ab_no_cursor_state_message': True}
ads_reports_daily
- [To review] The records are different. The most recent version has
record["metrics"]["cost_per_secondary_goal_result"] == None
while the new one has a value of-
. There might be other differences
ad_groups_reports_daily
- [To review]
page_size query param is not passed anymoreEDIT: This was working - [To review] like
ads_reports_daily
, the records are different. Based on this re-occurrence, I expect this change to have been applied to all the records exposingmetrics
audiences
OK
Conclusion
I would like us to understand the report for the regression tests. Once we can explain the differences, I'll approve this PR
...yte-integrations/connectors/source-tiktok-marketing/integration_tests/expected_records.jsonl
Show resolved
Hide resolved
@maxi297
This was fixed by adding DefaultTransformation in low code. But I double check. This is the reason that we have differences in records for integer values that described as number in stream schema. Also double check page size failure. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, there were no issue with the page size.
LGTM!
What
resolved: https://github.com/airbytehq/airbyte-internal-issues/issues/7824
How
Migrated source to use low-code cdk instead of python cdk.
Regression tests are described here: #38316 (comment)
Main changes:
lifetime=true
as request param, which is deprecated on API v1.3. Now lifetime reports usequery_lifetime=true
, with this param start_date and end_date should not be provided. Exception: advertiser_lifetime_report: API v1.3 doesn't allow query_lifetime=true` with advertiser reports, so this stream was implemented exactly as in py version with start_date and end_date query params(range >=365d)Review guide
User Impact
Breaking change users will need to follow migration guide for affected streams.
Can this PR be safely reverted and rolled back?
Breaking change due to changes in schema and state format.