🚨 🚨 ✨ Source Tik Tok Marketing: Migration to Low-Code #38316

darynaishchenko · 2024-05-17T16:10:21Z

What

resolved: https://github.com/airbytehq/airbyte-internal-issues/issues/7824

How

Migrated source to use low-code cdk instead of python cdk.
Regression tests are described here: #38316 (comment)
Main changes:

State: Previously, all incremental streams used an incorrect state without partitions. With the low-code CDK, all incremental streams use per-partition state.
Lifetime reports: The previous implementation sent lifetime=true as a request param, which is deprecated in API v1.3. Lifetime reports now use query_lifetime=true; with this param, start_date and end_date must not be provided. Exception: advertiser_lifetime_report. API v1.3 doesn't allow query_lifetime=true with advertiser reports, so this stream is implemented exactly as in the py version, with start_date and end_date query params (range >= 365d).
Advertiser Ids stream: the schema now declares advertiser_id as a string (previously integer) to match the API docs.
Discover for configs with granularity: The py implementation was missing streams (campaigns_audience_reports, ad_group_audience_reports_by_platform, ad_group_audience_reports_by_country, ads_audience_reports_by_country, advertisers_audience_reports_by_country, campaigns_audience_reports_by_platform, advertisers_audience_reports_by_platform, ads_audience_reports_by_platform, ads_audience_reports_by_province) that users with a configured granularity can actually use, because the streams method didn't return them. For configs with granularity, the source removes the granularity from the stream name, matching how the streams were previously named.
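
To make the lifetime-report rule above concrete, here is a minimal sketch of how the date-related query params could be chosen. The helper name `lifetime_date_params` and the example stream name `ads_lifetime_report` are illustrative assumptions, not names taken from the manifest; only `query_lifetime`, `start_date`, `end_date` and `advertiser_lifetime_report` come from the description.

```python
from typing import Dict, Optional


def lifetime_date_params(
    stream_name: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> Dict[str, str]:
    """Return only the date-related query params for a lifetime report.

    Hypothetical helper for illustration; the connector expresses this
    in its manifest rather than in a function like this one.
    """
    if stream_name == "advertiser_lifetime_report":
        # API v1.3 does not allow query_lifetime=true for advertiser
        # reports, so an explicit date range is sent instead.
        if not start_date or not end_date:
            raise ValueError("advertiser_lifetime_report needs start_date and end_date")
        return {"start_date": start_date, "end_date": end_date}
    # Every other lifetime report: query_lifetime replaces the date range,
    # and start_date/end_date must not be sent alongside it.
    return {"query_lifetime": "true"}
```

Calling it with any non-advertiser lifetime stream yields only `query_lifetime`, while the advertiser stream yields only the date range.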

Review guide

User Impact

Breaking change: users will need to follow the migration guide for affected streams.

Can this PR be safely reverted and rolled back?

Breaking change due to changes in schema and state format.

YES 💚
NO ❌
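
For readers unfamiliar with the state-format change, the sketch below contrasts a single shared cursor with a per-partition layout. The `states` / `partition` / `cursor` shape is an assumption about how per-partition state is laid out, and `modify_time` is a placeholder cursor field; neither is confirmed by this PR description.

```python
from typing import Any, Dict, List


def per_partition_state(
    cursor_field: str, cursor_by_advertiser: Dict[str, str]
) -> Dict[str, List[Dict[str, Any]]]:
    """Build a state object holding one cursor per advertiser_id partition.

    Illustrative only: the layout is an assumed per-partition shape.
    """
    return {
        "states": [
            {"partition": {"advertiser_id": advertiser_id}, "cursor": {cursor_field: value}}
            for advertiser_id, value in sorted(cursor_by_advertiser.items())
        ]
    }


# Assumed old style: one cursor shared by every advertiser.
legacy_state = {"modify_time": "2024-01-01 00:00:00"}

# Assumed new style: each advertiser keeps its own cursor value.
new_state = per_partition_state(
    "modify_time",
    {"222": "2024-02-01 00:00:00", "111": "2024-01-01 00:00:00"},
)
```

Because the two shapes are not interchangeable, an old state cannot simply be reused, which is consistent with the PR being marked as breaking.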

darynaishchenko · 2024-06-11T15:38:31Z

Regression test results:
test_catalog_are_the_same [failed] – the advertiser_id type changed from integer to string (breaking change described in the docs).

TestDataIntegrity.test_record_schema_match_without_state [failed] - Value of root['properties']['budget']['type'] changed from "integer" to "number". Value of root['properties']['roas_bid']['type'] changed from "integer" to "number". (same error for all fields with type number in schema but actual type is integer).
Both versions declare the type as number, but the low-code version adds the default type transformer, so a value of 0 becomes 0.0. For destinations with transformations (e.g. BigQuery) this is not a breaking change, as the destination already converts these values to numbers.
These streams are already in the list of breaking changes affected by the state change, so users will do a refresh and clear anyway.
The change from 0 to 0.0 occurred because Default schema normalization was added in low code to stay compatible with stream schemas that were written for API v1.2.0; in v1.3.0 some fields have a new type. For example, *_id was changed from integer to string, while the stream schemas for v1.2.0 use integer as the type.
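
The 0 to 0.0 difference can be reproduced with a small stand-in for schema-driven normalization. This is a simplified sketch written for this thread, not the CDK's transformer: `normalize_record` is a hypothetical helper that only handles flat records and the two casts discussed here.

```python
from typing import Any, Dict


def normalize_record(record: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Cast integer values to the type the schema declares (flat records only)."""
    properties = schema.get("properties", {})
    normalized: Dict[str, Any] = {}
    for key, value in record.items():
        declared = properties.get(key, {}).get("type")
        types = declared if isinstance(declared, list) else [declared]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_plain_int = isinstance(value, int) and not isinstance(value, bool)
        if is_plain_int and "number" in types:
            normalized[key] = float(value)  # 0 becomes 0.0
        elif is_plain_int and "string" in types:
            normalized[key] = str(value)  # integer ids become strings
        else:
            normalized[key] = value
    return normalized


schema = {
    "properties": {
        "budget": {"type": ["null", "number"]},
        "roas_bid": {"type": ["null", "number"]},
        "campaign_id": {"type": ["null", "string"]},
    }
}
result = normalize_record({"budget": 0, "roas_bid": 0, "campaign_id": 123}, schema)
```

Note that `0 == 0.0` in Python, so a value comparison alone would not flag the change; the regression test reports it because the inferred type moves from integer to number.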

TestDataIntegrity.test_all_records_are_the_same_without_state [failed] - Same differences with integer/number as above.

Read URLs: the py version makes some additional requests due to HttpAvailabilityStrategy.

PS: The reviewer can ask me to send the full HTML report in a Slack DM. I ran the regression tests locally because I needed to change the start date in the config, and I chose to test without state because of the breaking changes.

brianjlai

I think the manifest and the schemas overall look good and given the size of the manifest and number of streams, I am going to trust that we've carefully run live tests to verify that the changes are working and the breaking changes are expected. I didn't see anything glaring.

I did, however, have some questions to clarify my understanding of the custom components and some suggestions on the code itself, especially around why exactly we need two types of advertiser id (+ids) partition routers.

brianjlai

A note on naming, and just one last discussion point on the need for the MultipleAdvertiserIdsPartitionRouter. Given it's only used on one stream, and depending on how drastically it reduces requests, I think we might want to get rid of it even if that deviates from the original behavior.

What I want to figure out is how much the separate partition router benefits us. Basically, does combining the advertiser ids into a single slice result in them all getting bundled up, so that we only have to go through a single full iteration? Or, if we separate them into individual slices, does that mean we have to perform one full iteration per advertiser_id slice? For example, if we have 5 advertiser_ids, then we end up making 5x the requests. If that's the case, we can leave it as is.

After we clear that up this is good to go. nice work!
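
The trade-off in question can be sketched as two slicing strategies. Both functions are hypothetical stand-ins rather than the routers from this PR, and the `batch_size` of 100 and the JSON-encoded `advertiser_ids` value are assumptions made for the example.

```python
import json
from typing import Dict, List


def single_id_slices(advertiser_ids: List[str]) -> List[Dict[str, str]]:
    """One slice per advertiser: every slice triggers its own full iteration."""
    return [{"advertiser_id": advertiser_id} for advertiser_id in advertiser_ids]


def batched_id_slices(advertiser_ids: List[str], batch_size: int = 100) -> List[Dict[str, str]]:
    """Bundle ids so one slice (and one iteration) covers up to batch_size advertisers."""
    return [
        {"advertiser_ids": json.dumps(advertiser_ids[start:start + batch_size])}
        for start in range(0, len(advertiser_ids), batch_size)
    ]


ids = ["1", "2", "3", "4", "5"]
per_id = single_id_slices(ids)    # 5 slices, so 5 iterations over the stream
bundled = batched_id_slices(ids)  # 1 slice, so 1 iteration over the stream
```

With 5 advertiser ids, the per-id strategy produces five slices while the bundled strategy produces one, which is the 5x difference in requests raised above.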

maxi297

Differences identified

I tried to run regression testing using ad_groups and there is a lot of red. Should we be worried about that? The output didn't allow me to validate whether there were actual errors, so for the rest I checked a bit manually and noted my observations below. Note that I haven't checked a couple of things, such as whether the format of the states is compatible, because we require a reset anyway. Also, I haven't covered all the streams, only those documented below.

For all streams I've validated

[Accepted] The first request isn't performed, probably because of the availability strategy so I'm fine with this
[Accepted] state for full refresh now supports RFR as it emits a state like {'__ab_no_cursor_state_message': True}

ads_reports_daily

[To review] The records are different. The most recent version has record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of -. There might be other differences

ad_groups_reports_daily

[To review] ~~page_size query param is not passed anymore~~ EDIT: This was working
[To review] like ads_reports_daily, the records are different. Based on this re-occurrence, I expect this change to have been applied to all the records exposing metrics

audiences

OK

Conclusion

I would like us to understand the report for the regression tests. Once we can explain the differences, I'll approve this PR

darynaishchenko · 2024-06-14T17:36:47Z

@maxi297
Please take a look at my regression tests results here #38316 (comment)

The most recent version has record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of -. There might be other differences

This was fixed by adding DefaultTransformation in low code, but I will double-check. It is also the reason records differ for integer values that are described as number in the stream schema.

I will also double-check the page size failure.
Thanks.

darynaishchenko · 2024-06-17T10:10:41Z

@maxi297

Fixed the difference where record["metrics"]["cost_per_secondary_goal_result"] == None in the most recent version while the new one had a value of -, for all report-based streams, by adding the TransformEmptyMetrics custom component. Thanks for pointing it out.
Regarding "page_size query param is not passed anymore": I guess you just didn't see it in the requests because it is now at the end of the request (due to the order of params in the requester definition). I double-checked it. All request params are identical to the py version except for their order, which doesn't affect the response.
The request looks like:
https://business-api.tiktok.com/open_api/v1.3/report/integrated/get/?service_type=AUCTION&report_type=BASIC&data_level=AUCTION_ADGROUP&dimensions=[dimensions]&metrics=[metrics]&start_date=2022-09-30&end_date=2022-10-29&page_size=1000&advertiser_id=7001035076276387841.
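
As a rough illustration of the empty-metrics fix, the sketch below replaces the "-" placeholder with None inside a record's metrics. It is a standalone approximation written for this thread: the function name `transform_empty_metrics` is hypothetical, and it is not the code of the TransformEmptyMetrics component.

```python
from typing import Any, Dict


def transform_empty_metrics(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with "-" metric values replaced by None."""
    metrics = record.get("metrics")
    if not isinstance(metrics, dict):
        # Nothing to clean: hand back an unchanged shallow copy.
        return dict(record)
    cleaned = {name: (None if value == "-" else value) for name, value in metrics.items()}
    return {**record, "metrics": cleaned}


raw = {
    "dimensions": {"adgroup_id": "123"},
    "metrics": {"cost_per_secondary_goal_result": "-", "spend": "10.5"},
}
fixed = transform_empty_metrics(raw)
```

Here `fixed["metrics"]["cost_per_secondary_goal_result"]` is None, matching what the py version emitted, while real values such as `spend` pass through untouched.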

maxi297

You are right, there was no issue with the page size.

LGTM!

darynaishchenko added 2 commits May 17, 2024 19:02

updated dependencies

5de6ef0

migrate streams to low code

9dcfc02

darynaishchenko self-assigned this May 17, 2024

darynaishchenko marked this pull request as draft May 17, 2024 16:10

octavia-squidington-iii added area/connectors Connector related issues connectors/source/tiktok-marketing labels May 17, 2024

darynaishchenko added 19 commits May 20, 2024 12:02

updated poetry.lock

e69706c

updated transfromations, schema normalization and custom part router

77f0240

support configs without credentials

6c3dd55

added lookback for report streams

ccc3f15

end date for report streams

bf2416c

added include deleted for report streams

3e42045

added include deleted for ads, ad_groups and campaigns streams

7316447

updated abnormal state

977edc0

format fix

991a7bb

moved spec to manifest

9a9abd5

moved schemas to manifest

6984e22

deleted streams.py

87cbfad

updated custom components, default start date, check stream, discovered streams

5ee1960

added unit tests

7369e20

updated abnormal_state

d83a775

updated expected records

587ccab

support secret and app_id for config with environment, added value_type for dimensions transformations

7b8687f

updated streams for old configs with granularity

c6130f3

bump version, breaking change docs

75a439b

octavia-squidington-iii added the area/documentation Improvements or additions to documentation label May 23, 2024

vercel bot deployed to Preview May 23, 2024 16:01 View deployment

Merge branch 'master' into daryna/source-tik-tok-marketing/migartion-to-low-code

4fc6588

vercel bot deployed to Preview May 23, 2024 17:20 View deployment

updated upgradeDeadline

9495b87

brianjlai reviewed Jun 12, 2024

darynaishchenko added 2 commits June 12, 2024 14:13

refactor code

f4f8a19

format fix

e9e2a0c

darynaishchenko requested a review from brianjlai June 12, 2024 11:50

brianjlai reviewed Jun 12, 2024

renamed custom partition routers

4ff4eab

darynaishchenko requested review from brianjlai and maxi297 June 13, 2024 09:07

updated migration guide

259160b

vercel bot deployed to Preview June 14, 2024 15:30 View deployment

maxi297 reviewed Jun 14, 2024

brianjlai approved these changes Jun 15, 2024

added custom transformer for empty metrics

94cdbdc

darynaishchenko requested a review from maxi297 June 17, 2024 10:10

maxi297 approved these changes Jun 18, 2024

lazebnyi approved these changes Jun 24, 2024

Merge branch 'master' into daryna/source-tik-tok-marketing/migartion-to-low-code

381c17a

vercel bot deployed to Preview June 25, 2024 08:43 View deployment

lazebnyi mentioned this pull request Jun 25, 2024

update source tiktok to support python 3.11 #39578

Closed

darynaishchenko added 3 commits July 1, 2024 15:02

Merge branch 'master' into daryna/source-tik-tok-marketing/migartion-to-low-code

076ca01

updated upgradeDeadline

3f43273

updated changelog

cddcbd1

vercel bot deployed to Preview July 1, 2024 12:37 View deployment

darynaishchenko merged commit 1b85b28 into master Jul 1, 2024
31 checks passed

darynaishchenko deleted the daryna/source-tik-tok-marketing/migartion-to-low-code branch July 1, 2024 16:41

xiaohansong pushed a commit that referenced this pull request Jul 9, 2024

🚨 🚨 ✨ Source Tik Tok Marketing: Migration to Low-Code (#38316)

7ed334f
