Rewrite dbt cloud crawler using discovery API #1052

alyiwang · 2024-12-24T19:48:57Z

🤔 Why?

The crawler should fetch most of its metadata from the dbt Discovery API environment endpoint instead of from jobs. This simplifies the steps and yields more complete lineage info.

🤓 What?

Rewrite the whole dbt cloud crawler to use the Discovery API environment endpoint, together with the Admin API to list all projects and environments.
Update the required configs; job_ids is no longer used.
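
For readers unfamiliar with the environment-based approach, here is a minimal sketch of paging through the models of one environment with cursor pagination. The GraphQL query shape is an assumption modeled on the public Discovery API documentation, not the query this PR actually sends, and `fetch_all_models` and the injected `post` callable are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed query shape, modeled on the public Discovery API docs.
# The crawler in this PR may request different fields.
MODELS_QUERY = """
query ($environmentId: BigInt!, $first: Int!, $after: String) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, after: $after) {
        pageInfo { hasNextPage endCursor }
        edges { node { uniqueId name } }
      }
    }
  }
}
"""

# Hypothetical transport: takes a GraphQL payload, returns the decoded JSON body.
Post = Callable[[Dict[str, Any]], Dict[str, Any]]


def fetch_all_models(
    post: Post, environment_id: int, page_size: int = 500
) -> List[Dict[str, Any]]:
    """Collect every model node of one environment, following cursors."""
    models: List[Dict[str, Any]] = []
    after: Optional[str] = None
    while True:
        payload = {
            "query": MODELS_QUERY,
            "variables": {
                "environmentId": environment_id,
                "first": page_size,
                "after": after,
            },
        }
        connection = post(payload)["data"]["environment"]["applied"]["models"]
        models.extend(edge["node"] for edge in connection["edges"])
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return models
        # Resume the next request from where this page ended.
        after = page_info["endCursor"]
```

In a real run, `post` would wrap an authenticated HTTP POST to the Discovery API; passing it in as a callable keeps the pagination logic testable without network access.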

🧪 Tested?

Tested against the Metaphor dbt instance and diffed the output file against the MCE generated by the previous crawler. The results are mostly the same.
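
A comparison like the one described above could be sketched as follows. This assumes each output file holds a JSON list of event objects and that every event can be reduced to a stable string key; the file layout, `load_events`, `diff_events`, and the key function are all hypothetical and are not the tooling used for this PR.

```python
import json
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def load_events(path: str) -> List[Event]:
    """Read one output file, assumed to contain a JSON list of events."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def diff_events(
    old: List[Event], new: List[Event], key: Callable[[Event], str]
) -> Dict[str, List[str]]:
    """Report which keyed events were removed, added, or changed."""
    old_by_key = {key(e): e for e in old}
    new_by_key = {key(e): e for e in new}
    shared = old_by_key.keys() & new_by_key.keys()
    return {
        "removed": sorted(old_by_key.keys() - new_by_key.keys()),
        "added": sorted(new_by_key.keys() - old_by_key.keys()),
        "changed": sorted(k for k in shared if old_by_key[k] != new_by_key[k]),
    }
```

Keying by entity rather than comparing the files line by line keeps ordering changes between the two crawlers from showing up as differences.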

Known differences

Now able to retrieve dbt model columns.
dbtModel.sourceModels is no longer filled, as it is a deprecated field superseded by entityUpstream.
docsUrl is no longer filled, as the previous format https://cloud.getdbt.com/accounts/123/jobs/146/docs/#!/xxx is no longer supported by dbt.
test.sql is not currently available; it can be fetched from the top-level environment.tests endpoint later on.
Metric label, dimensions, filters, and timeGrains are not available right now, but formula is now available.

☑️ Checks

My PR contains actual code changes, and I have updated the version number in pyproject.toml.

github-actions · 2024-12-24T19:55:54Z

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 13525 | 12117 | 90% | 85% | 🟢 |

New Files

| File | Coverage | Status |
|------|----------|--------|
| metaphor/dbt/cloud/parser/env_parser.py | 100% | 🟢 |
| metaphor/dbt/cloud/parser/lineage_parser.py | 93% | 🟢 |
| metaphor/dbt/cloud/parser/macro_parser.py | 96% | 🟢 |
| metaphor/dbt/cloud/parser/metric_parser.py | 97% | 🟢 |
| metaphor/dbt/cloud/parser/source_parser.py | 92% | 🟢 |
| TOTAL | 96% | 🟢 |

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| metaphor/common/entity_id.py | 96% | 🟢 |
| metaphor/dbt/cloud/client.py | 100% | 🟢 |
| metaphor/dbt/cloud/config.py | 100% | 🟢 |
| metaphor/dbt/cloud/extractor.py | 96% | 🟢 |
| metaphor/dbt/cloud/parser/common.py | 71% | 🟢 |
| metaphor/dbt/util.py | 94% | 🟢 |
| TOTAL | 93% | 🟢 |

updated for commit: 050dcb0 by action🐍

codecov · 2024-12-24T20:00:52Z

Codecov Report

Attention: Patch coverage is 93.26425% with 26 lines in your changes missing coverage. Please review.

Project coverage is 89.58%. Comparing base (b4913d5) to head (050dcb0).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| metaphor/dbt/cloud/parser/model_parser.py | 86.25% | 11 Missing ⚠️ |
| metaphor/dbt/cloud/parser/lineage_parser.py | 92.53% | 5 Missing ⚠️ |
| metaphor/dbt/cloud/parser/source_parser.py | 91.83% | 4 Missing ⚠️ |
| metaphor/dbt/util.py | 90.47% | 2 Missing ⚠️ |
| metaphor/common/entity_id.py | 80.00% | 1 Missing ⚠️ |
| metaphor/dbt/cloud/extractor.py | 94.73% | 1 Missing ⚠️ |
| metaphor/dbt/cloud/parser/macro_parser.py | 96.42% | 1 Missing ⚠️ |
| metaphor/dbt/cloud/parser/metric_parser.py | 96.77% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1052      +/-   ##
==========================================
+ Coverage   89.54%   89.58%   +0.04%     
==========================================
  Files         211      210       -1     
  Lines       13525    13525              
==========================================
+ Hits        12111    12117       +6     
+ Misses       1414     1408       -6
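
The figures in the diff above can be checked with a few lines of arithmetic. The sketch below recomputes project coverage from hits and lines and sums the per-file missing counts. The hits-to-lines ratios land on 89.58% and 89.54% only when truncated to two decimals rather than rounded; that is an observation from these numbers, not a documented Codecov rule, and `coverage_pct` is a hypothetical helper.

```python
import math


def coverage_pct(hits: int, lines: int, digits: int = 2) -> float:
    """Coverage percentage truncated (not rounded) to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(hits * 100 * scale / lines) / scale


# Hits and lines taken from the coverage diff.
head = coverage_pct(12117, 13525)
base = coverage_pct(12111, 13525)

# Per-file "Missing" counts from the report table.
missing_per_file = [11, 5, 4, 2, 1, 1, 1, 1]
total_missing = sum(missing_per_file)

print(f"base={base}% head={head}% missing={total_missing}")
```

The summed missing count agrees with the 26 lines the report flags as uncovered in the patch.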

mars-lan

Great work! Thanks for the refactoring.

alyiwang added 4 commits December 24, 2024 11:35

missing complied code from dbt [sc-29961]

bf6dc33

clean up old client and parsers

ba94cf3

Add tests

5599d5a

bump version

52f80e7

alyiwang requested a review from mars-lan December 24, 2024 19:48

fix dataset name normalization

0c29ba4

add tests

0283a31

alyiwang enabled auto-merge (squash) December 25, 2024 00:00

alyiwang added 5 commits December 24, 2024 16:44

fix macro lineage

c7a4d6c

adjust logger debug

265cdc6

add test

98f8405

minor fix

7c0c99c

add tests

050dcb0

mars-lan approved these changes Dec 26, 2024

alyiwang merged commit 4433922 into main Dec 26, 2024
6 checks passed

alyiwang deleted the yi.wang/sc-29961/missing-complied-code-from-dbt branch December 26, 2024 15:26

mars-lan mentioned this pull request Dec 31, 2024

Fix various dbt cloud connector issues #1054

Merged
