datacontract import --format dbt #104

Closed
simonharrer opened this issue Mar 21, 2024 · 12 comments

@simonharrer
Contributor

Out of #103 came the idea of importing dbt models into a datacontract.yaml:

datacontract import --format dbt models.yaml
@emirkmo
Contributor

emirkmo commented Mar 22, 2024

I already do something like this for import, creating a datacontract.yaml from a dbt project, but I was using the "schema" field instead of the "models" field, with a custom schema type. (Slightly off-topic, but "schema" was much more widely understood than "models" in our workshops. Just some feedback on its deprecation in the specification.)

However, our code is/was quite specific to the format of the dbt projects we allowed. To do it properly, one would want to parse and use the manifest.json file from a dbt project. It is the most straightforward way of working with dbt projects generically.

You would go through the nodes in the manifest and, for every node with a resource_type of model, import the columns, plus the data_types and descriptions if given, etc. The only difficulty is mapping the data_types to the ones supported by the datacontract spec, which is why a physical-model-specific schema might make more sense for the import. As a first step, though, the model in models could just omit the data_type, or provide the dbt one if it matches.
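To make the manifest walk above concrete, here is a minimal sketch in Python. It assumes the manifest has a top-level `nodes` mapping whose entries carry `resource_type`, `name`, `description`, and a `columns` mapping with optional `description` and `data_type`. The function name `models_from_manifest` and the `SUPPORTED_TYPES` set are hypothetical, and the set is only an illustrative subset of the types the spec allows. A dbt type is passed through only when it matches; otherwise the field is left without a type.

```python
# Illustrative subset of field types; an assumption, not the full list from the spec.
SUPPORTED_TYPES = {"string", "varchar", "text", "int", "integer", "bigint",
                   "long", "boolean", "date", "timestamp"}


def models_from_manifest(manifest: dict) -> dict:
    """Build a 'models' mapping from the nodes of a parsed dbt manifest (hypothetical helper)."""
    models = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        fields = {}
        for column_name, column in node.get("columns", {}).items():
            field = {}
            if column.get("description"):
                field["description"] = column["description"]
            data_type = (column.get("data_type") or "").lower()
            if data_type in SUPPORTED_TYPES:
                field["type"] = data_type  # pass the dbt type through only if it matches
            fields[column_name] = field
        model = {"fields": fields}
        if node.get("description"):
            model["description"] = node["description"]
        models[node["name"]] = model
    return models


# Tiny in-memory stand-in for a manifest.json loaded with json.load().
example_manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "description": "One row per order",
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key", "data_type": "BIGINT"},
                "total": {"name": "total", "data_type": "NUMERIC(10,2)"},
            },
        },
        "test.shop.not_null_orders_order_id": {
            "resource_type": "test",
            "name": "not_null_orders_order_id",
        },
    }
}

models = models_from_manifest(example_manifest)
```

In this example only `orders` ends up in `models`; `order_id` keeps its description and gets the type `bigint`, while `total` has no type because `numeric(10,2)` is not in the illustrative set.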

(For parsing the manifest, Dagster-dbt does this as well, and the code is Apache-2.0 licensed, if you are looking for inspiration.)
The import is something I can contribute, if the implementation sounds OK.


Much easier, of course, is to be pointed to a dbt schema.yaml file and use that for importing the models. Anything not defined in that YAML file would be missed. Then again, maybe that's OK.

@pixie79
Contributor

pixie79 commented May 10, 2024

I think the latter is fine, as I presume most people with more than a few dbt models split them into one model per file; otherwise it gets unwieldy very quickly.
Either that, or parse them all but allow an input to specify which models to include in the data contract, as you may only want two or three for a specific contract.

@emirkmo
Contributor

emirkmo commented May 13, 2024

> I think the latter is fine. As I presume most people with more than a few dbt models split them into a model per file otherwise it gets quite unwieldy very quickly.

This does not match my experience with larger dbt projects. But one or several models can logically coexist and be part of a data contract, so it is fine anyway? (It's reasonable to expect people not to mix models from different data products/contracts.)

@torbenkeller
Contributor

I'm looking into this right now

@simonharrer
Contributor Author

Awesome! I assigned you the issue. :-)

@jochenchrist
Contributor

@torbenkeller any progress here?

@teoria
Contributor

teoria commented Jul 2, 2024

I've been working with dbt, maybe I can help.

@torbenkeller
Contributor

@jochenchrist I was working on other things the last few weeks, sorry. But I will continue on this.

@teoria sounds good. If you want, we can pair program to get this ready.

@torbenkeller
Contributor

torbenkeller commented Jul 3, 2024

@teoria you can contact me on the datacontract Slack server.

@teoria
Contributor

teoria commented Jul 3, 2024

Nice!

@teoria
Contributor

teoria commented Jul 9, 2024

First version of the dbt manifest importer: #317

Usage:
Contract with all dbt models:
$ datacontract import --format dbt --source /path/manifest_dbt.json

Contract with only 2 models:
$ datacontract import --format dbt --source /path/manifest_dbt.json --dbt-model orders --dbt-model customers
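
For readers wondering how a repeatable flag like `--dbt-model` can narrow the import, here is a rough sketch in Python. It is not the code from #317 and not how the datacontract CLI is actually wired up; it uses the standard library's `argparse` with `action="append"` purely for illustration, and `select_models` is a hypothetical helper. With no flag given, every model is kept, matching the first usage line above.

```python
import argparse
from typing import Dict, List, Optional


def select_models(models: Dict[str, dict], wanted: Optional[List[str]]) -> Dict[str, dict]:
    """Keep only the requested models; keep everything when no filter is given (hypothetical helper)."""
    if not wanted:
        return dict(models)
    missing = [name for name in wanted if name not in models]
    if missing:
        # Fail loudly instead of silently producing a contract with fewer models.
        raise ValueError("models not found in manifest: " + ", ".join(missing))
    return {name: models[name] for name in wanted}


# Illustrative parser only; the flag name is taken from the usage lines above.
parser = argparse.ArgumentParser(prog="import-sketch")
parser.add_argument("--dbt-model", action="append", dest="dbt_models",
                    help="may be given several times; omit to import all models")

args = parser.parse_args(["--dbt-model", "orders", "--dbt-model", "customers"])

# Stand-in for the models read from the manifest.
all_models = {"orders": {}, "customers": {}, "payments": {}}
selected = select_models(all_models, args.dbt_models)
```

Here `selected` contains only `orders` and `customers`. Raising an error for an unknown model name is a design choice made for this sketch; an importer could just as well warn and continue.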

@simonharrer
Contributor Author

Mark this as closed.
