Make table-level and column-level meta extensible for other tools #68

Open
yu-iskw opened this issue Apr 6, 2023 · 2 comments

@yu-iskw
Contributor

yu-iskw commented Apr 6, 2023

Motivation

Some tools that integrate with dbt define their own custom schema under the meta property at the table and column levels.
For instance, Lightdash lets us declare metrics as shown below.

https://docs.lightdash.com/guides/how-to-create-metrics

# schema.yml
version: 2
models:
  - name: "orders"
    description: "A table of all orders."
    columns:
      - name: "status"
        description: "Status of an order: ordered/processed/complete"
      - name: "order_id"
        meta:
          metrics:
            total_order_count:
              type: count_distinct
      - name: "order_value"
        meta:
          metrics:
            total_sales:
              type: sum

We currently define the meta property as a plain object. I don't have a good idea for how to support this kind of extensibility in JSON Schema, but it would be awesome if we could open the JSON schema up to other tools.

https://github.com/dbt-labs/dbt-jsonschema/blob/main/schemas/dbt_yml_files.json#L680-L682

@joellabes
Collaborator

Agreed! I don't think that JSON Schema itself is extensible in that way, but if there is any way to augment a schema with components from another schema then that would be awesome.

@yu-iskw
Contributor Author

yu-iskw commented May 23, 2024

@joellabes I have an idea for supporting tools that define their own custom meta: use $ref to include external files and anyOf to combine them. We can define the custom meta for each tool independently and include it in dbt_yml_files-latest.json.

Overview

To improve interoperability between dbt and tools like Lightdash, I propose extending the JSON schemas used in dbt to support external schema references. Each tool's meta schema can then be maintained in its own file and applied to meta at the table and column levels.

Proposal Details

1. Utilizing $ref in JSON Schemas:

Use the $ref feature to include external schemas, enabling separate management of schemas while maintaining a unified structure.

2. Merging Definitions with anyOf:

Use anyOf to combine multiple definitions for a property, allowing different tools to extend the schema as needed.

3. Support for Various Resource Types:

dbt supports various resource types, including models, sources, and snapshots. This proposal extends metadata support at both table and column levels for each resource type.

Resource types to be supported:

  • Table level meta of dbt model
  • Column level meta of dbt model
  • Table level meta of dbt source
  • Column level meta of dbt source
  • Table level meta of dbt snapshot
  • Column level meta of dbt snapshot

Examples of External Schemas for Lightdash:

  • ./meta/lightdash/model_table_meta.json
  • ./meta/lightdash/model_column_meta.json
  • ./meta/lightdash/source_table_meta.json
  • ./meta/lightdash/source_column_meta.json
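To make the idea of an external file concrete, here is a minimal sketch of what ./meta/lightdash/model_column_meta.json could contain. It is derived only from the metrics example in the motivation section (a metrics object whose entries each carry a type); the title, the exact structure, and the decision to leave type as a free-form string are assumptions for illustration, not Lightdash's actual schema.

```json
{
  "title": "Lightdash column-level meta (illustrative sketch)",
  "type": "object",
  "properties": {
    "metrics": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["type"],
        "properties": {
          "type": { "type": "string" }
        }
      }
    }
  }
}
```

Leaving additionalProperties unset at the top level means keys belonging to other tools are still accepted under the same meta object, which is what allows several tool schemas to coexist.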

JSON Schema Example

{
  "models": {
    "type": "array",
    "items": {
      "type": "object",
      "required": ["name"],
      "properties": {
        "name": { "type": "string" },
        "description": { "type": "string" },
        "access": {
          "type": "string",
          "enum": ["private", "protected", "public"]
        },
        "columns": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/column_properties"
          }
        },
        "config": { "$ref": "#/$defs/model_configs" },
        "constraints": { "$ref": "#/$defs/constraints" },
        "data_tests": {
          "type": "array",
          "items": { "$ref": "#/$defs/data_tests" }
        },
        "deprecation_date": { "type": "string" },
        "docs": { "$ref": "#/$defs/docs_config" },
        "group": { "$ref": "#/$defs/group" },
        "latest_version": { "type": "number" },
        "meta": {
          "anyOf": [
            { "type": "object" },
            { "$ref": "./meta/lightdash/model_table_meta.json" }
          ]
        },
        "tests": {
          "type": "array",
          "items": { "$ref": "#/$defs/data_tests" }
        },
        "versions": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["v"],
            "properties": {
              "columns": {
                "type": "array",
                "items": {
                  "anyOf": [
                    { "$ref": "#/$defs/include_exclude" },
                    { "$ref": "#/$defs/column_properties" },
                    { "$ref": "./meta/lightdash/model_column_meta.json" }
                  ]
                }
              },
              "config": { "$ref": "#/$defs/model_configs" },
              "v": { "type": "number" }
            }
          }
        }
      },
      "additionalProperties": false
    }
  }
}
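One point about validation semantics in the example above may be worth checking: because { "type": "object" } accepts any object, an anyOf that includes it passes whether or not the Lightdash branch matches, so the external schema would not reject malformed Lightdash meta. A possible variant, sketched here under the assumption that each tool's schema leaves additionalProperties open, keeps the base type and applies the tool schemas with allOf. This is only an illustrative alternative, not part of the proposal as written.

```json
{
  "meta": {
    "type": "object",
    "allOf": [
      { "$ref": "./meta/lightdash/model_table_meta.json" }
    ]
  }
}
```

With this shape, each additional tool would add one more $ref entry to the allOf array, and a meta object must satisfy every listed schema while remaining free to carry keys none of them mention.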
