dbt - Data Build Tool

DBT Logo

Transform, test, and document data in your warehouse

🚀 About

In this HashiQube DevOps lab, you'll get hands-on experience with dbt (Data Build Tool) - a transformation tool that enables data analysts and engineers to transform, test, and document data in cloud data warehouses.

Analytics Engineering with dbt

The modern analytics engineering workflow powered by dbt

📋 Getting Started

Before provisioning, review the dbt and adapter versions located in common.sh.

You can control which database adapter is installed alongside dbt by setting the DBT_WITH variable to one of the following values (the dbt version itself is pinned by DBT_VERSION in the same file):

DBT_WITH=postgres

# AVAILABLE OPTIONS:
# postgres    - PostgreSQL adapter
# redshift    - Amazon Redshift adapter
# bigquery    - Google BigQuery adapter
# snowflake   - Snowflake adapter
# mssql       - SQL Server and Synapse adapter
# spark       - Apache Spark adapter
# all         - Install all adapters (excluding mssql)
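For example, to provision with the BigQuery adapter instead of PostgreSQL, change the variable in common.sh before provisioning. The dbt/common.sh path below is an assumption based on the other dbt/ scripts referenced in this README; adjust it to match your checkout (or simply edit the file by hand):

# Hypothetical one-liner: switch the adapter selection to BigQuery
sed -i 's/^DBT_WITH=.*/DBT_WITH=bigquery/' dbt/common.sh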

📥 Provision

GitHub Codespaces

Open in GitHub Codespaces

bash docker/docker.sh
bash database/postgresql.sh
bash dbt/dbt.sh

Vagrant

vagrant up --provision-with basetools,docsify,docker,postgresql,dbt

Docker Compose

docker compose exec hashiqube /bin/bash
bash hashiqube/basetools.sh
bash docker/docker.sh
bash docsify/docsify.sh
bash database/postgresql.sh
bash dbt/dbt.sh

🧪 dbt Labs Example Project

The provisioner automatically sets up the Jaffle Shop example project from dbt Labs.

Running the Example

  1. Run the Provision step above.

  2. The example project from https://github.com/dbt-labs/jaffle_shop is already cloned into /vagrant/dbt/jaffle_shop.

  3. Enter the HashiQube environment:

    vagrant ssh
  4. Navigate to the example project:

    cd /vagrant/dbt/jaffle_shop
  5. Explore the project structure and follow the tutorial at https://github.com/dbt-labs/jaffle_shop#running-this-project.
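A typical first pass through the Jaffle Shop project follows the upstream tutorial; the sketch below assumes your dbt profile already points at the bundled PostgreSQL database:

# Load the example CSV files into the warehouse as seed tables
dbt seed

# Build the models defined in the project
dbt run

# Run the project's tests against the built models
dbt test

# Generate and serve the documentation site
dbt docs generate
dbt docs serve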

💻 Using Your Own dbt Project

  1. Enter HashiQube SSH session:

    vagrant ssh
  2. If you have an existing dbt project under your home directory, you can access it via the /osdata volume, which is mapped to your home directory.

  3. Update your profiles.yml with the correct credentials for your target database (a sample PostgreSQL profile is sketched after this list).

  4. Test your connection:

    dbt debug
  5. Run your dbt project:

    dbt run
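As a minimal sketch, a profiles.yml for a local PostgreSQL target might look like the following. The top-level key must match the profile name declared in your project's dbt_project.yml (jaffle_shop is used here only as an example), and every connection value shown is a placeholder rather than the provisioner's actual credentials:

# Write a dbt profile to the default location (~/.dbt/profiles.yml)
# All connection values are placeholders - replace them with your own
mkdir -p ~/.dbt
cat > ~/.dbt/profiles.yml <<'EOF'
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: postgres
      password: postgres
      dbname: postgres
      schema: dbt
      threads: 4
EOF

With the profile in place, dbt debug (step 4) confirms that dbt can reach the database before you run the project.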

🖥️ Web UI Access

Once provisioning is complete, you can access the dbt web interface:

DBT Project View

dbt project view showing models and structure

DBT Database View

dbt database view showing tables and schemas

DBT Lineage Graph

dbt lineage graph showing data transformation dependencies
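The screenshots above show the dbt documentation site. Assuming the provisioner follows dbt's standard docs workflow (the exact port it exposes is defined by the provisioner scripts, not here), you can regenerate and serve the site from inside any dbt project with:

# Compile model, source, and lineage metadata into the target/ directory
dbt docs generate

# Serve the static documentation site locally (defaults to port 8080)
dbt docs serve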

🔌 Supported Database Adapters

dbt supports multiple database adapters, allowing you to connect to various data warehouses.

MSSQL and Synapse

These adapters require a specific (older) version of dbt. With DBT_WITH=mssql, dbt --version reports something like:

Core:
  - installed: 1.1.0
  - latest:    1.2.1 - Update available!

Plugins:
  - postgres:  1.1.0 - Update available!
  - synapse:   1.1.0 - Up to date!
  - sqlserver: 1.1.0 - Up to date!
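As a sketch of what this pinning amounts to, the versions from common.sh could be installed manually with pip; the actual dbt/dbt.sh provisioner may install them differently:

# Hypothetical manual install of the pinned MSSQL/Synapse adapters
# (versions taken from common.sh)
pip install "dbt-sqlserver==1.1.0" "dbt-synapse==1.1.0"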

Other Adapters

With the other adapters, dbt --version reports something like:

Core:
  - installed: 1.2.1
  - latest:    1.2.1 - Up to date!

Plugins:
  - spark:     1.2.0 - Up to date!
  - postgres:  1.2.1 - Up to date!
  - snowflake: 1.2.0 - Up to date!
  - redshift:  1.2.1 - Up to date!
  - bigquery:  1.2.0 - Up to date!

💡 Performance Tips

As your dbt project grows, dbt run and dbt test commands can become time-consuming. Here are some optimization strategies:

Using Deferred Execution

Store artifacts to reuse in future runs:

# Run only new or modified models
dbt run --select [...] --defer --state path/to/artifacts

# Test only new or modified models
dbt test --select [...] --defer --state path/to/artifacts

This approach:

  • Executes only what's new or changed in your code
  • Reuses previously compiled artifacts
  • Significantly reduces execution time for large projects
  • Is perfect for CI/CD pipelines and pull request validation
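For example, a CI job might compare the current branch against artifacts saved from a previous production run. The state:modified+ selector is dbt's built-in way to pick changed models plus their downstream dependents; the prod-run-artifacts directory name is illustrative, and any directory containing a previous manifest.json works as the --state target:

# Run only models that changed since the saved state, plus their children,
# deferring references to unchanged models to the previous artifacts
dbt run --select state:modified+ --defer --state prod-run-artifacts

# Test the same selection of changed models
dbt test --select state:modified+ --defer --state prod-run-artifacts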

🔧 Provisioner Scripts

The dbt environment is set up using these scripts:

common.sh

#!/bin/bash
# Set versions
DBT_VERSION="1.2.1"
DBT_SQL_SERVER_VERSION="1.1.0"
DBT_SQL_SYNAPSE_VERSION="1.1.0"
DBT_WITH=postgres

# Available adapter options:
# postgres
# redshift
# bigquery
# snowflake
# mssql
# spark
# all

dbt-global.sh

#!/bin/bash
# This script provisions the global dbt environment...

🔗 Additional Resources