Repository for the D ONE Databricks brick-by-brick workshop

Setup Workspace
- Adding the repository
- Create a personal cluster

Delta + Unity Catalog
- Read and Write Tables
- Upload data to Unity Catalog
- Time Travel + Installing Libraries

Medallion Architecture & Workflow Orchestration
- 3 Notebooks - Medallion architecture
- Creating a Workflow Job

ML and MLOps
Log in to the workspace.
Adding the repository to your workspace:
- Click on Workspace in the navigation menu to the left.
- Click on the directory Home.
- Click on Create, choose Git folder, and paste the URL of this repository into the Git repository field.
- Click on Create Git folder.
Now you should see a repository named brick-by-brick under your own directory.
Create a personal cluster:
- Click on the Compute tab in the navigation menu to the left.
- Click on Create compute and choose the following settings:
  - Choose the Personal Compute policy.
  - Make sure the Single user access is under your user.
- Click on Create Compute.
Go to the following notebooks and follow the instructions:
- Bronze
- Silver
- Gold
Creating a Workflow Job:
- Click on the Workflows tab in the navigation menu to the left.
- Click on the Create job button.
- Add a Job name for your Workflow at the top: bricks-<firstname_lastname>.
- Choose the following settings:
  - Task Name: bronze_task
  - Source: Workspace
  - Path: Click on Select Notebook and choose your Bronze Notebook.
  - Cluster: Choose the existing cluster that you created in the first exercise.
- Click on Create.
- Now you have created a workflow Job with one task inside.
- Click on Add task and choose Notebook.
- Repeat the steps for both the silver_task and the gold_task.
  - Make sure that they depend on each other in the following order: bronze_task -> silver_task -> gold_task
- Click on Run now to run the whole Job.
Congratulations, you have now created a workflow Job.
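For reference, the job built in the UI above can also be written down as a JSON payload. The sketch below only builds and prints such a payload in the shape used by the Databricks Jobs API 2.1 (task_key, notebook_task, existing_cluster_id, depends_on); it does not call the API. The helper name build_medallion_job, the notebook paths, and the cluster ID are placeholders chosen for illustration, not values from this workshop.

```python
import json


def build_medallion_job(job_name, cluster_id, notebooks):
    """Build a job payload whose tasks run one after another.

    notebooks is an ordered list of (task_key, notebook_path) pairs.
    Each task after the first depends on the task directly before it.
    """
    tasks = []
    previous_key = None
    for task_key, notebook_path in notebooks:
        task = {
            "task_key": task_key,
            "notebook_task": {
                "notebook_path": notebook_path,
                "source": "WORKSPACE",  # notebook lives in the workspace
            },
            "existing_cluster_id": cluster_id,
        }
        if previous_key is not None:
            # Chain the tasks: bronze_task -> silver_task -> gold_task
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task_key
    return {"name": job_name, "tasks": tasks}


# Placeholder values (assumptions): replace with your own name, paths, and cluster ID.
payload = build_medallion_job(
    job_name="bricks-firstname_lastname",
    cluster_id="<your-cluster-id>",
    notebooks=[
        ("bronze_task", "/Workspace/Users/<your-user>/brick-by-brick/Bronze"),
        ("silver_task", "/Workspace/Users/<your-user>/brick-by-brick/Silver"),
        ("gold_task", "/Workspace/Users/<your-user>/brick-by-brick/Gold"),
    ],
)

print(json.dumps(payload, indent=2))
```

Comparing the printed depends_on entries with the task graph shown in the Workflows UI is a quick way to check that the three tasks are chained in the intended order.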
1. Run the ML Preprocessing notebook in your catalog to create the feature table.
2. Move on to the ML MLflow Tracking notebook and walk through the steps to understand how to interact with MLflow experiments inside the Databricks workspace.
3. Move on to the ML Model Registry notebook and walk through the steps to understand how to interact with the model registry, either via the Python APIs or directly in the UI.
4. (Optional) Tie steps 1-3 together by creating a new ML workflow, then check the results of the workflow run in the UI.
5. (Optional) Finally, move on to the AutoML notebook and see how Databricks AutoML can be used as a quick way to create baseline models.
Excellent, you have now worked through MLflow on Databricks and are ready to apply these principles to your own project.