Data Catalog Fileset Exporter Tutorial

Intro

This tutorial walks you through running the Data Catalog Fileset Exporter.

Python CLI

This script is a Python CLI. If you want to look at the code, open: .

Otherwise go to the next step.

CSV fields

Go to the file and find the "5. Export Filesets to CSV file" section. It explains the CSV columns created when the Python CLI is executed.

Set Up the Service Account

First, set up the Service Account. (You may skip this section if you already have one.)

Start by setting your project ID. Replace the placeholder with your project ID.

gcloud config set project MY_PROJECT_PLACEHOLDER

Next, load it into an environment variable.

export PROJECT_ID=$(gcloud config get-value project)
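
If no project is configured, `PROJECT_ID` may end up empty, and the later commands then fail with less obvious errors. A minimal sketch of a guard, assuming a hypothetical helper named `require_project_id` (it is not part of gcloud or the exporter):

```shell
# Hypothetical helper: fail early when the project ID is empty.
require_project_id() {
  if [ -z "${1:-}" ]; then
    echo "PROJECT_ID is empty; set a project with gcloud first" >&2
    return 1
  fi
  echo "Using project: $1"
}

# Check the variable exported in the previous step.
require_project_id "${PROJECT_ID:-}" || true
```

If the helper reports an empty value, rerun the `gcloud config set project` step before continuing.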

Then, create a Service Account.

gcloud iam service-accounts create datacatalog-fs-exporter-sa \
--display-name  "Service Account for Fileset Exporter" \
--project $PROJECT_ID

Next, create a credentials folder where the Service Account key will be saved.

mkdir -p ~/credentials

Next, create and download the Service Account key.

gcloud iam service-accounts keys create "datacatalog-fs-exporter-sa.json" \
--iam-account "datacatalog-fs-exporter-sa@$PROJECT_ID.iam.gserviceaccount.com" \
&& mv datacatalog-fs-exporter-sa.json ~/credentials/datacatalog-fs-exporter-sa.json
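
The downloaded JSON file contains a private key, so it is usually worth restricting it to your own user. A minimal sketch, assuming a hypothetical helper named `secure_key_file` and the key path used above:

```shell
# Hypothetical helper: make a key file readable and writable by its owner only.
secure_key_file() {
  key_file="$1"
  if [ ! -f "$key_file" ]; then
    echo "Key file not found: $key_file" >&2
    return 1
  fi
  chmod 600 "$key_file"
  echo "Secured $key_file"
}

# Apply it to the key saved in the previous step.
secure_key_file "$HOME/credentials/datacatalog-fs-exporter-sa.json" || true
```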

Next, add the Data Catalog admin role to the Service Account.

gcloud projects add-iam-policy-binding $PROJECT_ID \
--member "serviceAccount:datacatalog-fs-exporter-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--quiet \
--project $PROJECT_ID \
--role "roles/datacatalog.admin"

Next, set up the credentials environment variable.

export GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-fs-exporter-sa.json
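
A wrong path in `GOOGLE_APPLICATION_CREDENTIALS` typically only surfaces later as an authentication error. The sketch below checks the variable up front; `check_credentials` is a hypothetical helper name, not something the exporter provides:

```shell
# Hypothetical helper: confirm a credentials path is set and points to a
# readable, non-empty file.
check_credentials() {
  cred_path="${1:-}"
  if [ -z "$cred_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$cred_path" ] || [ ! -s "$cred_path" ]; then
    echo "Credentials file is missing, unreadable, or empty: $cred_path" >&2
    return 1
  fi
  echo "Credentials file looks usable: $cred_path"
}

check_credentials "${GOOGLE_APPLICATION_CREDENTIALS:-}" || true
```

This only checks that the file exists and has content; it does not validate the key itself.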

Install the Python CLI

Install and configure the datacatalog-fileset-exporter CLI.

pip3 install datacatalog-fileset-exporter --user

Next, add it to your PATH.

export PATH=~/.local/bin:$PATH

Next, test it out.

datacatalog-fileset-exporter --help
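
If the `--help` call fails with "command not found", the install directory is probably not on your PATH. A small sketch for checking this, using a hypothetical helper named `check_on_path`:

```shell
# Hypothetical helper: report whether a command can be found through PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found; check that ~/.local/bin is on your PATH" >&2
    return 1
  fi
}

check_on_path datacatalog-fileset-exporter || true
```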

Execute the Python CLI

Create an output folder:

mkdir -p ~/output

Run the CLI:

datacatalog-fileset-exporter filesets export --project-ids $PROJECT_ID --file-path ~/output/filesets.csv

Let's see the output:

cat ~/output/filesets.csv

Use the Cloud Editor to see the file, or upload it to Google Sheets to better visualize it.
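
For a quick check in the terminal, the sketch below prints the header row and counts the data rows. `summarize_csv` is a hypothetical helper, and the count assumes that no quoted field contains a line break:

```shell
# Hypothetical helper: print the header row and the number of data rows of a CSV file.
summarize_csv() {
  csv_file="$1"
  if [ ! -f "$csv_file" ]; then
    echo "File not found: $csv_file" >&2
    return 1
  fi
  echo "Columns: $(head -n 1 "$csv_file")"
  # Count every line after the header; prints 0 for an empty or header-only file.
  echo "Data rows: $(awk 'NR > 1 { n++ } END { print n + 0 }' "$csv_file")"
}

summarize_csv "$HOME/output/filesets.csv" || true
```

A count of 0 data rows would suggest that no filesets were found in the given project.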

Congratulations!

You've successfully finished the Data Catalog Fileset Exporter Tutorial.