docs(ingest): improve doc gen, docs for snowflake, looker (datahub-pr…
shirshanka committed Sep 8, 2022
1 parent 62699a1 commit fe73ab9
Showing 13 changed files with 276 additions and 65 deletions.
23 changes: 0 additions & 23 deletions metadata-ingestion/docs/sources/looker/looker.md

This file was deleted.

62 changes: 62 additions & 0 deletions metadata-ingestion/docs/sources/looker/looker_pre.md
@@ -0,0 +1,62 @@
### Pre-Requisites

#### Set up the right permissions
You need to provide the following permissions for ingestion to work correctly.
```
access_data
explore
manage_models
see_datagroups
see_lookml
see_lookml_dashboards
see_looks
see_pdts
see_queries
see_schedules
see_sql
see_system_activity
see_user_dashboards
see_users
```
Here is an example permission set after configuration.
![Looker DataHub Permission Set](./looker_datahub_permission_set.png)

#### Get an API key

You need to get an API key for the account with the above privileges to perform ingestion. See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
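Once you have a client ID and secret, they plug into a `looker` recipe roughly as follows. This is a sketch: the `base_url` and the environment variable names are placeholders you would replace with your own values.

```
source:
  type: looker
  config:
    # Replace with your Looker instance URL
    base_url: https://company.looker.com
    # Credentials created in the Looker Admin > API settings
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}
```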


### Ingestion through UI

The following video shows you how to get started with ingesting Looker metadata through the UI.

:::note

You will need to run `lookml` ingestion through the CLI after you have ingested Looker metadata through the UI. Otherwise you will not be able to see Looker Views and their lineage to your warehouse tables.

:::

<div
style={{
position: "relative",
paddingBottom: "57.692307692307686%",
height: 0
}}
>
<iframe
src="https://www.loom.com/embed/b8b9654e02714d20a44122cc1bffc1bb"
frameBorder={0}
webkitallowfullscreen=""
mozallowfullscreen=""
allowFullScreen=""
style={{
position: "absolute",
top: 0,
left: 0,
width: "100%",
height: "100%"
}}
/>
</div>


13 changes: 0 additions & 13 deletions metadata-ingestion/docs/sources/looker/lookml.md

This file was deleted.

11 changes: 11 additions & 0 deletions metadata-ingestion/docs/sources/looker/lookml_post.md
@@ -0,0 +1,11 @@
#### Configuration Notes

:::note

The integration can use an SQL parser to try to parse the tables that the views depend on.

:::

This parsing is disabled by default, but can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the [`sqllineage`](https://pypi.org/project/sqllineage/) package.
Because this package does not officially support all of the SQL dialects that Looker supports, the results may not be correct. You can, however, implement a custom parser and plug it in by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser`
and must be made available to DataHub, for example by installing it as a package. The configuration value then needs to be set to the `module_name.ClassName` of the parser.
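For example, if you packaged a custom parser class in an installable module, the recipe would reference it roughly like this. This is a sketch with hypothetical names: `my_sql_parsers.MyParser` stands in for your own module and class.

```
source:
  type: lookml
  config:
    base_folder: /path/to/lookml
    parse_table_names_from_sql: true
    # Hypothetical custom parser class; the module must be importable
    # by DataHub (e.g. pip-installed in the same environment)
    sql_parser: my_sql_parsers.MyParser
```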
84 changes: 84 additions & 0 deletions metadata-ingestion/docs/sources/looker/lookml_pre.md
@@ -0,0 +1,84 @@
### Pre-requisites

#### [Optional] Create an API key

See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges.

If that is not possible, read the configuration section and provide an offline specification of the `connection_to_platform_map` and the `project_name`.
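Without an API key, such an offline specification looks roughly like the sketch below. The connection name `my_snowflake_connection`, the project name, and the folder path are placeholders for your own values.

```
source:
  type: lookml
  config:
    base_folder: /path/to/lookml
    # Name of the LookML project, normally discovered via the API
    project_name: my_lookml_project
    # Map each Looker connection name to the warehouse platform it points at
    connection_to_platform_map:
      my_snowflake_connection: snowflake
```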

### Ingestion through UI

Ingestion using the lookml connector is not supported through the UI.
However, you can set up ingestion using a GitHub Action that pushes metadata whenever your main lookml repo changes.

#### Sample GitHub Action

Drop this file into the `.github/workflows` directory of your Looker GitHub repo.

```
name: lookml metadata upload
on:
  push:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  pull_request:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  release:
    types: [published, edited]
  workflow_dispatch:
jobs:
  lookml-metadata-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Run LookML ingestion
        run: |
          pip install 'acryl-datahub[lookml,datahub-rest]'
          cat << EOF > lookml_ingestion.yml
          # LookML ingestion configuration
          source:
            type: "lookml"
            config:
              base_folder: ${{ github.workspace }}
              parse_table_names_from_sql: true
              github_info:
                repo: ${{ github.repository }}
                branch: ${{ github.ref }}
              # Options
              #connection_to_platform_map:
              #  acryl-snow: snowflake
              #platform: snowflake
              #default_db: DEMO_PIPELINE
              api:
                client_id: ${LOOKER_CLIENT_ID}
                client_secret: ${LOOKER_CLIENT_SECRET}
                base_url: ${LOOKER_BASE_URL}
          sink:
            type: datahub-rest
            config:
              server: ${DATAHUB_GMS_HOST}
              token: ${DATAHUB_TOKEN}
          EOF
          datahub ingest -c lookml_ingestion.yml
        env:
          DATAHUB_GMS_HOST: ${{ secrets.DATAHUB_GMS_HOST }}
          DATAHUB_TOKEN: ${{ secrets.DATAHUB_TOKEN }}
          LOOKER_BASE_URL: https://acryl.cloud.looker.com # <--- replace with your Looker base URL
          LOOKER_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKER_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}
```

If you want to ingest lookml using the **datahub** cli directly, read on for instructions and configuration details.
5 changes: 3 additions & 2 deletions metadata-ingestion/docs/sources/looker/lookml_recipe.yml
@@ -31,6 +31,7 @@ source:

# Optional additional github information. Used to add github links on the dataset's entity page.
github_info:
  repo: org/repo-name
# Default sink is datahub-rest and doesn't need to be configured
# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options

# sink configs
29 changes: 27 additions & 2 deletions metadata-ingestion/docs/sources/snowflake/README.md
@@ -1,4 +1,29 @@
Ingesting metadata from Snowflake requires either the **snowflake-beta** module with a single recipe (recommended), or the two separate modules **snowflake** and **snowflake-usage** (soon to be deprecated) with two separate recipes.

All three modules are described on this page.

We encourage you to try out the new **snowflake-beta** plugin as an alternative to running both the **snowflake** and **snowflake-usage** plugins, and to share feedback. `snowflake-beta` is much faster than `snowflake` at extracting metadata.

## Snowflake Ingestion through the UI

The following video shows you how to ingest Snowflake metadata through the UI.

<div style={{ position: "relative", paddingBottom: "56.25%", height: 0 }}>
<iframe
src="https://www.loom.com/embed/15d0401caa1c4aa483afef1d351760db"
frameBorder={0}
webkitallowfullscreen=""
mozallowfullscreen=""
allowFullScreen=""
style={{
position: "absolute",
top: 0,
left: 0,
width: "100%",
height: "100%"
}}
/>
</div>


Read on if you are interested in ingesting Snowflake metadata using the **datahub** cli, or want to learn about all the configuration parameters that are supported by the connectors.
@@ -1,12 +1,11 @@
source:
  type: snowflake-beta
  config:
    # This option is recommended to be used for the first time to ingest all lineage
    ignore_start_time_lineage: true
    # This is an alternative option to specify the start_time for lineage
    # if you don't want to look back since beginning
    start_time: "2022-03-01T00:00:00Z"

    # Coordinates
    account_id: "abc48144"
@@ -35,9 +34,7 @@ source:
    profile_table_level_only: true
    profile_pattern:
      allow:
        - "ACCOUNTING_DB.*.*"
        - "MARKETING_DB.*.*"
# Default sink is datahub-rest and doesn't need to be configured
# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options