docs(feature-guide) Impact Analysis #5765

Merged on Sep 1, 2022 (26 commits)

Commits
b1031d6
update sidebar titles to remove About DataHub
maggiehays Aug 29, 2022
1a24dbf
move impact analysis guide to new folder; update links
maggiehays Aug 29, 2022
7ce93f7
update copy in Understand Data in Context section
maggiehays Aug 29, 2022
e667877
adding feature guide template to sidebar
maggiehays Aug 29, 2022
cc32235
adding feature guide template
maggiehays Aug 29, 2022
e475859
update docs readme to link to feature guide template
maggiehays Aug 29, 2022
c3d39d9
enhance docs-website readme
maggiehays Aug 29, 2022
50060f6
add comments to feature guide template
maggiehays Aug 30, 2022
f5f5167
Merge branch 'master' into docs-impact-analysis
maggiehays Aug 30, 2022
b57feeb
add links to graphql and lineage resources
maggiehays Aug 30, 2022
b6ba66f
linter cleanup
maggiehays Aug 30, 2022
25b08c1
Merge branch 'master' into docs-impact-analysis
maggiehays Aug 30, 2022
8f356b2
Merge branch 'master' into docs-impact-analysis
maggiehays Aug 30, 2022
78e5f54
updating reference links
maggiehays Aug 30, 2022
c9064e8
update to graphql reference links
maggiehays Aug 30, 2022
14fb64e
Merge branch 'master' into docs-impact-analysis
maggiehays Aug 30, 2022
dc9e935
Merge branch 'master' into docs-impact-analysis
maggiehays Aug 30, 2022
9770e51
add image and gif best practices
maggiehays Aug 30, 2022
b7e89ac
update feature guide template with image details
maggiehays Aug 30, 2022
bee2dbf
fix link
hsheth2 Aug 31, 2022
aa955ef
Merge branch 'master' into docs-impact-analysis
maggiehays Sep 1, 2022
54d32b0
Merge branch 'master' into docs-impact-analysis
maggiehays Sep 1, 2022
6126431
Merge branch 'master' into docs-impact-analysis
maggiehays Sep 1, 2022
81db8f2
update template from YouTube -> Videos
maggiehays Sep 1, 2022
5962ef8
Update docs-website/README.md
maggiehays Sep 1, 2022
ab33a09
update feature to Lineage Impact Analysis
maggiehays Sep 1, 2022
104 changes: 103 additions & 1 deletion docs-website/README.md
@@ -35,4 +35,106 @@ To regenerate GraphQL API docs, simply rebuild the docs-website directory.

```console
./gradlew docs-website:build
```

## Managing Content

Please use the following steps when adding/managing content for the docs site.

### Leverage Documentation Templates

* [Feature Guide Template](./docs/_feature-guide-template.md)
* [Metadata Ingestion Source Template](./metadata-ingestion/source-docs-template.md)

### Self-Hosted vs. Managed DataHub

The docs site includes resources for both self-hosted (open-source) DataHub and Managed DataHub.

* All Feature Guides should include the `FeatureAvailability` component within the markdown file itself
* Features only available via Managed DataHub should have the `saasOnly` class if they are included in `sidebars.js`; this displays the small "cloud" icon:

```
{
  type: "doc",
  id: "path/to/document",
  className: "saasOnly",
},
```

### Sidebar Display Options

`generateDocsDir.ts` contains the logic that auto-generates the docs site sidebar; here are a few ways to manage how documents are displayed.

1. Leverage the document's H1 value

By default, the Sidebar will display the H1 value of the Markdown file, not the file name itself.

**NOTE:** `generateDocsDir.ts` will strip leading values of `DataHub ` and `About DataHub ` to minimize repetitive values of DataHub in the sidebar

2. Hard-code the section title in `generateDocsDir.ts`

Map the file to a hard-coded value in `const hardcoded_titles`

3. Assign a `title` separate from the H1 value

You can add the following details at the top of the markdown file:

```
---
title: [value to display in the sidebar]
---
```

*This will be ignored if your H1 value begins with `DataHub ` or `About DataHub `*

**NOTE:** Assigning a value for `label:` in `sidebars.js` is not reliable, e.g.

```
{ // Don't do this
  label: "Usage Guide",
  type: "doc",
  id: "path/to/document",
},
```
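
The H1-based rule from option 1 can be sketched as a small standalone function. This is a minimal illustration only: `STRIPPED_PREFIXES` and `sidebarLabelFromH1` are hypothetical names, not identifiers from `generateDocsDir.ts`, and the sketch strips at most one prefix per title.

```typescript
// Hypothetical sketch of the prefix-stripping rule described above.
// These names are illustrative and do not appear in generateDocsDir.ts.
const STRIPPED_PREFIXES: string[] = ["About DataHub ", "DataHub "];

// Returns the label the sidebar would show for a given H1 value,
// assuming only the leading-prefix rule applies (no hard-coded title).
function sidebarLabelFromH1(h1: string): string {
  for (const prefix of STRIPPED_PREFIXES) {
    if (h1.startsWith(prefix)) {
      // Drop the prefix and any whitespace left around the remainder.
      return h1.slice(prefix.length).trim();
    }
  }
  // No known prefix: the H1 is shown unchanged.
  return h1;
}
```

Under these assumptions, an H1 of `About DataHub Lineage Impact Analysis` would appear in the sidebar as `Lineage Impact Analysis`, while a title without either prefix is left as-is.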

### Determine the Appropriate Sidebar Section

When adding a new document to the site, determine the appropriate sidebar section:

**What is DataHub?**

By the end of this section, readers should understand the core use cases that DataHub addresses, target end-users, high-level architecture, & hosting options.

**Get Started**

The goal of this section is to provide the bare-minimum steps required to:
- Get DataHub Running
- Optionally configure SSO
- Add/invite Users
- Create Policies & assign roles
- Ingest at least one source (i.e., data warehouse)
- Understand high-level options for enriching metadata

**Ingest Metadata**

This section aims to provide a deeper understanding of how ingestion works. Readers should be able to find details for ingesting from all systems, apply transformers, understand sinks, and understand key concepts of the Ingestion Framework (Sources, Sinks, Transformers, and Recipes).

**Enrich Metadata**

The purpose of this section is to provide direction on how to enrich metadata when shift-left isn’t an option.

**Act on Metadata**

This section provides concrete examples of acting on metadata changes in real-time and enabling Active Metadata workflows/practices.

**Deploy DataHub**

The purpose of this section is to provide the minimum steps required to deploy DataHub to the vendor of your choosing.

**Developer Guides**

The purpose of this section is to provide developers & technical users with concrete tutorials on how to work with the DataHub CLI & APIs.

**Feature Guides**

This section aims to provide plain-language feature overviews for both technical and non-technical readers alike.
3 changes: 3 additions & 0 deletions docs-website/generateDocsDir.ts
@@ -246,6 +246,9 @@ function markdown_guess_title(
if (sidebar_label.startsWith("DataHub ")) {
sidebar_label = sidebar_label.slice(8).trim();
}
if (sidebar_label.startsWith("About DataHub ")) {
sidebar_label = sidebar_label.slice(14).trim();
}
if (sidebar_label != title) {
contents.data.sidebar_label = sidebar_label;
}
3 changes: 2 additions & 1 deletion docs-website/sidebars.js
@@ -217,7 +217,7 @@ module.exports = {
// className: "saasOnly",
// },
// "docs/wip/metadata-analytics",
- // "docs/wip/impact-analysis",
+ "docs/act-on-metadata/impact-analysis",
// {
// type: "doc",
// id: "docs/wip/events-bridge",
@@ -514,6 +514,7 @@ module.exports = {
// - "perf-test/README",
// "metadata-jobs/README",
// "docs/how/add-user-data",
// "docs/_feature-guide-template"
// ],
},
};
2 changes: 1 addition & 1 deletion docs-website/src/pages/docs/index.js
@@ -98,7 +98,7 @@ const featureGuideContent = [
{ title: "UI-Based Ingestion", icon: <ApiTwoTone />, to: "docs/ui-ingestion" },
{ title: "Search", icon: <SearchOutlined />, to: "docs/how/search" },
// { title: "Browse", icon: <CompassTwoTone />, to: "/docs/quickstart" },
- { title: "Impact Analysis", icon: <NodeExpandOutlined />, to: "docs/wip/impact-analysis" },
+ { title: "Lineage Impact Analysis", icon: <NodeExpandOutlined />, to: "docs/act-on-metadata/impact-analysis" },
{ title: "Metadata Tests", icon: <CheckCircleTwoTone />, to: "docs/wip/metadata-tests" },
{ title: "Approval Flows", icon: <SafetyCertificateTwoTone />, to: "docs/wip/approval-workflows" },
{ title: "Personal Access Tokens", icon: <LockTwoTone />, to: "docs/authentication/personal-access-tokens" },
4 changes: 2 additions & 2 deletions docs-website/src/pages/index.js
@@ -104,8 +104,8 @@ function Home() {
</h2>
<p>
DataHub is the one-stop shop for documentation, schemas,
- ownership, lineage, pipelines and usage information. Data
- quality and data preview information coming soon.
+ ownership, lineage, pipelines, data quality, usage information,
+ and more.
</p>
</div>
<div className="col col--6 col--offset-1">
50 changes: 50 additions & 0 deletions docs/README.md
@@ -1 +1,51 @@
# DataHub Docs Overview

DataHub's project documentation is hosted at [datahubproject.io](https://datahubproject.io/docs)

## Types of Documentation

### Feature Guide

A Feature Guide should follow the [Feature Guide Template](/_feature-guide-template.md), and should provide the following value:

Review comment (Collaborator, Author): @jjoyce0510 looking for your input here

* At a high level, what is the concept/feature within DataHub?
* Why is the feature useful?
* What are the common use cases of the feature?
* What are the simple steps one needs to take to use the feature?

When creating a Feature Guide, please remember to:

* Provide plain-language descriptions for both technical and non-technical readers
* Avoid using industry jargon, abbreviations, or acronyms
* Provide descriptive screenshots, links out to relevant YouTube videos, and any other relevant resources
* Provide links out to Tutorials for advanced use cases

*Not all Feature Guides will require a Tutorial.*

### Tutorial

A Tutorial is meant to provide very specific steps to accomplish complex workflows and advanced use cases that are out of scope of a Feature Guide.

Tutorials should be written to accommodate the targeted persona, i.e. Developer, Admin, End-User, etc.

*Not all Tutorials require an associated Feature Guide.*

## Docs Best Practices

### Embedding GIFs and/or Screenshots

* Store GIFs and screenshots in [datahub-project/static-assets](https://github.com/datahub-project/static-assets); this minimizes unnecessarily large image/file sizes in the main repo
* Center-align screenshots and size them down to 70% - this improves readability/skimmability within the site

Example snippet:

```
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-export-full-list.png"/>
</p>
```

* Use the "raw" GitHub image link (right click image from GitHub > Open in New Tab > copy URL):

* Good: https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/dbt-test-logic-view.png
* Bad: https://github.com/datahub-project/static-assets/blob/main/imgs/dbt-test-logic-view.png
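
The Good/Bad distinction above can also be applied programmatically. The following is a minimal sketch, assuming the link is a standard `github.com/<owner>/<repo>/blob/...` page URL; `toRawGitHubUrl` and `centeredImage` are hypothetical helper names and are not part of the DataHub repo.

```typescript
// Hypothetical helper: rewrites a GitHub "blob" page URL into the
// corresponding raw.githubusercontent.com URL. Any URL that does not
// match the expected shape is returned unchanged.
function toRawGitHubUrl(url: string): string {
  const match = url.match(
    /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/(.+)$/
  );
  if (!match) {
    return url;
  }
  const [, owner, repo, rest] = match;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest}`;
}

// Hypothetical helper: builds the centered image snippet shown above,
// defaulting to the recommended 70% width.
function centeredImage(url: string, widthPercent: number = 70): string {
  return [
    '<p align="center">',
    `  <img width="${widthPercent}%" src="${toRawGitHubUrl(url)}"/>`,
    "</p>",
  ].join("\n");
}
```

Passing the "Bad" link from the list above to `toRawGitHubUrl` would yield the "Good" link; URLs that are already raw pass through untouched.
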
83 changes: 83 additions & 0 deletions docs/_feature-guide-template.md
@@ -0,0 +1,83 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub [Feature Name]

<!-- All Feature Guides should begin with `About DataHub ` to improve SEO -->

<!--
Update feature availability; by default, feature availability is Self-Hosted and Managed DataHub

Add in `saasOnly` for Managed DataHub-only features
-->

<FeatureAvailability/>

<!-- This section should provide a plain-language overview of feature. Consider the following:

* What does this feature do? Why is it useful?
* What are the typical use cases?
* Who are the typical users?
* In which DataHub Version did this become available? -->

## [Feature Name] Setup, Prerequisites, and Permissions

<!-- This section should provide plain-language instructions on how to configure the feature:

* What special configuration is required, if any?
* How can you confirm you configured it correctly? What is the expected behavior?
* What access levels/permissions are required within DataHub? -->

## Using [Feature Name]

<!-- Plain-language instructions of how to use the feature

Provide a step-by-step guide to use feature, including relevant screenshots and/or GIFs

* Where/how do you access it?
* What best practices exist?
* What are common code snippets?
-->

## Additional Resources

<!-- Comment out any irrelevant or empty sections -->

### Videos

<!-- Use the following format to embed YouTube videos:

**Title of YouTube video in bold text**

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/VIDEO_ID" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

-->

<!--
NOTE: Find the iframe details in YouTube by going to Share > Embed
-->

### GraphQL

<!-- Bulleted list of relevant GraphQL docs; comment out section if none -->

### DataHub Blog

<!-- Bulleted list of relevant DataHub Blog posts; comment out section if none -->

## FAQ and Troubleshooting

<!-- Use the following format:

**Question in bold text**

Response in plain text

-->

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

### Related Features

<!-- Bulleted list of related features; comment out section if none -->
93 changes: 93 additions & 0 deletions docs/act-on-metadata/impact-analysis.md
@@ -0,0 +1,93 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub Lineage Impact Analysis

<FeatureAvailability/>

Lineage Impact Analysis is a powerful workflow for understanding the complete set of upstream and downstream dependencies of a Dataset, Dashboard, Chart, and many other DataHub Entities.

This allows Data Practitioners to proactively identify the impact of breaking schema changes or failed data pipelines on downstream dependencies, rapidly discover which upstream dependencies may have caused unexpected data quality issues, and more.

Lineage Impact Analysis is available via the DataHub UI and GraphQL endpoints, supporting manual and automated workflows.

## Lineage Impact Analysis Setup, Prerequisites, and Permissions

Lineage Impact Analysis is enabled for any Entity that has associated Lineage relationships with other Entities and does not require any additional configuration.

Any DataHub user with “View Entity Page” permissions is able to view the full set of upstream or downstream Entities and export results to CSV from the DataHub UI.

## Using Lineage Impact Analysis

Follow these simple steps to understand the full dependency chain of your data entities.

1. On a given Entity Page, select the **Lineage** tab

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-lineage-tab.png"/>
</p>

2. Easily toggle between **Upstream** and **Downstream** dependencies

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-choose-upstream-downstream.png"/>
</p>

3. Choose the **Degree of Dependencies** you are interested in. The default filter is “1 Degree of Dependency” to minimize processor-intensive queries.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-filter-dependencies.png"/>
</p>

4. Slice and dice the result list by Entity Type, Platform, Owner, and more to isolate the relevant dependencies

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-apply-filters.png"/>
</p>

5. Export the full list of dependencies to CSV

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-export-full-list.png"/>
</p>

6. View the filtered set of dependencies via CSV, with details about assigned ownership, domain, tags, terms, and quick links back to those entities within DataHub

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-view-export-results.png"/>
</p>

## Additional Resources

### Videos

**DataHub 201: Impact Analysis**

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/BHG_kzpQ_aQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

### GraphQL

* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)
* [searchAcrossLineageInput](../../graphql/inputObjects.md#searchacrosslineageinput)
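
For automated workflows, a request to `searchAcrossLineage` might be assembled as in the sketch below. This is an illustration under assumptions, not a definitive client: the input fields (`urn`, `direction`, `query`, `start`, `count`) and the selected result fields reflect one reading of the linked references and should be verified there, and `buildLineageSearch` is a hypothetical helper name.

```typescript
// Direction values as assumed from the GraphQL reference.
type LineageDirection = "UPSTREAM" | "DOWNSTREAM";

interface LineageSearchRequest {
  query: string;
  variables: {
    input: {
      urn: string;
      direction: LineageDirection;
      query: string;
      start: number;
      count: number;
    };
  };
}

// GraphQL document; the selected fields are assumptions to be checked
// against the searchAcrossLineage reference linked above.
const SEARCH_ACROSS_LINEAGE: string = `
  query impactAnalysis($input: SearchAcrossLineageInput!) {
    searchAcrossLineage(input: $input) {
      total
      searchResults {
        degree
        entity { urn type }
      }
    }
  }`;

// Hypothetical helper: builds the JSON body for one page of results.
function buildLineageSearch(
  urn: string,
  direction: LineageDirection,
  start: number = 0,
  count: number = 100
): LineageSearchRequest {
  return {
    query: SEARCH_ACROSS_LINEAGE,
    variables: { input: { urn, direction, query: "*", start, count } },
  };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of a POST to your deployment's GraphQL endpoint, with whatever authentication it requires; advancing `start` by `count` pages through larger result sets.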

### DataHub Blog

* [Dependency Impact Analysis, Data Validation Outcomes, and MORE! - Highlights from DataHub v0.8.27 & v.0.8.28](https://blog.datahubproject.io/dependency-impact-analysis-data-validation-outcomes-and-more-1302604da233)


## FAQ and Troubleshooting

**The Lineage Tab is greyed out - why can’t I click on it?**

This means you have not yet ingested Lineage metadata for that entity. Please see the Lineage Guide to get started.

**Why is my list of exported dependencies incomplete?**

We currently limit the list of dependencies to 10,000 records; we suggest applying filters to narrow the result set if you hit that limit.

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

### Related Features

* [DataHub Lineage](./docs/lineage/intro.md)
7 changes: 0 additions & 7 deletions docs/wip/impact-analysis.md

This file was deleted.