feat(lineage) Add column-level impact analysis feature #6272

chriscollins3456
Collaborator

Adds the ability to view impact analysis at the column level. You can now pick a column to run impact analysis on, either from a new menu in the schema tab or from a dropdown in the lineage tab. You'll see the impacted columns, and you can click a column name to see the full path connecting the two columns.

This involved some changes to how we index upstream lineage. We now check whether the upstream lineage aspect being indexed has fineGrainedLineage, and if so, we explicitly draw edges between the two schemaFields. That way we can query for edges based on a schemaField urn.
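The indexing change described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not DataHub's actual indexing code: `FineGrainedLineage`, `Edge`, and `FineGrainedEdges.buildEdges` are made-up stand-ins, schemaField urns are plain strings, and pairing every downstream field with every upstream field in an entry (with edges pointing downstream to upstream) is an assumption about how the edges are drawn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for one fineGrainedLineage entry:
// a set of upstream schemaField urns that feed a set of downstream schemaField urns.
record FineGrainedLineage(List<String> upstreams, List<String> downstreams) {}

// Hypothetical directed edge from a downstream schemaField urn to an upstream one.
record Edge(String source, String destination) {}

final class FineGrainedEdges {
    private FineGrainedEdges() {}

    // Draws one edge per (downstream, upstream) pair so edges can be looked up by schemaField urn.
    static List<Edge> buildEdges(List<FineGrainedLineage> lineages) {
        List<Edge> edges = new ArrayList<>();
        if (lineages == null) {
            // The aspect has no fineGrainedLineage: there are no column-level edges to index.
            return edges;
        }
        for (FineGrainedLineage lineage : lineages) {
            for (String downstream : lineage.downstreams()) {
                for (String upstream : lineage.upstreams()) {
                    edges.add(new Edge(downstream, upstream));
                }
            }
        }
        return edges;
    }
}
```

In this sketch, every edge is keyed by a schemaField urn on both ends, so a lookup by a single column's urn returns its direct column-level neighbors, which is what lets impact analysis start from a column rather than from a whole dataset.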

Unfortunately, this means that users who already have fineGrainedLineage ingested will need to restore their indices for that existing fineGrainedLineage data to work with impact analysis.

Here are some screenshots of how this looks in action:

The new column menu

Viewing impact analysis for the email column

The paths modal to see the full path between entities and their columns (all of these happen to be named email)
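The PR doesn't describe how the paths modal computes the route between two columns. As a hedged illustration only: once column-to-column edges exist, the shortest hop path between two schemaField urns could be found with a plain breadth-first search. `ColumnPaths.shortestPath` and its adjacency-map input are hypothetical, and the real feature may compute paths differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ColumnPaths {
    private ColumnPaths() {}

    // Breadth-first search over a hypothetical adjacency map of schemaField urns.
    // Returns the shortest hop path from start to target, or an empty list if target is unreachable.
    static List<String> shortestPath(Map<String, List<String>> edges, String start, String target) {
        Map<String, String> previous = new HashMap<>(); // child -> node we reached it from
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                // Walk the predecessor links back to start, building the path front to back.
                LinkedList<String> path = new LinkedList<>();
                for (String node = target; node != null; node = previous.get(node)) {
                    path.addFirst(node);
                }
                return path;
            }
            for (String next : edges.getOrDefault(current, List.of())) {
                if (visited.add(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

BFS is used here because it returns a path with the fewest hops, which keeps a displayed column path short; listing every path would need a different traversal.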

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the product label (PR or Issue related to the DataHub UI/UX) on Oct 24, 2022

github-actions bot commented Oct 24, 2022

Unit Test Results (build & test)

  • 597 tests ±0: 593 ✔️ passed ±0, 4 💤 skipped ±0, 0 failed ±0
  • 147 suites ±0, 147 files ±0
  • Duration: 11m 44s ⏱️ (-13s)

Results for commit c23fc47. ± Comparison against base commit 18df38e.

♻️ This comment has been updated with latest results.


github-actions bot commented Oct 24, 2022

Unit Test Results (metadata ingestion)

  • 8 files, 8 suites, 56m 43s ⏱️
  • 748 tests: 745 ✔️ passed, 3 💤 skipped, 0 failed
  • 1 498 runs: 1 492 ✔️ passed, 6 💤 skipped, 0 failed

Results for commit c23fc47.


try {
    Urn resourceUrn = Urn.createFromString(schemaFieldUrn.getEntityKey().get(0));
    result.setParent(UrnToEntityMapper.map(resourceUrn));
} catch (Exception e) {
    log.error("Error converting schemaField parent urn string to Urn", e);
}

Contributor

There's no issue if we can't convert, right? It's still safe to return the mapped object without this field?

Collaborator Author

Ah, that's a good catch: SchemaFieldEntity expects parent to be non-null here. So I think our options are to either make parent nullable or raise an error if this situation occurs (which it shouldn't). I don't think it really makes sense for parent to be null, since a schemaField should always have a valid parent reference urn embedded in its own urn, so I'm leaning towards raising an error. What do you think?

Collaborator Author

Talked IRL and agreed on throwing an exception here.
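A minimal sketch of the agreed behavior, assuming the parent urn is the first top-level element of the schemaField urn's tuple (as `getEntityKey().get(0)` in the snippet above suggests). The real code uses DataHub's `Urn` class; this hypothetical `SchemaFieldUrns.parentUrn` helper works on plain strings and throws `IllegalArgumentException` instead of logging and leaving parent null.

```java
final class SchemaFieldUrns {
    // Assumed urn shape: urn:li:schemaField:(<parent urn>,<field path>)
    private static final String PREFIX = "urn:li:schemaField:(";

    private SchemaFieldUrns() {}

    // Returns the parent entity urn, or throws if it cannot be extracted,
    // since a schemaField should always carry a valid parent urn.
    static String parentUrn(String schemaFieldUrn) {
        if (schemaFieldUrn == null || !schemaFieldUrn.startsWith(PREFIX) || !schemaFieldUrn.endsWith(")")) {
            throw new IllegalArgumentException("Not a schemaField urn: " + schemaFieldUrn);
        }
        String tuple = schemaFieldUrn.substring(PREFIX.length(), schemaFieldUrn.length() - 1);
        int depth = 0; // parenthesis depth, so commas inside the nested parent urn are skipped
        for (int i = 0; i < tuple.length(); i++) {
            char c = tuple.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // The first top-level comma separates the parent urn from the field path.
                String parent = tuple.substring(0, i);
                if (parent.startsWith("urn:li:")) {
                    return parent;
                }
                break; // the first element is not an urn, so the input is malformed
            }
        }
        throw new IllegalArgumentException("Could not extract parent urn from: " + schemaFieldUrn);
    }
}
```

Failing loudly here means a malformed urn surfaces as an error at mapping time instead of a SchemaFieldEntity with a null parent reaching the UI.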

@@ -2343,21 +2343,31 @@ type KeyValueSchema {
Standalone schema field entity. Differs from the SchemaField struct because it is not directly nested inside a
schema field
"""
-type SchemaFieldEntity {
+type SchemaFieldEntity implements Entity {

Contributor

Nice

@@ -55,4 +55,29 @@ describe("impact analysis", () => {
cy.contains("User Creations").should("not.exist");
cy.contains("User Deletions");
});

it("can view column level impact analysis and turn it off", () => {

Contributor

Amazing, thank you!

chriscollins3456 merged commit cd1331f into datahub-project:master on Oct 26, 2022
cccs-tom pushed a commit to CybercentreCanada/datahub that referenced this pull request Nov 18, 2022
Labels
product PR or Issue related to the DataHub UI/UX
2 participants