Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lineage) Implement CLL impact analysis for inputFields #6426

Conversation

chriscollins3456
Copy link
Collaborator

Currently, impact analysis only works for fineGrainedLineage but it needs to work for inputFields as well. Both of these aspects use schemaFields so all the infrastructure for fetching impact analysis for schema fields is already there, we just were not indexing inputFields to draw lines between schemaFields like we were for the upstreamLineage aspect yet. This adds that step to draw those edges manually.

Something to note is that we need to encode the fieldPath of the schemaField urn that we generate for the graph index. this is because ingestion always does that for schemaField urns but here we are creating a schemaField urn from the schemaField prop of inputFields and the fieldPath there is not encoded. schemaField urns need to be consistent across our DB and graph index as we request using the encoded urn from the FE and all over.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Nov 14, 2022
@github-actions
Copy link

github-actions bot commented Nov 14, 2022

Unit Test Results (build & test)

615 tests  +2   611 ✔️ +2   12m 10s ⏱️ +22s
152 suites +1       4 💤 ±0 
152 files   +1       0 ±0 

Results for commit 61b6dee. ± Comparison against base commit 7ea9797.

♻️ This comment has been updated with latest results.

@@ -180,15 +183,45 @@ private void updateFineGrainedEdgesAndRelationships(
}
}

private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - thanks for the explanation

}

private void updateInputFieldEdgesAndRelationships(
Urn urn,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can any of these arguments be null?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If yes, add @nullable. If no, add @nonnull - the IDE will help point out scenarios in which you may violate these constraints)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, adding this nonnull annotations now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

List<Edge> edgesToAdd,
HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded
) {
InputFields inputFields = new InputFields(aspect.data());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider passing InputFields as a parameter type, instead of RecordTemplate

InputFields inputFields = new InputFields(aspect.data());
if (inputFields.hasFields()) {
for (InputField field : inputFields.getFields()) {
if (field.hasSchemaFieldUrn() && field.hasSchemaField() && field.getSchemaField().hasFieldPath()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this already have a schemaFieldUrn in this case? Why cannot we use this URN? Is it not encoded?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is part of the confusing modeling thing here - schemaFieldUrn is actually the upstream urn for this inputField and it could be null. So the schemaFieldUrn and the schemaField properties here are going to be referencing two different schemaFields

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the schemaFieldUrn is encoded

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it ty

private Pair<List<Edge>, HashMap<Urn, Set<String>>> getEdgesAndRelationshipTypesFromAspect(Urn urn, AspectSpec aspectSpec, RecordTemplate aspect) {
final List<Edge> edgesToAdd = new ArrayList<>();
final HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded = new HashMap<>();

// we need to manually set schemaField <-> schemaField edges for fineGrainedLineage and inputFields
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Ideally this domain-specific schema field logic does not reside inside a much more generic UpdateIndicesHook. There should be some abstraction for encapsulating such special case logic and a way to register this logic with the index updater.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For a future refactor)

The idea of UpdateIndicesHook is to be completely agnostic of domain-specific logic that exists

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this all makes sense to me and is good to call out

updateFineGrainedEdgesAndRelationships(aspect, edgesToAdd, urnToRelationshipTypesBeingAdded);
}
if (aspectSpec.getName().equals(Constants.INPUT_FIELDS_ASPECT_NAME)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qq - Do we have unit tests for this class? If not, we absolutely need to backfill since it's so important

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not have anything for this class...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a TODO to do this on this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I'm about to push up the beginning of a test file for this, but will add a TODO at the top of the class to backfill the rest of the functionality!

@@ -180,15 +183,45 @@ private void updateFineGrainedEdgesAndRelationships(
}
}

private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
String encodedFieldPath = fieldPath.replaceAll("\\(", "%28").replaceAll("\\)", "%29").replaceAll(",", "%2C");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: always good to make any fields that aren't going to change final

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for any function parameters!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
String encodedFieldPath = fieldPath.replaceAll("\\(", "%28").replaceAll("\\)", "%29").replaceAll(",", "%2C");
String urnString = String.format("urn:li:schemaField:(%s,%s)", resourceUrn, encodedFieldPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding urn:li:schemaField, can we add it a constant somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call and in fact i'll do you one better and use EntityKeyUtils.convertEntityKeyToUrn to be even more safe/systematic!

}

private void updateInputFieldEdgesAndRelationships(
Urn urn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

updateFineGrainedEdgesAndRelationships(aspect, edgesToAdd, urnToRelationshipTypesBeingAdded);
}
if (aspectSpec.getName().equals(Constants.INPUT_FIELDS_ASPECT_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a TODO to do this on this file?

Copy link
Contributor

@aditya-radhakrishnan aditya-radhakrishnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

import static com.linkedin.metadata.Constants.DATASET_ENTITY_NAME;
import static com.linkedin.metadata.search.utils.QueryUtils.newRelationshipFilter;

public class UpdateIndicesHookTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing these!

@chriscollins3456 chriscollins3456 merged commit 1d944be into datahub-project:master Nov 15, 2022
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants