feat(lineage) Implement CLL impact analysis for inputFields #6426

Merged
@@ -75,6 +75,15 @@ export default function ColumnsLineageSelect({
                </Select.Option>
            );
        })}
        {entityData?.inputFields?.fields?.map((field, idx) => {
            const fieldPath = downgradeV2FieldPath(field?.schemaField?.fieldPath);
            const key = `${field?.schemaField?.fieldPath}-${idx}`;
            return (
                <Select.Option key={key} value={field?.schemaField?.fieldPath || ''}>
                    <Tooltip title={fieldPath}>{fieldPath}</Tooltip>
                </Select.Option>
            );
        })}
    </StyledSelect>
)}
<Tooltip title={columnButtonTooltip}>
@@ -2,8 +2,11 @@

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.linkedin.common.InputField;
import com.linkedin.common.InputFields;
import com.linkedin.common.Status;
import com.linkedin.common.urn.Urn;
import com.linkedin.common.urn.UrnUtils;
import com.linkedin.data.template.RecordTemplate;
import com.linkedin.dataset.FineGrainedLineage;
import com.linkedin.dataset.UpstreamLineage;
@@ -180,15 +183,45 @@ private void updateFineGrainedEdgesAndRelationships(
    }
  }

  private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
    // we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
Collaborator

Nice - thanks for the explanation

    String encodedFieldPath = fieldPath.replaceAll("\\(", "%28").replaceAll("\\)", "%29").replaceAll(",", "%2C");
Contributor

nit: always good to make any fields that aren't going to change final

Contributor

Also for any function parameters!

Collaborator Author

sounds good!

    String urnString = String.format("urn:li:schemaField:(%s,%s)", resourceUrn, encodedFieldPath);
Contributor

Instead of hardcoding urn:li:schemaField, can we add it as a constant somewhere?

Collaborator Author

good call and in fact i'll do you one better and use EntityKeyUtils.convertEntityKeyToUrn to be even more safe/systematic!

    return UrnUtils.getUrn(urnString);
  }
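
The review thread on this method asks for the `urn:li:schemaField` prefix to live in a constant and for values that never change to be `final`. A minimal sketch of that shape follows, using only the JDK. The class name, the constant name, and the plain `String` return type are illustrative assumptions and not the code that was merged (the author says above that the follow-up will use EntityKeyUtils.convertEntityKeyToUrn instead). The sketch also swaps `replaceAll` for `String.replace`, since the three targets are literals rather than regular expressions.

```java
final class SchemaFieldUrnSketch {
  // Hypothetical constant name; the point is only that the prefix is defined once.
  static final String SCHEMA_FIELD_URN_PREFIX = "urn:li:schemaField:";

  private SchemaFieldUrnSketch() { }

  // Percent-encodes the three characters the diff encodes: '(', ')' and ','.
  static String encodeFieldPath(final String fieldPath) {
    return fieldPath.replace("(", "%28").replace(")", "%29").replace(",", "%2C");
  }

  // Builds the URN as a plain String; the diff wraps the result with UrnUtils.getUrn.
  // Only the field path is encoded, the resource URN is inserted as-is.
  static String schemaFieldUrnString(final String resourceUrn, final String fieldPath) {
    return SCHEMA_FIELD_URN_PREFIX + "(" + resourceUrn + "," + encodeFieldPath(fieldPath) + ")";
  }
}
```

Both parameters are `final`, matching the reviewers' nit, and the encoding is kept in its own method so it can be tested without constructing a URN.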

  private void updateInputFieldEdgesAndRelationships(
      Urn urn,
Collaborator

nit - can any of these arguments be null?

Collaborator

(If yes, add @Nullable. If no, add @Nonnull - the IDE will help point out scenarios in which you may violate these constraints)

Collaborator Author

right, adding these @Nonnull annotations now

Contributor

+1

      RecordTemplate aspect,
      List<Edge> edgesToAdd,
      HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded
  ) {
    InputFields inputFields = new InputFields(aspect.data());
Collaborator

Consider passing InputFields as a parameter type, instead of RecordTemplate

    if (inputFields.hasFields()) {
      for (InputField field : inputFields.getFields()) {
        if (field.hasSchemaFieldUrn() && field.hasSchemaField() && field.getSchemaField().hasFieldPath()) {
Collaborator

Doesn't this already have a schemaFieldUrn in this case? Why can't we use this URN? Is it not encoded?

Collaborator Author

this is part of the confusing modeling here: schemaFieldUrn is actually the upstream URN for this inputField, and it could be null. So the schemaFieldUrn and schemaField properties here reference two different schemaFields

Collaborator Author

and the schemaFieldUrn is encoded

Collaborator

got it ty

          Urn sourceFieldUrn = generateSchemaFieldUrn(urn.toString(), field.getSchemaField().getFieldPath());
          edgesToAdd.add(new Edge(sourceFieldUrn, field.getSchemaFieldUrn(), DOWNSTREAM_OF));
          Set<String> relationshipTypes = urnToRelationshipTypesBeingAdded.getOrDefault(sourceFieldUrn, new HashSet<>());
          relationshipTypes.add(DOWNSTREAM_OF);
          urnToRelationshipTypesBeingAdded.put(sourceFieldUrn, relationshipTypes);
        }
      }
    }
  }
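
As the thread above explains, each inputField carries two different schema fields: `schemaField` describes the field on the entity being written, while `schemaFieldUrn` points at its upstream and may be absent. The edge therefore runs from the entity's own field to the upstream one. A minimal sketch of that accumulation step, with plain `String` URNs and a stand-in `Edge` class in place of the DataHub types; all names here and the literal value of `DOWNSTREAM_OF` are illustrative assumptions. It uses `Map.computeIfAbsent` in place of the `getOrDefault` plus `put` pair in the diff, which has the same effect.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class InputFieldEdgeSketch {
  // Placeholder value; the diff references a DOWNSTREAM_OF constant defined elsewhere.
  static final String DOWNSTREAM_OF = "DownstreamOf";

  // Stand-in for the graph Edge type used in the diff.
  static final class Edge {
    final String source;
    final String destination;
    final String relationshipType;

    Edge(final String source, final String destination, final String relationshipType) {
      this.source = source;
      this.destination = destination;
      this.relationshipType = relationshipType;
    }
  }

  private InputFieldEdgeSketch() { }

  // Adds one field-level edge and records its relationship type.
  // Returns false and changes nothing when the upstream URN is missing.
  static boolean addInputFieldEdge(
      final String sourceFieldUrn,
      final String upstreamFieldUrn,
      final List<Edge> edgesToAdd,
      final Map<String, Set<String>> urnToRelationshipTypes) {
    Objects.requireNonNull(sourceFieldUrn, "sourceFieldUrn");
    Objects.requireNonNull(edgesToAdd, "edgesToAdd");
    Objects.requireNonNull(urnToRelationshipTypes, "urnToRelationshipTypes");
    if (upstreamFieldUrn == null) {
      return false;
    }
    edgesToAdd.add(new Edge(sourceFieldUrn, upstreamFieldUrn, DOWNSTREAM_OF));
    urnToRelationshipTypes.computeIfAbsent(sourceFieldUrn, key -> new HashSet<>()).add(DOWNSTREAM_OF);
    return true;
  }
}
```

The explicit null checks play the role of the `@Nonnull` annotations requested in review: arguments that must be present fail fast, and the one value that may legitimately be missing is handled as a skip.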

  private Pair<List<Edge>, HashMap<Urn, Set<String>>> getEdgesAndRelationshipTypesFromAspect(Urn urn, AspectSpec aspectSpec, RecordTemplate aspect) {
    final List<Edge> edgesToAdd = new ArrayList<>();
    final HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded = new HashMap<>();

    // we need to manually set schemaField <-> schemaField edges for fineGrainedLineage and inputFields
Collaborator

Minor: Ideally this domain-specific schema field logic does not reside inside a much more generic UpdateIndicesHook. There should be some abstraction for encapsulating such special case logic and a way to register this logic with the index updater.

Collaborator

(For a future refactor)

The idea of UpdateIndicesHook is to be completely agnostic of domain-specific logic that exists

Collaborator Author

yeah this all makes sense to me and is good to call out

    // since @Relationship only links between the parent entity urn and something else.
    if (aspectSpec.getName().equals(Constants.UPSTREAM_LINEAGE_ASPECT_NAME)) {
      // we need to manually set schemaField <-> schemaField edges for fineGrainedLineage since
      // @Relationship only links between the parent entity urn and something else.
      updateFineGrainedEdgesAndRelationships(aspect, edgesToAdd, urnToRelationshipTypesBeingAdded);
    }
    if (aspectSpec.getName().equals(Constants.INPUT_FIELDS_ASPECT_NAME)) {
Collaborator

Qq - Do we have unit tests for this class? If not, we absolutely need to backfill since it's so important

Collaborator Author

we do not have anything for this class...

Contributor

Can we add a TODO to do this on this file?

Collaborator Author

yeah I'm about to push up the beginning of a test file for this, but will add a TODO at the top of the class to backfill the rest of the functionality!

      updateInputFieldEdgesAndRelationships(urn, aspect, edgesToAdd, urnToRelationshipTypesBeingAdded);
    }

    Map<RelationshipFieldSpec, List<Object>> extractedFields =
        FieldExtractor.extractFields(aspect, aspectSpec.getRelationshipFieldSpecs());
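
The review thread in this hunk suggests, as a future refactor, moving the per-aspect `if` checks out of UpdateIndicesHook and behind some abstraction that special-case logic can register with. One possible shape is a small registry keyed by aspect name. This is only a sketch of the idea: the interface, the class, and the method names are hypothetical and do not exist in DataHub, and edges are simplified to strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EdgeExtractorRegistrySketch {
  // Hypothetical hook point: derives extra edges from one aspect's payload.
  interface AspectEdgeExtractor {
    List<String> extractEdges(String entityUrn, Object aspect);
  }

  private final Map<String, AspectEdgeExtractor> extractorsByAspectName = new HashMap<>();

  // Domain-specific code registers itself here instead of being hardcoded in the hook.
  void register(final String aspectName, final AspectEdgeExtractor extractor) {
    extractorsByAspectName.put(aspectName, extractor);
  }

  // The generic hook calls this once per aspect; unknown aspects yield no extra edges.
  List<String> extraEdgesFor(final String aspectName, final String entityUrn, final Object aspect) {
    final AspectEdgeExtractor extractor = extractorsByAspectName.get(aspectName);
    if (extractor == null) {
      return new ArrayList<>();
    }
    return extractor.extractEdges(entityUrn, aspect);
  }
}
```

With something like this, the fineGrainedLineage and inputFields handling would each become one registered extractor, and the hook itself would stay agnostic of which aspects need special treatment.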