feat(lineage) Implement CLL impact analysis for inputFields #6426
@@ -180,15 +183,45 @@ private void updateFineGrainedEdgesAndRelationships(
}
}

private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
Nice - thanks for the explanation
}

private void updateInputFieldEdgesAndRelationships(
Urn urn,
nit - can any of these arguments be null?
right, adding these nonnull annotations now
+1
List<Edge> edgesToAdd,
HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded
) {
InputFields inputFields = new InputFields(aspect.data());
Consider passing InputFields as a parameter type, instead of RecordTemplate
InputFields inputFields = new InputFields(aspect.data());
if (inputFields.hasFields()) {
for (InputField field : inputFields.getFields()) {
if (field.hasSchemaFieldUrn() && field.hasSchemaField() && field.getSchemaField().hasFieldPath()) {
Doesn't this already have a schemaFieldUrn in this case? Why can't we use this URN? Is it not encoded?
this is part of the confusing modeling thing here - schemaFieldUrn is actually the upstream urn for this inputField and it could be null. So the schemaFieldUrn and the schemaField properties here are going to be referencing two different schemaFields
and the schemaFieldUrn is encoded
got it ty
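To make the distinction in this thread concrete, here is a minimal sketch of how one edge per input field could be derived. It is not the PR's code: the InputField and Edge records are simplified, hypothetical stand-ins for the real models, URNs are plain strings, and the "DownstreamOf" relationship name is an assumption. The point it illustrates is that schemaFieldUrn (upstream, already encoded, possibly null) is used as-is, while the downstream URN is generated from the entity's own urn and the unencoded fieldPath.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these types are hypothetical stand-ins, not DataHub classes.
final class InputFieldEdgeSketch {
    /** One inputFields entry: the upstream field URN (may be null) and this entity's own field path. */
    record InputField(String schemaFieldUrn, String fieldPath) { }

    /** A directed graph edge between two schemaField URNs. */
    record Edge(String source, String destination, String relationshipType) { }

    static List<Edge> edgesFor(final String entityUrn, final List<InputField> fields) {
        final List<Edge> edges = new ArrayList<>();
        for (final InputField field : fields) {
            // Skip entries that cannot produce an edge, mirroring the has*() guards in the diff.
            if (field.schemaFieldUrn() == null || field.fieldPath() == null) {
                continue;
            }
            // The field path here is not encoded yet, so encode it before building the URN.
            final String encodedPath = field.fieldPath()
                .replace("(", "%28").replace(")", "%29").replace(",", "%2C");
            final String downstreamUrn = String.format("urn:li:schemaField:(%s,%s)", entityUrn, encodedPath);
            // Assumed relationship name; the upstream URN is used unchanged because it is already encoded.
            edges.add(new Edge(downstreamUrn, field.schemaFieldUrn(), "DownstreamOf"));
        }
        return edges;
    }

    public static void main(final String[] args) {
        final List<InputField> fields = new ArrayList<>();
        fields.add(new InputField("urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:hive,t,PROD),amount)", "sum(amount)"));
        fields.add(new InputField(null, "no_upstream"));
        System.out.println(edgesFor("urn:li:chart:(looker,c1)", fields));
    }
}
```

The entry with a null upstream URN is skipped, so only one edge is produced for the two fields in main.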
private Pair<List<Edge>, HashMap<Urn, Set<String>>> getEdgesAndRelationshipTypesFromAspect(Urn urn, AspectSpec aspectSpec, RecordTemplate aspect) {
final List<Edge> edgesToAdd = new ArrayList<>();
final HashMap<Urn, Set<String>> urnToRelationshipTypesBeingAdded = new HashMap<>();

// we need to manually set schemaField <-> schemaField edges for fineGrainedLineage and inputFields
Minor: Ideally this domain-specific schema field logic does not reside inside a much more generic UpdateIndicesHook. There should be some abstraction for encapsulating such special case logic and a way to register this logic with the index updater.
(For a future refactor)
The idea of UpdateIndicesHook is to be completely agnostic of domain-specific logic that exists
yeah this all makes sense to me and is good to call out
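For the future refactor suggested in this thread, one possible shape is a small registry keyed by aspect name, so the generic hook only dispatches and the schema-field logic lives in a registered handler. This is a rough sketch under stated assumptions: EdgeHandlerRegistrySketch and AspectEdgeHandler are hypothetical names, aspects are passed as plain Object, and edges are plain strings rather than real Edge objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types exist in the codebase under review.
final class EdgeHandlerRegistrySketch {
    /** Hypothetical extension point: turns one aspect payload into edge descriptions. */
    interface AspectEdgeHandler {
        List<String> edgesFor(String entityUrn, Object aspect);
    }

    private final Map<String, AspectEdgeHandler> handlersByAspectName = new HashMap<>();

    /** Registers the special-case logic for one aspect name. */
    void register(final String aspectName, final AspectEdgeHandler handler) {
        handlersByAspectName.put(aspectName, handler);
    }

    /** The generic hook only looks up a handler; it contains no aspect-specific logic itself. */
    List<String> edgesFor(final String aspectName, final String entityUrn, final Object aspect) {
        final AspectEdgeHandler handler = handlersByAspectName.get(aspectName);
        return handler == null ? new ArrayList<>() : handler.edgesFor(entityUrn, aspect);
    }

    public static void main(final String[] args) {
        final EdgeHandlerRegistrySketch registry = new EdgeHandlerRegistrySketch();
        // "inputFields" is used here purely as an illustrative key.
        registry.register("inputFields", (urn, aspect) -> List.of(urn + " -> " + aspect));
        System.out.println(registry.edgesFor("inputFields", "urn:li:chart:(looker,c1)", "upstreamField"));
        System.out.println(registry.edgesFor("ownership", "urn:li:chart:(looker,c1)", "ignored"));
    }
}
```

Aspects without a registered handler fall through to an empty list, which keeps the dispatching code free of per-aspect branches.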
updateFineGrainedEdgesAndRelationships(aspect, edgesToAdd, urnToRelationshipTypesBeingAdded);
}
if (aspectSpec.getName().equals(Constants.INPUT_FIELDS_ASPECT_NAME)) {
Qq - Do we have unit tests for this class? If not, we absolutely need to backfill since it's so important
we do not have anything for this class...
Can we add a TODO to do this on this file?
yeah I'm about to push up the beginning of a test file for this, but will add a TODO at the top of the class to backfill the rest of the functionality!
@@ -180,15 +183,45 @@ private void updateFineGrainedEdgesAndRelationships(
}
}

private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
String encodedFieldPath = fieldPath.replaceAll("\\(", "%28").replaceAll("\\)", "%29").replaceAll(",", "%2C");
nit: always good to make any fields that aren't going to change final
Also for any function parameters!
sounds good!
private Urn generateSchemaFieldUrn(@Nonnull String resourceUrn, @Nonnull String fieldPath) {
// we rely on schemaField fieldPaths to be encoded since we do that with fineGrainedLineage on the ingestion side
String encodedFieldPath = fieldPath.replaceAll("\\(", "%28").replaceAll("\\)", "%29").replaceAll(",", "%2C");
String urnString = String.format("urn:li:schemaField:(%s,%s)", resourceUrn, encodedFieldPath);
Instead of hardcoding urn:li:schemaField, can we add it as a constant somewhere?
good call, and in fact I'll do you one better and use EntityKeyUtils.convertEntityKeyToUrn to be even more safe/systematic!
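As a standalone illustration of the points raised in the two threads above (final locals and parameters, no hardcoded prefix inside the method), the encoding step might look roughly like the sketch below. It is an assumption-laden sketch, not the PR's final code: SchemaFieldUrnSketch and SCHEMA_FIELD_URN_PREFIX are hypothetical names, the URN is returned as a plain String, and it deliberately does not use EntityKeyUtils.convertEntityKeyToUrn, whose signature is not shown in this review. It also uses String.replace, which treats its arguments literally, in place of the regex-based replaceAll from the diff.

```java
import java.util.Objects;

// Illustrative sketch only; the class and constant names are hypothetical.
final class SchemaFieldUrnSketch {
    // Assumed constant standing in for the hardcoded "urn:li:schemaField" literal.
    static final String SCHEMA_FIELD_URN_PREFIX = "urn:li:schemaField";

    private SchemaFieldUrnSketch() { }

    /** Percent-encodes the characters that are structural inside a URN tuple: '(', ')' and ','. */
    static String encodeFieldPath(final String fieldPath) {
        Objects.requireNonNull(fieldPath, "fieldPath");
        // String.replace matches literal text, so no regex escaping is needed.
        return fieldPath.replace("(", "%28").replace(")", "%29").replace(",", "%2C");
    }

    /** Builds the schemaField URN string from a parent resource URN and a raw field path. */
    static String schemaFieldUrn(final String resourceUrn, final String fieldPath) {
        Objects.requireNonNull(resourceUrn, "resourceUrn");
        final String encodedFieldPath = encodeFieldPath(fieldPath);
        return String.format("%s:(%s,%s)", SCHEMA_FIELD_URN_PREFIX, resourceUrn, encodedFieldPath);
    }

    public static void main(final String[] args) {
        System.out.println(schemaFieldUrn("urn:li:chart:(looker,sales)", "sum(amount,tax)"));
    }
}
```

Only the field path is encoded; the parent resource URN keeps its own parentheses and commas, matching the String.format call in the diff.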
LGTM!
import static com.linkedin.metadata.Constants.DATASET_ENTITY_NAME;
import static com.linkedin.metadata.search.utils.QueryUtils.newRelationshipFilter;

public class UpdateIndicesHookTest {
Thanks for writing these!
Currently, impact analysis only works for fineGrainedLineage, but it needs to work for inputFields as well. Both of these aspects use schemaFields, so all the infrastructure for fetching impact analysis for schema fields is already there; we just were not yet indexing inputFields to draw edges between schemaFields like we were for the upstreamLineage aspect. This adds that step to draw those edges manually.
Something to note is that we need to encode the fieldPath of the schemaField urn that we generate for the graph index. This is because ingestion always does that for schemaField urns, but here we are creating a schemaField urn from the schemaField prop of inputFields, and the fieldPath there is not encoded. schemaField urns need to be consistent across our DB and graph index, since we request them using the encoded urn from the FE and elsewhere.
Checklist