-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ui) Add upgrade step to enable CLL impact analysis for existing data #6427
feat(ui) Add upgrade step to enable CLL impact analysis for existing data #6427
Conversation
|
||
private int getAndRestoreUpstreamLineageIndices(int start, AuditStamp auditStamp) | ||
throws Exception { | ||
final AspectSpec upstreamLineageAspectSpec = _entityRegistry.getEntitySpec(Constants.DATASET_ENTITY_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this AspectSpec? Is there any other method on entityService to produce MCL that doesn't require the AspectSpec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm all the methods I'm seeing from entityService that produce an MCL use an aspect spec. is there something specific you had in mind/know about?
final AspectSpec upstreamLineageAspectSpec = _entityRegistry.getEntitySpec(Constants.DATASET_ENTITY_NAME) | ||
.getAspectSpec(Constants.UPSTREAM_LINEAGE_ASPECT_NAME); | ||
SearchResult datasetResult = | ||
_entitySearchService.search(Constants.DATASET_ENTITY_NAME, "", null, null, start, BATCH_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a single entity type, this will only work up to 10k entities. After 10k, elastic doesn't allow us to paginate by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something to note. If you want fuller ability to paginate you can try to use the "scroll" method.
I'm also wondering if there's a way to do this that doesn't require that we go to elasticsearch. There is a listLatestAspects
method on the EntityService class that would allow you get the latest of a specific aspect for every URN. That may help here...
Also, is there any Field-Level Lineage in DataJob -> Dataset that we need to worry about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(listLatestAspects
will hit MySQL in a slightly more performant query)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It's used in the browse path upgrade step as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good! moved to using listLatestAspects
- gonna be faster and more efficient overall
|
||
return entityResult.getNumEntities(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Consider adding @nullable
public RestoreColumnLineageIndices(EntityService entityService, EntitySearchService entitySearchService, | ||
EntityRegistry entityRegistry) { | ||
super(entityService, VERSION, UPGRADE_ID); | ||
_entitySearchService = entitySearchService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor -> consider adding
_entitySearchService = Objects.requireNonNull(entitySearchService, 'entitySearchService must not be null'
Basically this allows us to fail faster (in the constructor at init time) as opposed to waiting for an NPE to arise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add @nonnull and final annotations on these parameters!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally
final Urn upgradeEntityUrn = UrnUtils.getUrn(COLUMN_LINEAGE_UPGRADE_URN); | ||
com.linkedin.upgrade.DataHubUpgradeRequest upgradeRequest = new com.linkedin.upgrade.DataHubUpgradeRequest().setVersion(version); | ||
Map<String, EnvelopedAspect> upgradeRequestAspects = new HashMap<>(); | ||
upgradeRequestAspects.put(Constants.DATA_HUB_UPGRADE_REQUEST_ASPECT_NAME, new EnvelopedAspect().setValue(new Aspect(upgradeRequest.data()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thorough! Nice!!
public RestoreColumnLineageIndices(EntityService entityService, EntitySearchService entitySearchService, | ||
EntityRegistry entityRegistry) { | ||
super(entityService, VERSION, UPGRADE_ID); | ||
_entitySearchService = entitySearchService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add @nonnull and final annotations on these parameters!
import java.util.HashSet; | ||
import java.util.Map; | ||
|
||
public class RestoreColumnLineageIndicesTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Adds a new upgrade step for restoring indices for the
upstreamLineage
aspects as well as theinputFields
aspect. TheupstreamLineage
aspect hasfineGrainedLineage
that needs to be reindexed with our new logic for impact analysis. Same this withinputFields
. After this, users will have their graph index set up properly to do impact analysis with all columns.Checklist