feat(platform): Support @Searchable + @Relationship Annotations for Timeseries Aspects #6455

Merged

@jjoyce0510 jjoyce0510 commented Nov 16, 2022

Summary

In this PR, we add support for marking fields in a Timeseries aspect as @Searchable. This allows us to easily index the most recent value of a particular field. We also add indexing for the following fields:

  • Dataset row + column count
  • Dataset last operation time
  • Dataset storage size in bytes

We do this by simply indexing the Searchable fields during normal MCL processing. On each new Timeseries aspect, the previous value will be overwritten.
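The overwrite behavior described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `LatestValueIndex`, `upsert`, and the example URN are hypothetical names, and a plain map stands in for the search index. Leaving fields that are absent from the new aspect untouched is a choice made for the sketch only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a search index: one document per entity URN.
class LatestValueIndex {
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    // Upsert the searchable fields carried by a newly received timeseries aspect.
    // A field present in the new aspect replaces the previously indexed value.
    void upsert(String urn, Map<String, ?> searchableFields) {
        documents.computeIfAbsent(urn, k -> new HashMap<>()).putAll(searchableFields);
    }

    // Returns the most recently indexed value, or null if nothing was indexed.
    Object get(String urn, String field) {
        Map<String, Object> doc = documents.get(urn);
        return doc == null ? null : doc.get(field);
    }

    public static void main(String[] args) {
        LatestValueIndex index = new LatestValueIndex();
        String urn = "urn:li:dataset:example"; // hypothetical URN
        index.upsert(urn, Map.of("rowCount", 100L, "columnCount", 5L));
        index.upsert(urn, Map.of("rowCount", 250L)); // a newer profile arrives
        System.out.println(index.get(urn, "rowCount"));
    }
}
```

Only the latest value survives in this sketch; the full history would still live in the timeseries index.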

What does this enable?

This allows us to easily filter by the fields in a Timeseries aspect.
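As a rough illustration of the kind of filter this enables, here is a small sketch that selects datasets by their latest row count. It assumes the latest `rowCount` per dataset is already available as a map; `RowCountFilterSketch`, `urnsWithAtLeast`, and the URNs are hypothetical and are not part of DataHub's search API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RowCountFilterSketch {
    // Returns the URNs whose latest indexed row count is at least minRows, sorted by URN.
    static List<String> urnsWithAtLeast(Map<String, Long> latestRowCounts, long minRows) {
        return latestRowCounts.entrySet().stream()
            .filter(e -> e.getValue() >= minRows)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> latest = Map.of(
            "urn:li:dataset:a", 10L,
            "urn:li:dataset:b", 1000L,
            "urn:li:dataset:c", 5000L);
        System.out.println(urnsWithAtLeast(latest, 1000L));
    }
}
```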

Status

Ready for review. Validated everything manually as well.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Nov 16, 2022
entitySpec = _entityRegistry.getEntitySpec(event.getEntityType());
} catch (IllegalArgumentException e) {
log.error("Error while processing entity type {}: {}", event.getEntityType(), e.toString());
public void invoke(@Nonnull final MetadataChangeLog event) {

This is mostly just organizing code.

  • adding javadocs.


// Step 0. If the aspect is timeseries, add to its timeseries index.
if (aspectSpec.isTimeseries()) {
updateTimeseriesFields(event.getEntityType(), event.getAspectName(), urn, aspect, aspectSpec,

Here's the important part: only update the timeseries index if it's a timeseries aspect. Do everything else either way.
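The control flow in the comment above can be sketched as below. This is a simplified illustration under assumptions, not the hook's actual code: `UpdateFlowSketch` and `stepsFor` are hypothetical, and the "search" and "graph" step names are stand-ins for the @Searchable and @Relationship processing named in the PR title.

```java
import java.util.ArrayList;
import java.util.List;

class UpdateFlowSketch {
    // Returns the ordered list of index updates that would run for one event.
    static List<String> stepsFor(boolean isTimeseriesAspect) {
        List<String> steps = new ArrayList<>();
        // Step 0: only timeseries aspects are written to the timeseries index.
        if (isTimeseriesAspect) {
            steps.add("timeseries");
        }
        // The remaining steps run for every aspect, timeseries or not.
        steps.add("search");
        steps.add("graph");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsFor(true));
        System.out.println(stepsFor(false));
    }
}
```

The point of the sketch is that the timeseries check guards a single step instead of returning early, so the later steps are no longer skipped for timeseries aspects.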

@jjoyce0510 jjoyce0510 changed the title from "feat(platform): Support @Searchable Annotation for Timeseries Aspects" to "feat(platform): Support @Searchable + @Relationship Annotations for Timeseries Aspects" Nov 16, 2022
github-actions bot commented Nov 16, 2022

Unit Test Results (metadata ingestion)

8 files, 8 suites, 1h 3m 46s ⏱️
765 tests: 762 ✔️ passed, 2 💤 skipped, 1 failed
1 532 runs: 1 526 ✔️ passed, 5 💤 skipped, 1 failed

For more details on these failures, see this check.

Results for commit fd93883.

♻️ This comment has been updated with latest results.

github-actions bot commented Nov 16, 2022

Unit Test Results (build & test)

621 tests (-1): 617 ✔️ passed (-1), 4 💤 skipped (±0), 0 failed (±0)
157 suites (±0), 157 files (±0), 16m 0s ⏱️ (-3s)

Results for commit fd93883. ± Comparison against base commit 1fe0f01.

This pull request removes 7 and adds 6 tests. Note that renamed tests count towards both.

Removed:
com.datahub.authorization.RoleServiceTest ‑ testAssignRoleToActorDoesNotExist
com.datahub.authorization.RoleServiceTest ‑ testAssignRoleToActorExists
com.datahub.authorization.RoleServiceTest ‑ testRoleDoesNotExist
com.datahub.authorization.RoleServiceTest ‑ testRoleExists
com.linkedin.datahub.graphql.resolvers.role.BatchAssignRoleResolverTest ‑ testAllActorsExist
com.linkedin.datahub.graphql.resolvers.role.BatchAssignRoleResolverTest ‑ testRoleDoesNotExistFails
com.linkedin.datahub.graphql.resolvers.role.BatchAssignRoleResolverTest ‑ testSomeActorsExist

Added:
com.datahub.authorization.RoleServiceTest ‑ testAssignNullRoleToActorAllActorsExist
com.datahub.authorization.RoleServiceTest ‑ testBatchAssignRoleAllActorsExist
com.datahub.authorization.RoleServiceTest ‑ testBatchAssignRoleNoActorExists
com.datahub.authorization.RoleServiceTest ‑ testBatchAssignRoleSomeActorExists
com.linkedin.datahub.graphql.resolvers.role.BatchAssignRoleResolverTest ‑ testNotNullRole
com.linkedin.datahub.graphql.resolvers.role.BatchAssignRoleResolverTest ‑ testNullRole

♻️ This comment has been updated with latest results.

@chriscollins3456 chriscollins3456 left a comment

nice! good change but especially good code cleanup (the method before you made this change was a doozy)

Comment on lines 12 to 40
record DatasetProfile includes TimeseriesAspectBase {
  /**
   * The total number of rows
   */
  @Searchable = {
    "fieldType": "COUNT"
  }
  rowCount: optional long

  /**
   * The total number of columns (or schema fields)
   */
  @Searchable = {
    "fieldType": "COUNT"
  }
  columnCount: optional long

  /**
   * Profiles for each column (or schema field)
   */
  fieldProfiles: optional array[DatasetFieldProfile]

  /**
   * Storage size in bytes
   */
  @Searchable = {
    "fieldType": "COUNT"
  }
  sizeInBytes: optional long

is it just the diff view or does the indentation of your additions seem off here?


whoa no this seems off

@shirshanka shirshanka added the platform PR-s that make changes to core parts of the platform label Nov 21, 2022
@jjoyce0510 jjoyce0510 merged commit 7136dd5 into datahub-project:master Nov 29, 2022
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023