Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(models) : Joins (Datasets) Schema #7961

Closed
wants to merge 8 commits into from

Conversation

rtekal
Copy link
Contributor

@rtekal rtekal commented May 4, 2023

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

return _joinId;
}

public static JoinUrn createFromString(String rawUrn) throws URISyntaxException {

Check warning

Code scanning / QDJVMC

Method tries to override 'static' method of superclass

Method 'createFromString()' tries to override a static method of a superclass
import java.net.URISyntaxException;


public class JoinUrn extends Urn {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused, why is this class needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that is totally fine, but why is there a JoinUrn? I don't see an Urn like this for any other EntityType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
* Standardized join identifier.
*/
@java.class = "com.linkedin.common.urn.JoinUrn"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary to me. Can you explain the rationale here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see above comment.

/**
* All fields from dataset A that are required for the join, maps to bFields 1:1
*/
afield: SchemaFieldPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this leftDatasetFields and rightDatasetFields, or something else that is more descriptive? I think these names are somewhat confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataset A and Dataset B are symmetric. You can look at this Join from either dataset. If you look from Dataset A's perspective, your names left and right are ok. Where as if you look from Dataset B's perspective, the left and right are not correct.

"name": "joinA",
"entityTypes": [ "dataset" ]
}
datasetA: DatasetUrn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left and right seem more standard than A and B

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment above.

@rtekal
Copy link
Contributor Author

rtekal commented May 31, 2023

We decided that since Joins are probably created by different groups, multiple Joins between two datasets will be different join entities.

I did all the changes, and added the resolvers for 'Get', 'CreateJoin' and 'UpdateJoin'.

Here are some samples:
updateJoin_gql.txt
queryJoin_gql.txt
createJoin_gql.txt

@@ -25,4 +25,5 @@ typeref Snapshot = union[
DataHubPolicySnapshot,
SchemaFieldSnapshot,
DataHubRetentionSnapshot,
JoinSnapshot,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do not need any Snapshots anymore but instead rely entirely on the entity-registry.yml file you've already updated. so you can remove this as well as JoinSnapshot.pdl and JoinAspect.pdl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was new to me. So had to do some validation; took time.
Updated.

Comment on lines 7 to 10
@Aspect = {
"name": "joins"
}
record Joins {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like you've added this aspect to any entities in entity-registry.yml - did you mean to add this to datasets? or what were your thoughts there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it is not used. In my initial design, just like Ownership, I wanted to add Join aspect to the Dataset, but our team members suggested that we can use relationships annotations in the pdl. So, I decided not to use Joins.pdl. I will remove it.

@rtekal
Copy link
Contributor Author

rtekal commented Jun 16, 2023

Here is my plan:
1/ Just let me know if the model is reasonable - then
2/ I will demo my local to you of the UI changes
3/ Then I will also commit all the UI changes and then you can approve the PR if all is good.

@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jun 23, 2023
@rtekal rtekal closed this Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants