-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(models) : Joins (Datasets) Schema #7961
Conversation
import java.net.URISyntaxException; | ||
|
||
|
||
public class JoinUrn extends Urn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused, why is this class needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Join will be a entity by itself.
This was proposed in the model:
https://docs.google.com/document/d/1JPtHbNEY-_E9ZwsCsFNPks35g1WVyWQEFd_K0fLF0wk/edit#heading=h.1t4gdnvyeais
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that is totally fine, but why is there a JoinUrn? I don't see an Urn like this for any other EntityType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check this directory:
https://github.com/datahub-project/datahub/tree/master/li-utils/src/main/javaPegasus/com/linkedin/common/urn
/** | ||
* Standardized join identifier. | ||
*/ | ||
@java.class = "com.linkedin.common.urn.JoinUrn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem necessary to me. Can you explain the rationale here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see above comment.
/** | ||
* All fields from dataset A that are required for the join, maps to bFields 1:1 | ||
*/ | ||
afield: SchemaFieldPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we call this leftDatasetFields
and rightDatasetFields
, or something else that is more descriptive? I think these names are somewhat confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataset A and Dataset B are symmetric. You can look at this Join from either dataset. If you look from Dataset A's perspective, your names left and right are ok. Where as if you look from Dataset B's perspective, the left and right are not correct.
"name": "joinA", | ||
"entityTypes": [ "dataset" ] | ||
} | ||
datasetA: DatasetUrn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left
and right
seem more standard than A
and B
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comment above.
We decided that since Joins are probably created by different groups, multiple Joins between two datasets will be different join entities. I did all the changes, and added the resolvers for 'Get', 'CreateJoin' and 'UpdateJoin'. Here are some samples: |
@@ -25,4 +25,5 @@ typeref Snapshot = union[ | |||
DataHubPolicySnapshot, | |||
SchemaFieldSnapshot, | |||
DataHubRetentionSnapshot, | |||
JoinSnapshot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually do not need any Snapshots anymore but instead rely entirely on the entity-registry.yml
file you've already updated. so you can remove this as well as JoinSnapshot.pdl
and JoinAspect.pdl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was new to me. So had to do some validation; took time.
Updated.
@Aspect = { | ||
"name": "joins" | ||
} | ||
record Joins { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't look like you've added this aspect to any entities in entity-registry.yml - did you mean to add this to datasets? or what were your thoughts there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it is not used. In my initial design, just like Ownership, I wanted to add Join aspect to the Dataset, but our team members suggested that we can use relationships annotations in the pdl. So, I decided not to use Joins.pdl. I will remove it.
Here is my plan: |
Checklist