fix(dataset): improve glossary term load performance for datasets #6396
Hey, we are reviewing this. Will get back shortly.
This looks great! I'm actually going to request that you pivot and place your change elsewhere, as I think this can benefit performance everywhere that we load glossary terms. Thanks for digging into this!
nodes {
  urn
  type
  properties {
    name
  }
}
}
The change here is amazing (so that we don't also fetch children of nodes in the parentNodesFields when we don't actually need to). In fact, so good that I think we should apply this everywhere!
In order to do that, I think you can actually drop this change and simply change the fragment parentNodesFields
from:
fragment parentNodesFields on ParentNodesResult {
  count
  nodes {
    ...glossaryNode
  }
}
to:
fragment parentNodesFields on ParentNodesResult {
  count
  nodes {
    urn
    type
    properties {
      name
    }
  }
}
Then this performance change will benefit all entities, wherever we fetch parentNodes (on existing nodes as well).
This is awesome!
This is a great idea! I’m out of town this week so I don’t have access to my computer to make the change, but I can do so this weekend.
okay! i'm going to merge this PR once CI passes and then go and make the additional change right when it gets in, just cuz it'll be a nice simple fix.
Thanks again for putting this up!
Improves the performance of loading datasets by reducing the amount of information fetched from the graph database. Unused data was being fetched, which could result in hundreds of calls to the graph database. The problem is described in more detail in issue #6395.
The heavy use of fragments in the affected portion of the GraphQL query means that the problematic code (in the glossaryNode fragment) cannot be changed directly, because the additional information is needed by other queries that use this fragment, namely getGlossaryNode(). Additionally, the glossaryNode fragment is four levels of abstraction away from the primary fragment of the getDataset() query (nonSiblingDatasetFields -> glossaryTerms -> glossaryTerm -> parentNodesFields -> glossaryNode). Instead of creating four new fragments for one change at the fourth layer, I combined them into a single new fragment that can be inserted as a whole into the getDataset() query. If this is not preferred, or if the fragment could be better named, I can make those changes.
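To illustrate why the extra selection is costly, here is a minimal TypeScript sketch. It is not the actual resolver code: `FakeGraph`, `resolveParentNodes`, and `countRelationshipCalls` are hypothetical names, and it assumes each children lookup on a parent node costs one round trip to the graph database, with no batching or caching.

```typescript
// Hypothetical in-memory stand-in for the graph database.
interface NodeRecord {
  urn: string;
  name: string;
  childUrns: string[];
}

class FakeGraph {
  // Counts relationship lookups only; plain property reads are treated as free.
  relationshipCalls = 0;
  private readonly nodes = new Map<string, NodeRecord>();

  add(node: NodeRecord): void {
    this.nodes.set(node.urn, node);
  }

  getNode(urn: string): NodeRecord {
    const node = this.nodes.get(urn);
    if (node === undefined) throw new Error(`unknown urn: ${urn}`);
    return node;
  }

  // Assumption: one children lookup equals one call to the graph database.
  getChildren(urn: string): NodeRecord[] {
    this.relationshipCalls += 1;
    return this.getNode(urn).childUrns.map((child) => this.getNode(child));
  }
}

interface ParentNodeView {
  urn: string;
  properties: { name: string };
  children?: string[];
}

// includeChildren = true models selecting the full glossaryNode fragment;
// false models selecting only urn and properties.name.
function resolveParentNodes(
  graph: FakeGraph,
  parentUrns: string[],
  includeChildren: boolean,
): ParentNodeView[] {
  return parentUrns.map((urn) => {
    const node = graph.getNode(urn);
    const view: ParentNodeView = { urn: node.urn, properties: { name: node.name } };
    if (includeChildren) {
      view.children = graph.getChildren(urn).map((child) => child.urn);
    }
    return view;
  });
}

// Simulates a dataset with termCount glossary terms, each under parentsPerTerm parent nodes.
function countRelationshipCalls(
  termCount: number,
  parentsPerTerm: number,
  includeChildren: boolean,
): number {
  const graph = new FakeGraph();
  const parentUrns: string[] = [];
  for (let i = 0; i < parentsPerTerm; i++) {
    const urn = `urn:li:glossaryNode:parent${i}`;
    graph.add({ urn, name: `Parent ${i}`, childUrns: [] });
    parentUrns.push(urn);
  }
  for (let t = 0; t < termCount; t++) {
    resolveParentNodes(graph, parentUrns, includeChildren);
  }
  return graph.relationshipCalls;
}
```

Under these assumptions, `countRelationshipCalls(50, 4, true)` returns 200, while `countRelationshipCalls(50, 4, false)` returns 0. The real numbers depend on how the backend resolves these fields, but the multiplication of terms by parent nodes is the effect this PR removes.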