fix(dataset): improve glossary term load performance for datasets #6396

Reilman79 · 2022-11-09T20:54:00Z

Improves dataset load performance by reducing the amount of information fetched from the graph database. The query was fetching data that was never used, which could result in hundreds of calls to the graph database. The problem is described in more detail in issue #6395.

The affected portion of the GraphQL query relies heavily on fragments, so the problematic code (in the glossaryNode fragment) cannot be changed directly: the additional information it fetches is needed by other queries that use this fragment, namely getGlossaryNode(). The glossaryNode fragment is also four levels of abstraction away from the primary fragment of the getDataset() query (nonSiblingDatasetFields -> glossaryTerms -> glossaryTerm -> parentNodesFields -> glossaryNode). Instead of creating four new fragments for one change at the fourth layer, I combined them into a single new fragment that can be inserted as a whole into the getDataset() query. If this approach is not preferred, or if the fragment should be named differently, I can make those changes.
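To make the nesting concrete, here is a minimal sketch that models fragment spreads as a small dependency map and walks it from the dataset fragment down to glossaryNode. The fragment names come from the chain described above; the map itself and the `spreadPath` helper are hypothetical and are not part of the DataHub codebase, and the real fragments spread far more than what is listed here.

```typescript
// Hypothetical, simplified model: each fragment name maps to the fragments it spreads.
// Only the chain described in this PR is included.
const spreads: Record<string, string[]> = {
    nonSiblingDatasetFields: ["glossaryTerms"],
    glossaryTerms: ["glossaryTerm"],
    glossaryTerm: ["parentNodesFields"],
    parentNodesFields: ["glossaryNode"],
    glossaryNode: [],
};

// Depth-first search for the chain of spreads leading from `from` to `target`.
// Returns null when `target` is not reachable from `from`.
function spreadPath(from: string, target: string, seen: Set<string> = new Set()): string[] | null {
    if (from === target) return [from];
    if (seen.has(from)) return null; // guard against cycles in the map
    seen.add(from);
    for (const next of spreads[from] ?? []) {
        const rest = spreadPath(next, target, seen);
        if (rest) return [from, ...rest];
    }
    return null;
}

const path = spreadPath("nonSiblingDatasetFields", "glossaryNode");
// Number of spreads that separate the primary fragment from glossaryNode.
const hops = path ? path.length - 1 : -1;
console.log(path?.join(" -> "), hops);
```

In this model, any edit to glossaryNode reaches every query that has a path to it, which is why changing the fragment in place would also affect getGlossaryNode().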

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

jjoyce0510 · 2022-11-15T01:06:37Z

Hey we are reviewing this. Will get back shortly.

chriscollins3456

This looks great! I'm actually going to request you make a pivot to place your change elsewhere as I think this can benefit the performance everywhere that we load glossary terms. thanks for digging into this!

chriscollins3456 · 2022-11-15T01:44:13Z

datahub-web-react/src/graphql/dataset.graphql

+                nodes {
+                    urn
+                    type
+                    properties {
+                        name
+                    }
+                }
+            }


the change here is amazing (so that we don't also fetch children of nodes in the parentNodesFields when we don't actually need to). in fact, so good that I think we should apply this everywhere!

In order to do that, I think you can actually drop this change and simply change the fragment parentNodesFields from:

fragment parentNodesFields on ParentNodesResult {
    count
    nodes {
        ...glossaryNode
    }
}

to:

fragment parentNodesFields on ParentNodesResult {
    count
    nodes {
        urn
        type
        properties {
            name
        }
    }
}

then this performance change will benefit all entities and wherever we fetch parentNodes (on existing nodes as well)
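As a rough illustration of why the slimmer selection helps, the sketch below counts simulated graph lookups for both shapes of the selection. It is a toy model and not DataHub code: `fetchChildren`, `resolveParentNodes`, `countLookups`, the example URNs, and the figures of 50 terms with 3 parent nodes each are all assumptions made for the example. The only idea taken from the discussion is that asking for each parent node's children costs extra lookups that the slim selection avoids.

```typescript
// Toy model, not DataHub code: each call to fetchChildren stands in for
// one graph database lookup.
interface GlossaryNode {
    urn: string;
    type: string;
    properties: { name: string };
}

const childrenByUrn = new Map<string, GlossaryNode[]>();
let graphLookups = 0;

// Hypothetical resolver for a node's children; each call costs one lookup.
function fetchChildren(node: GlossaryNode): GlossaryNode[] {
    graphLookups += 1;
    return childrenByUrn.get(node.urn) ?? [];
}

// Resolve parent nodes with the slim selection (urn, type, properties.name),
// optionally also asking for each node's children.
function resolveParentNodes(nodes: GlossaryNode[], includeChildren: boolean): object[] {
    return nodes.map((node) => {
        const slim = { urn: node.urn, type: node.type, properties: { name: node.properties.name } };
        return includeChildren ? { ...slim, children: fetchChildren(node) } : slim;
    });
}

// Count the lookups needed for a dataset with `termCount` glossary terms,
// each of which has `parentsPerTerm` parent nodes.
function countLookups(termCount: number, parentsPerTerm: number, includeChildren: boolean): number {
    graphLookups = 0;
    for (let t = 0; t < termCount; t++) {
        const parents: GlossaryNode[] = [];
        for (let p = 0; p < parentsPerTerm; p++) {
            parents.push({
                urn: `urn:li:glossaryNode:example-${t}-${p}`,
                type: "GLOSSARY_NODE",
                properties: { name: `Node ${t}-${p}` },
            });
        }
        resolveParentNodes(parents, includeChildren);
    }
    return graphLookups;
}

const withChildren = countLookups(50, 3, true);
const slimOnly = countLookups(50, 3, false);
console.log(withChildren, slimOnly); // 150 0
```

In this model the extra cost grows with the number of terms times the number of parent nodes per term, which is consistent with the "potentially hundreds of calls" described in the PR, though the real numbers depend on the data and on how the backend resolves these fields.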

aditya-radhakrishnan

This is awesome!

Reilman79

This is a great idea! I’m out of town this week so I don’t have access to my computer to make the change, but I can do so this weekend.

chriscollins3456

okay! i'm going to merge this PR once CI passes and then go and make the additional change right when it gets in, just cuz it'll be a nice simple fix.

Thanks again for putting this up!

github-actions · 2022-11-15T18:25:44Z

Unit Test Results (build & test)

613 tests ±0 609 ✔️ ±0 11m 53s ⏱️ -8s
151 suites ±0     4 💤 ±0
151 files ±0     0 ❌ ±0

Results for commit 8c2dc02. ± Comparison against base commit ef5c712.

Reilman79 added 2 commits November 9, 2022 14:58

fix(dataset): improve glossary term load performance for fields

bb36c80

fix(dataset): improve glossary term load performance for datasets

8c2dc02

github-actions bot added the product PR or Issue related to the DataHub UI/UX label Nov 9, 2022

maggiehays added the community-contribution PR or Issue raised by member(s) of DataHub Community label Nov 14, 2022

aditya-radhakrishnan requested a review from chriscollins3456 November 15, 2022 00:13

jjoyce0510 requested a review from gabe-lyons November 15, 2022 01:06

chriscollins3456 reviewed Nov 15, 2022

chriscollins3456 approved these changes Nov 15, 2022

chriscollins3456 merged commit 6e415ca into datahub-project:master Nov 15, 2022

chriscollins3456 mentioned this pull request Nov 15, 2022

fix(ui) Fix parentNodes overfetching everywhere it's used #6446

Merged

cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023

fix(dataset): improve glossary term load performance for datasets (da…

6f59820

…tahub-project#6396)
