feat(gms): add ingestProposalBatch endpoint #10706

hsheth2 · 2024-06-14T05:23:01Z

In my local test of emitting 1000 MCPs, it was approximately 7x faster to emit MCPs in batches of 100 instead of emitting one at a time.

This will need to be an opt-in mechanism in the REST sink until the backend GMS API has been released and stabilized. I have not built the opt-in mechanism yet.
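
For illustration, a minimal sketch of what opt-in batching on the client side could look like. This is not the rest sink code; `chunked`, `emit_all`, `emit_one`, `emit_batch`, and the `use_batch` flag are all hypothetical names made up for this example. It only shows the request count dropping when batching is enabled, and it does not measure the speedup.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def emit_all(
    items: Iterable[T],
    emit_one: Callable[[T], None],          # hypothetical: sends one item per request
    emit_batch: Callable[[List[T]], None],  # hypothetical: sends many items per request
    use_batch: bool = False,                # hypothetical opt-in flag, off by default
    batch_size: int = 100,
) -> int:
    """Send every item and return the number of requests made."""
    requests = 0
    if use_batch:
        for batch in chunked(items, batch_size):
            emit_batch(batch)
            requests += 1
    else:
        for item in items:
            emit_one(item)
            requests += 1
    return requests
```

With 1000 items and a batch size of 100, the batched path makes 10 calls to `emit_batch` instead of 1000 calls to `emit_one`. Keeping `use_batch` off by default means existing behavior is unchanged unless the caller opts in.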

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

RyanHolstien · 2024-06-14T16:03:48Z

...restli-servlet-impl/src/main/java/com/linkedin/metadata/resources/entity/AspectResource.java

-            return resultUrn.toString();
+
+            // TODO: We don't actually use this return value anywhere. Maybe we should just stop returning it altogether?
+            return "success";


I do think it would be better to return the list of urns that you successfully ingested in the response here to keep in line with what was happening previously, but it is unlikely to cause issues. Just returning "success" does not provide any information beyond what the status code already provides.

hsheth2 · 2024-06-14

imo returning the urn doesn't provide much information either. Only the status code is important: ingestion only looks at the status code, not the body.

I did this for simplicity: I wanted to keep the return type Task<String> for both methods so that the code would be less complex. I can refactor it if you think it's important, but imo we don't gain anything by doing it.

hsheth2 added 2 commits June 13, 2024 22:17

tweak

b532434

github-actions bot added the ingestion and devops labels Jun 14, 2024

vercel bot deployed to Preview June 14, 2024 05:39

david-leifker approved these changes Jun 14, 2024

RyanHolstien reviewed Jun 14, 2024

RyanHolstien approved these changes Jun 14, 2024

hsheth2 merged commit 402bf31 into master Jun 14, 2024
59 of 61 checks passed

hsheth2 deleted the ingest-proposal-batch branch June 14, 2024 19:30

sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024

feat(gms): add ingestProposalBatch endpoint (datahub-project#10706)

d6395dc

yoonhyejin pushed a commit that referenced this pull request Jul 16, 2024

feat(gms): add ingestProposalBatch endpoint (#10706)

e91f3dd
