feat(gms): add ingestProposalBatch endpoint #10706

Merged: 2 commits merged on Jun 14, 2024

Conversation

@hsheth2 (Collaborator) commented Jun 14, 2024

In my local test of emitting 1000 MCPs, it was approximately 7x faster to emit MCPs in batches of 100 instead of emitting one at a time.

This will need to be an opt-in mechanism in the rest sink until the backend GMS API has been released and stabilized. I have not built that yet.
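
The batching approach described above can be sketched on the client side roughly as follows. This is a minimal illustration and not the actual rest sink code: `chunked`, `emit_all`, `send_one`, `send_batch`, and the `use_batch_endpoint` flag are all hypothetical names chosen for this example, and the default batch size of 100 simply mirrors the local test.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def emit_all(
    mcps: Iterable[T],
    send_one: Callable[[T], None],
    send_batch: Callable[[List[T]], None],
    use_batch_endpoint: bool = False,  # hypothetical opt-in flag, off by default
    batch_size: int = 100,
) -> int:
    """Emit every MCP and return the number of requests that were made."""
    requests_made = 0
    if use_batch_endpoint:
        # One request per batch of up to `batch_size` MCPs.
        for batch in chunked(mcps, batch_size):
            send_batch(batch)
            requests_made += 1
    else:
        # Default path: one request per MCP.
        for mcp in mcps:
            send_one(mcp)
            requests_made += 1
    return requests_made
```

With 1000 MCPs and a batch size of 100, the batched path issues 10 requests where the default path issues 1000. Keeping the flag off by default matches the opt-in requirement stated above.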

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

hsheth2 added 2 commits June 13, 2024 22:17
@github-actions github-actions bot added the ingestion and devops labels Jun 14, 2024
- return resultUrn.toString();
+
+ // TODO: We don't actually use this return value anywhere. Maybe we should just stop returning it altogether?
+ return "success";
Collaborator

I do think it would be better to return the list of urns that you successfully ingested in the response here to keep in line with what was happening previously, but it is unlikely to cause issues. Just returning "success" does not provide any information beyond what the status code already provides.

Collaborator, Author

imo returning the urn doesn't provide much information either; only the status code is important. Ingestion also only looks at the status code, not the body.

I did this for simplicity: I wanted to keep the return type Task<String> for both methods so that the code would be less complex. I can refactor it if you think it's important, but imo we don't gain anything by doing it.
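
To illustrate the point about callers relying only on the status code, here is a minimal sketch of a client-side check that treats any 2xx response as success and never parses the body. `ingest_succeeded` is a hypothetical helper written for this example; it is not part of DataHub's client.

```python
def ingest_succeeded(status_code: int, body: str = "") -> bool:
    """Return True for any 2xx status; the response body is deliberately ignored."""
    return 200 <= status_code < 300


# The outcome is the same whether the body carries a urn or a fixed string.
print(ingest_succeeded(200, "success"))
print(ingest_succeeded(200, "urn:li:dataset:example"))  # placeholder body
print(ingest_succeeded(500, "success"))
```

Under this assumption, changing the response body from a urn to a fixed string does not alter what the caller observes.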

@hsheth2 hsheth2 merged commit 402bf31 into master Jun 14, 2024
59 of 61 checks passed
@hsheth2 hsheth2 deleted the ingest-proposal-batch branch June 14, 2024 19:30
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
Labels

  • devops: PR or Issue related to DataHub backend & deployment
  • ingestion: PR or Issue related to the ingestion of metadata

3 participants