Add support for multi-part data corpora downloads #677

Merged 1 commit on Oct 5, 2024

@gkamat (Collaborator) commented Oct 4, 2024

Description

Allows data corpus files to be downloaded in parts. The goal is not better performance but to work around the file-size limits that services such as CloudFront may impose.
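
As a rough illustration of the idea only, and not the code merged in this PR: once the parts of a corpus file have been fetched individually, they can be joined in order to recreate the original file. The helper name `join_parts` and the part file names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def join_parts(part_paths: Iterable[Path], target_path: Path) -> int:
    """Concatenate already-downloaded part files, in order, into target_path.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    with open(target_path, "wb") as target:
        for part in part_paths:
            with open(part, "rb") as source:
                # Stream in chunks so a large part never has to fit in memory.
                shutil.copyfileobj(source, target)
        return target.tell()
```

For example, `join_parts([Path("corpus.json.part0"), Path("corpus.json.part1")], Path("corpus.json"))` would rebuild `corpus.json`. The caller is responsible for passing the parts in the correct order, since the sketch does no sorting or checksum validation.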

Issues Resolved

#543

Testing

Ran lint, unit, and integration tests. Added a unit test to exercise the feature.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@IanHoang (Collaborator) left a comment

This looks good. Should we open an issue in the workloads repository for splitting the 1 TB file in Big5 workload?

@gkamat (Collaborator, Author) commented Oct 5, 2024

> This looks good. Should we open an issue in the workloads repository for splitting the 1 TB file in Big5 workload?

Will just make the change directly.

@gkamat merged commit 4609bfd into opensearch-project:main on Oct 5, 2024
10 checks passed