replace ZipInputStream with ZipFile #10899

Merged
merged 8 commits into from
Oct 25, 2024

Conversation

@jo-pol jo-pol commented Oct 1, 2024

What this PR does / why we need it:

It will no longer fail to upload individual files from a zip downloaded from an ownCloud service.
See issue for further details.

Which issue(s) this PR closes:

Special notes for your reviewer:

You will see fewer differences when ignoring whitespace changes:
https://github.com/IQSS/dataverse/compare/develop...DANS-KNAW-jp:dataverse:10898-own-cloud-zips?w=1

Replaced ZipInputStream with ZipFile in:

  • CreateNewDataFilesCommand
  • ShapefileHandler (abandoned ShapefileHandler constructor with FileInputStream to allow the use of ZipFile)
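As a rough illustration of the replacement (a minimal sketch, not the code from this PR): `ZipInputStream` reads entries sequentially from the local headers in the stream, whereas `ZipFile` looks entries up through the archive's central directory, which tends to tolerate more zip variants. The class name `ZipFileListing` and the method `listEntries` are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of Dataverse.
class ZipFileListing {

    /** Returns "name:uncompressedSize" for every non-directory entry in the zip. */
    static List<String> listEntries(Path zip) throws IOException {
        List<String> result = new ArrayList<>();
        // ZipFile needs a real file on disk (random access), not just a stream.
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                // Each entry gets its own stream; read it fully to count the bytes.
                try (InputStream in = zipFile.getInputStream(entry)) {
                    result.add(entry.getName() + ":" + in.readAllBytes().length);
                }
            }
        }
        return result;
    }
}
```

Because `ZipFile` requires a file rather than an arbitrary stream, a caller holding only an `InputStream` would first have to copy it to a temporary file, which is presumably why the `FileInputStream` constructor of `ShapefileHandler` was dropped.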

Additional

  • for the new iteration method over zip entries in CreateNewDataFilesCommand: extracted the methods filteredZipEntries, openZipFile, getShortName and isFileToSkip. The extracted code is slightly different from ShapefileHandler.isFileToSkip, but changing behavior is beyond the scope of the issue.
  • introduced some unit tests for CreateNewDataFilesCommand, or should I call them integration tests for the changed classes?
  • to allow (or at least simplify) the new unit test, FileUtil.determineFileType now catches Bag exceptions. The method will now return application/zip rather than throw. Without the catch I would need complex mocking to get a BagitFileHandler via CDI just to test the rest.
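The extracted helpers could look roughly like the sketch below. The method names filteredZipEntries, getShortName and isFileToSkip come from the description above, but their signatures, the skip rules (macOS resource forks, `.DS_Store`, the `__MACOSX` folder) and the enclosing class `ZipEntryFilter` are assumptions made for illustration; the actual code in CreateNewDataFilesCommand may differ.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch; signatures and skip rules are assumptions, not the PR's code.
class ZipEntryFilter {

    /** Returns the part of the entry name after the last '/'. */
    static String getShortName(String entryName) {
        int idx = entryName.lastIndexOf('/');
        return idx >= 0 ? entryName.substring(idx + 1) : entryName;
    }

    /** Assumed skip rules: empty names and typical macOS metadata files. */
    static boolean isFileToSkip(String entryName) {
        String shortName = getShortName(entryName);
        return shortName.isEmpty()
                || shortName.startsWith("._")
                || shortName.equals(".DS_Store")
                || entryName.startsWith("__MACOSX/");
    }

    /** Names of all regular entries that should be processed, in archive order. */
    static List<String> filteredZipEntries(Path zip) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            return zipFile.stream()
                    .filter(entry -> !entry.isDirectory())
                    .map(ZipEntry::getName)
                    .filter(name -> !isFileToSkip(name))
                    .collect(Collectors.toList());
        }
    }
}
```

Collecting the names before the `ZipFile` is closed keeps the returned list usable after the try-with-resources block ends; the entry streams themselves would no longer be readable at that point.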

Suggestions on how to test this:

see issue

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

updated from comment:
Unzip during upload now supports more variations of the zip format, including the zip files generated by ownCloud.

Additional documentation:

@pdurbin pdurbin added the Size: 10 A percentage of a sprint. 7 hours. label Oct 1, 2024

@qqmyers qqmyers left a comment

Looks good to me. I'd suggest adding a one-line release note, e.g. Unzip during upload now supports more variations of the zip format, including the zip files generated by ownCloud.

@qqmyers qqmyers added Size: 3 A percentage of a sprint. 2.1 hours. and removed Size: 10 A percentage of a sprint. 7 hours. labels Oct 1, 2024
@cmbz cmbz added the FY25 Sprint 7 FY25 Sprint 7 (2024-09-25 - 2024-10-09) label Oct 2, 2024
PaulBoon added a commit to DANS-KNAW/dataverse that referenced this pull request Oct 8, 2024
@cmbz cmbz added the FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) label Oct 9, 2024
@pdurbin pdurbin added the Type: Bug a defect label Oct 9, 2024
@stevenwinship

Test is failing:

execute_rezips_sets_of_shape_files_from_uploaded_zip – edu.harvard.iq.dataverse.engine.command.impl.CreateNewDataFilesTest

*/
var nrOfZipFiles = 20;
var avgNrOfFilesPerZip = 300;
var avgFileLength = 5000;

@jo-pol jo-pol Oct 21, 2024

Results of this relatively small stress test, executed with the old and the new implementation:

ZipFile       :: Total time: 150,909ms; nr of zips 20 total nr of files 6,404; total file size 32,030,650
ZipInputStream:: Total time: 148,432ms; nr of zips 20 total nr of files 6,211; total file size 31,570,383

# Conflicts:
#	src/main/java/edu/harvard/iq/dataverse/util/ShapefileHandler.java
@coveralls
coveralls commented Oct 21, 2024

Coverage Status

coverage: 21.191% (+0.3%) from 20.87%
when pulling aeb6d2a on DANS-KNAW-jp:10898-own-cloud-zips
into f970ab3 on IQSS:develop.

IntelliJ shows directory labels like
ewDataFilesTest/tmp/temp/shp_2024-10-22-01-57-21-833/dataDir/extra
possibly different environments have different values
@cmbz cmbz added the FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) label Oct 23, 2024
@ofahimIQSS ofahimIQSS assigned ofahimIQSS and unassigned ofahimIQSS Oct 25, 2024
@ofahimIQSS

Fix looks good.
Testing of 10899.docx

@ofahimIQSS ofahimIQSS merged commit d09e509 into IQSS:develop Oct 25, 2024
10 of 11 checks passed
@pdurbin pdurbin added this to the 6.5 milestone Oct 25, 2024
@jo-pol jo-pol deleted the 10898-own-cloud-zips branch February 3, 2025 14:39
Development

Successfully merging this pull request may close these issues.

zip files created with an own cloud service are ingested as is
7 participants