retry upload if it failed? #9696

Open
alejandratenorio opened this issue Jul 5, 2023 · 4 comments
Labels
Feature: File Upload & Handling · Type: Feature (a feature request) · User Role: Depositor (Creates datasets, uploads data, etc.)


@alejandratenorio
Contributor

Dear Dataverse Support,

What steps does it take to reproduce the issue?
You start uploading a large file, then your network goes down for a short time.

  • When does this issue occur?
    When your network goes down

  • Which page(s) does it occur on?
    Upload with HTTP via your browser

  • What happens?
    Your upload fails and you need to start over.

  • To whom does it occur (all users, curators, superusers)?
    All users

  • What did you expect to happen?
    Is it possible to set up an upload retry like other file transfer tools, such as SFTP or FileZilla, provide?

  • Which version of Dataverse are you using?
    5.10

  • Any related open or closed issues to this bug report?

@pdurbin
Member

pdurbin commented Jul 12, 2023

@alejandratenorio hi! We don't have a great solution for restarting file upload. We did add support for rsync but we're probably going to remove it or at least deprecate it:

Do you happen to store your files on S3? I'm asking because there's a feature we call S3 direct upload where the files travel from the user's computer directly to S3 instead of passing through Dataverse.

@alejandratenorio
Contributor Author

Hi @pdurbin,

Thanks for your response. Yes, we store on S3 and have enabled S3 direct upload. We upload many files simultaneously per dataset and usually don't have problems. But when our network goes down, we have to restart the upload manually.

We will be upgrading our Dataverse soon. We have an increasing need to upload more and more files per dataset and would use rsync to upload them, but if it were removed, what other tool could we use to facilitate file uploads?

Thanks,

@pdurbin
Member

pdurbin commented Jul 12, 2023

Hmm. Another option might be Globus.

I checked with the team and @qqmyers had this to say (thanks, Jim):

"Globus does do retries, not sure when it does partial retries (not resending bytes that made it)."

Here's a handy link to the docs: https://guides.dataverse.org/en/5.13/developers/big-data-support.html#globus-file-transfer
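To make the distinction in Jim's quote concrete: a "partial retry" resends only the bytes that did not make it, instead of restarting the whole file. Below is a minimal, generic sketch of that idea. It is not how Globus or Dataverse actually implement transfers. It assumes the file is split into fixed-size chunks and that the receiver reports which chunk indices it has acknowledged. `chunk_size` and the `acknowledged` set are hypothetical illustrations.

```python
import math
from typing import List, Set, Tuple


def chunk_ranges(file_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a file of file_size bytes into (start, end) byte ranges, end exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = math.ceil(file_size / chunk_size)
    return [(i * chunk_size, min((i + 1) * chunk_size, file_size)) for i in range(count)]


def chunks_to_resend(file_size: int, chunk_size: int, acknowledged: Set[int]) -> List[Tuple[int, int]]:
    """Return only the byte ranges whose chunk index the receiver has NOT acknowledged.

    Hypothetical model: after a network drop, a resuming client would send just
    these ranges instead of starting the file over from byte 0.
    """
    return [r for i, r in enumerate(chunk_ranges(file_size, chunk_size)) if i not in acknowledged]
```

For example, with a 10-byte file and 4-byte chunks, the ranges are (0, 4), (4, 8), and (8, 10). If chunks 0 and 1 were acknowledged before the network went down, only (8, 10) needs to be resent.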

Another workaround might be to keep the files as zips. But there are tradeoffs. 😬

@ErykKul was recently talking about a dataset with thousands of files in #9558. Maybe he has some thoughts.

You could also ask at https://groups.google.com/g/dataverse-community of course! 😄

@qqmyers
Member

qqmyers commented Jul 12, 2023

FWIW: If the failures are ones where some whole files have been uploaded (rather than partial files), a tool like the DVUploader might help: it can be run repeatedly to upload n files at a time (versus trying to upload all files in a long list at once). If that works, it may not be too hard to add a similar limit to the UI direct upload and the dvwebloader plugin. (Both currently try to register all files with Dataverse at once, since that is most efficient, but they could be changed to push every n files. This could raise the issue of how to explain partial successes in the UI; the dvwebloader might be better there, since it can already detect and show which files on your disk already exist in the dataset.) In any case, those changes would require some programming work, but the DVUploader could be scripted now, with the current Dataverse release.
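The batching approach described above can be sketched as a small driver loop. This is a minimal sketch under stated assumptions, not DVUploader's actual interface. `upload_batch` is a hypothetical callable standing in for one upload run (for example, a scripted DVUploader invocation on a list of files). The sketch also assumes network failures surface as `OSError` (which includes `ConnectionError`), and `already_uploaded` is a hypothetical record of files known to be in the dataset.

```python
import time
from typing import Callable, Iterator, List, Sequence, Set


def batches(items: Sequence[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most n items."""
    if n <= 0:
        raise ValueError("n must be positive")
    for start in range(0, len(items), n):
        yield list(items[start:start + n])


def upload_in_batches(
    files: Sequence[str],
    n: int,
    upload_batch: Callable[[List[str]], None],  # hypothetical: one upload run per batch
    already_uploaded: Set[str],                 # hypothetical: files already in the dataset
    max_attempts: int = 3,
    backoff_seconds: float = 2.0,
) -> List[str]:
    """Upload files n at a time, retrying a failed batch with exponential backoff.

    Files already recorded as uploaded are skipped, so rerunning after an outage
    only sends what is missing. Returns the files whose batch never succeeded.
    """
    pending = [f for f in files if f not in already_uploaded]
    failed: List[str] = []
    for batch in batches(pending, n):
        for attempt in range(1, max_attempts + 1):
            try:
                upload_batch(batch)
                already_uploaded.update(batch)  # remember successes for later reruns
                break
            except OSError:  # assumption: network errors raise OSError/ConnectionError
                if attempt == max_attempts:
                    failed.extend(batch)
                else:
                    time.sleep(backoff_seconds * 2 ** (attempt - 1))
    return failed
```

The design choice mirrors the trade-off mentioned above: smaller batches mean a dropped connection costs at most one batch of work, at the price of more round trips than registering all files at once. Because successes are recorded, a rerun after an outage skips them, somewhat like the dvwebloader's detection of files that already exist in the dataset.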

@pdurbin added the Type: Feature (a feature request), Feature: File Upload & Handling, and User Role: Depositor (Creates datasets, uploads data, etc.) labels on Oct 12, 2023