Alternative venues to store JAR data outside Ipea server #226

Closed
rafapereirabr opened this issue Feb 6, 2022 · 9 comments

Comments

@rafapereirabr
Member

Context:

R5's JAR file is approximately 45 MB, so it cannot be included inside the r5r package due to CRAN's policies, which limit package size to 5 MB. The alternative we've been using so far is to store all of R5's JAR files on our data server at Ipea.

Problem:

The advantages of using this server are that it has no storage limit, it's free, and it's really simple to use and update with new versions. The downside is that Ipea's server is not very stable. It went offline for maintenance or other problems a few times in the past year, which is more often than we would like.

Alternatives:

We have two alternatives.

  1. Stick with our current solution at Ipea, and discuss with our IT team what can be done to make the server more stable.
  2. Find an alternative data storage that hosts a large amount of data for free, and that's easy to use / update.

There are a number of data storage websites/services. I list some of them below, even though I haven't looked into their terms yet.

  • Amazon S3
  • Bitbucket
  • Dataverse
  • Dropbox
  • figshare
  • Google Drive
  • OSF:
    • Each project can store up to 50 GB for free. If necessary, we could create two or more projects.
  • OneDrive
  • ownCloud
  • Quilt
@mvpsaraiva
Collaborator

My suggestion on how to fix this issue involves Conveyal, so I'm bringing @abyrd and @ansoncfit into the conversation.

I believe the best way to proceed is to download an 'official' R5 JAR built by Conveyal, from an official source. This would also help further decouple r5r and R5, and reassure our users that they are using a 'pure' version of R5 that has not been modified by anyone. Even though we agreed a long time ago that we would not make changes to R5's code just to accommodate r5r's requirements, the process is still subject to human error (my human error, specifically). For example, if I built R5 from the wrong commit by mistake, we would probably take a long time to notice that this was the cause of some bug.

So, my question for @abyrd and @ansoncfit: do you think this is a good way to proceed? If yes, should we download R5 from the packages released here?

@ansoncfit

Hi @mvpsaraiva and @rafapereirabr, I'm just getting caught up here. It looks like the solution in 2e8b9ac is downloading from releases that Ipea tags in its own repo.

If you would prefer to use the Conveyal-built packages, you could try e.g. https://github.com/conveyal/r5/packages/433194?version=v6.6

BTW, our recently released v6.6 includes wider route_type support and a number of other improvements.

@mvpsaraiva
Collaborator

Hi @ansoncfit. The solution we implemented today was the quickest and cleanest we could find, considering the "server stability crisis" we went through this week. We tried using the Conveyal-built packages that you linked, but we could not find a way to get a persistent link to the JAR file so that it could be downloaded automatically from R. We'll look again in the future to try to make this work.

Thanks for letting us know about the new R5 release. We're working on a new version of r5r to be released soon, and we'll definitely upgrade R5 as well.

@ansoncfit

Oh wow, the stable GitHub Package Registry links require authentication. There's extensive commentary on this, and a suggested workaround at https://github.community/t/download-from-github-package-registry-without-authentication/14407/111

Indeed, https://maven.pkg.github.com/conveyal/r5/com/conveyal/r5/v6.6/r5-v6.6-all.jar works but requires a token with an appropriate scope.
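For illustration only, a minimal Python sketch of what such an authenticated download could look like. It has not been run against the live registry. It assumes the registry accepts a personal access token sent as a Bearer credential and that the token carries the `read:packages` scope. The function names and the `GITHUB_TOKEN` environment variable are hypothetical.

```python
import os
import shutil
import urllib.request

# URL quoted in the comment above.
JAR_URL = "https://maven.pkg.github.com/conveyal/r5/com/conveyal/r5/v6.6/r5-v6.6-all.jar"


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer credential.

    Assumption: the registry accepts a personal access token sent this way.
    """
    if not token:
        raise ValueError("a GitHub token is required for GitHub Packages downloads")
    request = urllib.request.Request(url)
    # An unredirected header is not forwarded if the server redirects the
    # download to another host, so the token stays with the original request.
    request.add_unredirected_header("Authorization", f"Bearer {token}")
    return request


def download_jar(url: str, token: str, dest: str) -> str:
    """Stream the JAR to `dest` and return the destination path."""
    request = build_authenticated_request(url, token)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # GITHUB_TOKEN is a hypothetical environment variable holding the token.
    download_jar(JAR_URL, os.environ.get("GITHUB_TOKEN", ""), "r5-v6.6-all.jar")
```

Needing a token at all is what makes this route awkward for an R package that has to fetch the JAR unattended on a user's machine.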

@abyrd

abyrd commented Feb 15, 2022

Hi everyone, also just catching up on this. Yes, it's surprisingly hard to access packages on GHPR - you even have to jump through hoops to get one Maven project to use another project published to GHPR by the same organization as a dependency. We used to publish each build of R5 to AWS S3 and pass JARs between deploy steps using S3. I don't think we'd want to re-enable this for every build but it should be straightforward to do it only when building tags (releases). This S3 bucket was never regarded as a stable external service, so we'd need to make a decision to support that. I'll put it on the agenda for a meeting this week and we'll get back to you.

@rafapereirabr
Member Author

Thank you @ansoncfit and @abyrd for your thoughts on this. The temporary solution we recently used was to share the R5 JAR file in a GitHub release. In our case, we did this on a fork of the R5 repo, but perhaps this is something you might consider doing on the original R5 repo?

@abyrd

abyrd commented Feb 17, 2022

Hi @rafapereirabr, I just attached the JARs and associated MD5s to the release at https://github.com/conveyal/r5/releases/tag/v6.6. We plan to do this for future releases, either automatically in our builds or manually. It's a bit strange (and inefficient) to download the JARs from GH packages, then re-upload them via a browser to attach them to the release, but it gets the job done.
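As a rough sketch of how a client might use the attached MD5s, the Python below downloads a release asset and checks it against an expected checksum. The asset name `r5-v6.6-all.jar` is an assumption borrowed from the Maven URL earlier in the thread, and the helper names are hypothetical. The URL layout is GitHub's standard `releases/download/<tag>/<asset>` pattern.

```python
import hashlib
import urllib.request
from pathlib import Path

RELEASE_BASE = "https://github.com/conveyal/r5/releases/download"


def release_asset_url(tag: str, asset_name: str) -> str:
    """Build the download URL for an asset attached to a tagged release."""
    return f"{RELEASE_BASE}/{tag}/{asset_name}"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a ~45 MB JAR is never fully held in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected: str) -> bool:
    """Compare the file's MD5 with the expected value.

    Checksum files often read "<hash>  <filename>", so only the first
    whitespace-separated token is used.
    """
    tokens = expected.split()
    return bool(tokens) and md5_of_file(path) == tokens[0].lower()


def download_and_verify(tag: str, asset_name: str, expected_md5: str, dest: str) -> Path:
    """Download the asset, then delete it and raise if the checksum differs."""
    urllib.request.urlretrieve(release_asset_url(tag, asset_name), dest)
    if not matches_md5(dest, expected_md5):
        Path(dest).unlink()
        raise ValueError(f"MD5 mismatch for {asset_name}")
    return Path(dest)
```

An MD5 check of this kind catches truncated or corrupted downloads. It is not a defence against deliberate tampering.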

@rafapereirabr
Member Author

This is great, @abyrd. Thank you! This way we can point r5r to download the R5 JAR directly from your releases.

@rafapereirabr
Member Author

This has been implemented in the dev version and will be on our next CRAN release.
