
Use tags instead of branches for different versions #2

Open
cthoyt opened this issue Oct 11, 2021 · 3 comments

@cthoyt

cthoyt commented Oct 11, 2021

Effectively, tags are kind of like branches, but GitHub has much deeper support for tags/releases. Additionally, if you used tags, you could hook this up to Zenodo to automatically provide an archived backup of each version.

@dhimmel
Member

dhimmel commented Oct 11, 2021

Elaborate on your envisioned design. Are you suggesting having a single output branch where tags are attached to the latest commit building a specific release? Or continue with the current branches but just add tags to their heads and then configure Zenodo?

Things to consider:

  1. When we add new features, we may want to rebuild existing releases. Therefore, we might need to force-reattach tags. Would this break Zenodo?
  2. It's nice to be able to see a diff between multiple outputs for the same Ensembl release.
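Both concerns map onto plain git operations: moving an existing tag takes `git tag --force` followed by a forced push of that tag, and two outputs for the same Ensembl release can be compared with `git diff` between two refs. The following is a minimal Python sketch that only builds the argument lists (it does not run git itself). The remote name `origin`, the `output/` path, the tag names, and the helper function names are all assumptions for illustration.

```python
from typing import List


def reattach_tag_commands(tag: str, commit: str, remote: str = "origin") -> List[List[str]]:
    """Return argv lists that move an existing tag to `commit` and push the move.

    Hypothetical helper: assumes the remote is called "origin" by default.
    """
    return [
        # `git tag --force` replaces an existing local tag with the same name.
        ["git", "tag", "--force", tag, commit],
        # Pushing a moved tag needs --force because the remote ref is rewritten.
        ["git", "push", "--force", remote, f"refs/tags/{tag}"],
    ]


def diff_outputs_command(old_ref: str, new_ref: str, path: str = "output/") -> List[str]:
    """Return argv summarizing changes under `path` between two tags or commits.

    Hypothetical helper: assumes built outputs live in an "output/" directory.
    """
    return ["git", "diff", "--stat", old_ref, new_ref, "--", path]
```

In a CI job, each list could be executed with `subprocess.run(argv, check=True)`. Whether Zenodo tolerates a tag that has been moved is a separate question this sketch does not answer.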

Another question is where we should store the data. Currently it's tracked in git. My preference would be to track it with Git LFS, but GitHub's billing makes that prohibitive. I want to avoid using something like GCS, since that might be a barrier for some users, costs money, and makes things less self-contained. At some point, we may exceed the GitHub repo size recommendation with the current design:

> We recommend repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended.

Releases can have attached files, and this looks like a hack for free storage:

> Each file included in a release must be under 2 GB. There is no limit on the total size of a release, nor bandwidth usage.
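The quoted limits can be checked mechanically before deciding between committing data and attaching it to a release. This is a minimal Python sketch under stated assumptions, not existing project code: it reads "GB" as 2**30 bytes, and it uses working-tree file sizes (skipping `.git`) as a rough proxy for repository size, so git history is not counted. All function names are hypothetical.

```python
import os
from typing import Dict, List

GB = 1024 ** 3  # assumption: "GB" in GitHub's guidance read as 2**30 bytes
REPO_IDEAL = 1 * GB  # "ideally less than 1 GB"
REPO_STRONG = 5 * GB  # "less than 5 GB is strongly recommended"
RELEASE_FILE_LIMIT = 2 * GB  # "Each file included in a release must be under 2 GB"


def file_sizes(root: str) -> Dict[str, int]:
    """Map each file under `root` to its size in bytes, skipping .git."""
    sizes: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[path] = os.path.getsize(path)
    return sizes


def repo_size_status(total_bytes: int) -> str:
    """Classify a total size against the 1 GB / 5 GB recommendations."""
    if total_bytes < REPO_IDEAL:
        return "ideal"
    if total_bytes < REPO_STRONG:
        return "acceptable"
    return "too large"


def oversized_release_assets(sizes: Dict[str, int], limit: int = RELEASE_FILE_LIMIT) -> List[str]:
    """Return paths that are not under the per-file release limit."""
    return sorted(path for path, size in sizes.items() if size >= limit)
```

For example, `repo_size_status(sum(file_sizes("output").values()))` would give a quick read on a hypothetical `output/` directory, and `oversized_release_assets(file_sizes("output"))` lists the files that would need splitting before they could be attached to a release.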

But there is something nice about having the data as part of the repo... less centralization risk.

Do you have any repos where you use CI to deploy tags / releases that would be good references?

@cthoyt
Author

cthoyt commented Oct 11, 2021

> Elaborate on your envisioned design. Are you suggesting having a single output branch where tags are attached to the latest commit building a specific release? Or continue with the current branches but just add tags to their heads and then configure Zenodo?
>
> Things to consider:
>
> 1. when we add new features, we may want to rebuild existing releases. Therefore, we might need to force reattach tags. Would this break Zenodo?
> 2. it's nice to be able to see a diff between multiple outputs for the same Ensembl release

When I've done this, I keep a single main branch and update it only on data changes. This would be a pretty big issue for 1). If you want to be able to rebuild existing releases, I guess you have to maintain different code to go with each release, since the schema might change. Regarding 2), I think you can still diff between different tags, so this would work in both scenarios.

> Another question is where should we store the data? Currently it's tracked in git. My preference would be to track with Git LFS, but GitHub's billing makes that prohibitive. I want to avoid using something like GCS since that might be a barrier for some users, costs money, and makes things less self contained. At some point, we may exceed the GitHub repo size recommendation with the current design:
>
> > We recommend repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended.
>
> Releases can have attached files, and this looks like a hack for free storage:
>
> > Each file included in a release must be under 2 GB. There is no limit on the total size of a release, nor bandwidth usage.
>
> But there is something nice about having the data as part of the repo... less centralization risk.

Agreed, but as you mentioned, there are some practicality issues with storing bigger files. I've been pretty happy with using Zenodo. It's a bit of a pain to automate uploading new files and updating old records. I wrote a wrapper around their API (https://github.com/cthoyt/zenodo-client/) to take care of some of that, but it relies on being able to persist some configuration, which would be sort of antithetical to my other idea of automating everything 😅

> Do you have any repos where you use CI to deploy tags / releases that would be good references?

Yes, see https://github.com/biopragmatics/bioregistry/blob/main/.github/workflows/update.yml. I hacked into the tox environment called bumpversion-release to automatically make a tag. I guess this could also be done directly with a GitHub Action.

@dhimmel
Member

dhimmel commented Oct 11, 2021

> When I've done this, I keep a single main branch and update it only on data changes.

I'd rather have just source code on main and a distinct branch with output, since it's nice to keep an unbloated history of just the code. I do see the benefit of having everything on the main branch, so that Zenodo can ingest the entire codebase and dataset together.

> If you want to be able to rebuild existing releases, I guess you have to maintain different code to go with each considering the schema might change.

I envision that the codebase will only support a single Ensembl DB schema version, such that you cannot rebuild old releases that had a different upstream schema. We would accomplish this by adding a min_release check once we encounter a change that breaks the old code.
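A `min_release` check of the kind described could look roughly like the sketch below. It is illustrative only: `MIN_RELEASE = 104`, `UnsupportedReleaseError`, and `check_release` are hypothetical names and values, not existing code in this project, and the check assumes releases are identified by plain integers.

```python
# Hypothetical value: the oldest Ensembl release whose schema the current code supports.
# It would be bumped whenever an upstream schema change breaks older releases.
MIN_RELEASE = 104


class UnsupportedReleaseError(ValueError):
    """Raised when asked to build an Ensembl release older than the minimum."""


def check_release(release: int, min_release: int = MIN_RELEASE) -> int:
    """Return `release` unchanged if it can be built, else raise UnsupportedReleaseError."""
    if release < min_release:
        raise UnsupportedReleaseError(
            f"Ensembl release {release} predates the supported schema "
            f"(minimum is {min_release}); build it with an older version of this code."
        )
    return release
```

Failing loudly, rather than silently producing output from a mismatched schema, keeps every tag built by a given commit consistent with that commit's code.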

We could also add patch numbers to our tags, like 104-rs1. Versions with patches would go to Zenodo. The patchless tag like 104 could get reattached and wouldn't go to Zenodo.
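The proposed scheme, with patchless tags like 104 that can be reattached and patch tags like 104-rs1 that go to Zenodo, can be made concrete with a small parser. This is a hypothetical Python sketch: the regular expression assumes patch numbers are written as `-rs` followed by digits and start at 1, and `Tag`, `parse_tag`, and `next_patch_tag` are illustrative names only.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed tag format: "<release>" or "<release>-rs<patch>", e.g. "104" or "104-rs1".
TAG_PATTERN = re.compile(r"^(?P<release>\d+)(?:-rs(?P<patch>\d+))?$")


class Tag(NamedTuple):
    release: int
    patch: Optional[int]  # None for a patchless tag, which may be reattached

    @property
    def archive_to_zenodo(self) -> bool:
        # Only patch tags (never moved) would be sent to Zenodo.
        return self.patch is not None


def parse_tag(name: str) -> Tag:
    """Parse a tag name, raising ValueError if it does not follow the scheme."""
    match = TAG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized tag: {name!r}")
    patch = match.group("patch")
    return Tag(int(match.group("release")), int(patch) if patch is not None else None)


def next_patch_tag(release: int, existing: Iterable[str]) -> str:
    """Return the next unused patch tag for `release`, ignoring non-matching tags."""
    patches = [
        tag.patch
        for tag in (parse_tag(name) for name in existing if TAG_PATTERN.match(name))
        if tag.release == release and tag.patch is not None
    ]
    return f"{release}-rs{max(patches, default=0) + 1}"
```

For example, with existing tags `104`, `104-rs1`, and `104-rs2`, `next_patch_tag(104, ...)` yields `104-rs3`, while the patchless `104` tag would be force-moved to the same commit.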

I'm leaning towards branches + patchless tags + patch tags, although the main benefit of this appears to be the ability to upload to Zenodo.
