Database update process #74

Open
multimeric opened this issue Feb 16, 2023 · 3 comments

@multimeric
Collaborator

multimeric commented Feb 16, 2023

Count Data Update

  1. The existing data in Nectar is deleted using swift delete harmonised-human-atlas
  2. The new data is uploaded using:
    swift upload harmonised-human-atlas /vast/projects/cellxgene_curated/splitted_DB2_data_0.2 --object-name original --segment-size 5000000000
    swift upload harmonised-human-atlas /vast/projects/cellxgene_curated/splitted_DB2_data_scaled_0.2 --object-name cpm --segment-size 5000000000
  3. The REMOTE_URL is updated in the R package
  4. The local cache is made readable by all users:
    chmod --recursive a+rX /vast/projects/cellxgene_curated/splitted_DB2_data_scaled_0.2 /vast/projects/cellxgene_curated/splitted_DB2_data_0.2
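
The four steps above could be collected into one script, roughly as sketched below. This is only a sketch, not the project's actual tooling: the run helper, the DRY_RUN switch and the update_count_data function are hypothetical names introduced here, and the script prints the commands by default rather than executing them. Step 3 (updating REMOTE_URL in the R package) stays a manual step.

```shell
#!/bin/sh
# Sketch of the count data update. Dry-run by default: commands are printed,
# not executed. Set DRY_RUN=0 to actually run them.
set -eu

CONTAINER="harmonised-human-atlas"
ORIGINAL_DIR="/vast/projects/cellxgene_curated/splitted_DB2_data_0.2"
CPM_DIR="/vast/projects/cellxgene_curated/splitted_DB2_data_scaled_0.2"
SEGMENT_SIZE=5000000000   # segment size in bytes, as used in the steps above

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_count_data() {
    # Step 1: delete the old container and its objects.
    run swift delete "$CONTAINER"
    # Step 2: upload the original and the CPM-scaled counts.
    run swift upload "$CONTAINER" "$ORIGINAL_DIR" \
        --object-name original --segment-size "$SEGMENT_SIZE"
    run swift upload "$CONTAINER" "$CPM_DIR" \
        --object-name cpm --segment-size "$SEGMENT_SIZE"
    # Step 3 (updating REMOTE_URL in the R package) is done by hand.
    # Step 4: make the local cache readable by everyone.
    run chmod --recursive a+rX "$CPM_DIR" "$ORIGINAL_DIR"
}

update_count_data
```

Running the script as-is only lists the four commands, which makes it easy to review the paths before deleting anything.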

Metadata File Update

  1. The old metadata is deleted using swift delete metadata
  2. The new metadata is uploaded using swift upload metadata /vast/projects/RCP/human_cell_atlas/metadata.0.2.2.parquet --object-name metadata.0.2.2.parquet
  3. The default remote_url for the metadata is updated to this new path
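
The metadata steps could be sketched the same way. Again this is an assumption-laden sketch: run, DRY_RUN and update_metadata are hypothetical names, and commands are only printed by default. Taking the version as an argument keeps the local file name and the object name in sync; step 3 (updating the default remote_url) remains manual.

```shell
#!/bin/sh
# Sketch of the metadata update. Dry-run by default: commands are printed,
# not executed. Set DRY_RUN=0 to actually run them.
set -eu

METADATA_DIR="/vast/projects/RCP/human_cell_atlas"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: update_metadata VERSION   (for example: update_metadata 0.2.2)
update_metadata() {
    version="$1"
    file="metadata.${version}.parquet"
    # Step 1: delete the old metadata container and its objects.
    run swift delete metadata
    # Step 2: upload the new parquet file under the same name.
    run swift upload metadata "${METADATA_DIR}/${file}" --object-name "$file"
    # Step 3 (updating the default remote_url) is done by hand.
}

update_metadata 0.2.2
```
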
@stemangiola
Owner

Fabulous, this is good for now.

We should think about future updates, especially regarding the data. What disruption happens in the update process?

  • downtime for users
  • old versions of the API stop working because they point to non-existent files
  • ... other

@multimeric
Collaborator Author

There will be downtime no matter what, because we can't keep old data. What I might do is show a message when a download fails, asking the user to check that they have the latest R package version, because a failure might mean that we have updated the data and they haven't updated the package.
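
The control flow of that message could look something like the sketch below. The real change would live in the R package; this is a shell illustration only, and download_with_hint is a hypothetical name. It wraps any download command and prints the hint to stderr when the command fails.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given download command and, if it fails,
# tell the user that their package version may be out of date.
download_with_hint() {
    if "$@"; then
        return 0
    fi
    echo "Download failed. The remote data may have been updated:" \
         "please check that you have the latest version of the R package." >&2
    return 1
}

# Example use (not executed here; url and dest are placeholders):
#   download_with_hint curl --fail --silent --output "$dest" "$url"
```

A failed download still returns a non-zero status, so callers can keep treating it as an error; the hint is only extra context for the user.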

@stemangiola
Owner

Yes, but in future we might also use AnnData for everything, doubling our capacity, and if the API is successful, ask for 5x the resources on top of that, which could hold 2 or 3 DB versions.
