Genomics data index examples

This repository contains examples for the genomics-data-index project, a system that indexes large amounts of genomics data and enables rapid querying of this data.

Indexing breaks genomes up into individual features (nucleotide mutations, kmers, or genes/MLST) and stores the index in a directory that can easily be shared with other people. Indexes can be generated directly from sequence data or loaded from existing intermediate files (e.g., VCF files).

# Index features in VCF files listed in vcf-files.txt
gdi load vcf vcf-files.txt
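
To make the idea of feature-based indexing more concrete, here is a minimal conceptual sketch in plain Python. It is not how genomics-data-index is implemented. The function names (`kmers`, `build_index`), the sample names, and the toy sequences are all hypothetical and chosen only for illustration. The sketch breaks each sample's sequence into kmers and records, for every kmer, which samples contain it.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(samples, k):
    """Map each kmer to the set of sample names that contain it."""
    index = defaultdict(set)
    for name, sequence in samples.items():
        for feature in kmers(sequence, k):
            index[feature].add(name)
    return index


# Toy data: hypothetical sample names and sequences.
samples = {
    "SampleA": "ACGTAC",
    "SampleB": "ACGTTC",
    "SampleC": "GGGTAC",
}
index = build_index(samples, k=3)

# Look up which samples share a given kmer.
print(sorted(index["ACG"]))
print(sorted(index["TAC"]))
```

Storing a feature-to-samples mapping like this is what makes a query of the form "which samples have feature X" a direct lookup rather than a scan over every genome.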

Querying provides both a Python API and a command-line interface for selecting sets of samples using this index or attached external data (e.g., phylogenetic trees or DataFrames of metadata).

# Select samples with a 26568 C > A mutation
r = s.hasa('MN996528.1:26568:C:A')
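
The feature string in the query above appears to pack four colon-separated fields: a sequence name, a position, the reference base, and the alternate base. This reading is inferred from the "26568 C > A" comment, not taken from the project's documentation. As a small sketch under that assumption, the identifier can be split into its parts as follows. The `Mutation` type and `parse_mutation` helper are hypothetical and not part of the genomics-data-index API.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    """Hypothetical container for the parts of a mutation identifier."""
    sequence: str
    position: int
    ref: str
    alt: str


def parse_mutation(identifier):
    """Split 'sequence:position:ref:alt' into its four fields.

    Splitting from the right keeps a sequence name intact even if
    the name itself happens to contain a colon.
    """
    sequence, position, ref, alt = identifier.rsplit(":", 3)
    return Mutation(sequence, int(position), ref, alt)


m = parse_mutation("MN996528.1:26568:C:A")
print(m.sequence, m.position, m.ref, m.alt)
```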

Tutorial

Tutorials and a demonstration of the genomics-data-index software are available below. Select the launch binder badge to open a tutorial in an interactive Jupyter environment in the cloud using Binder.

Tutorial 1: Querying (Salmonella)
- In case the GitHub link is not rendering, try here
Tutorial 2: Indexing assemblies (SARS-CoV-2)
- In case the GitHub link is not rendering, try here
Tutorial 3: Querying overview
- In case the GitHub link is not rendering, try here

Alternatively, you can run these tutorials on your local machine. To do so, first install the genomics-data-index software (see the Installation section for details) and then install Jupyter Lab. If you have already installed the genomics-data-index software with conda, you can install Jupyter Lab as follows:

conda activate gdi
conda install jupyterlab

To start Jupyter Lab, run the following:

jupyter lab

Please see the instructions for Jupyter Lab for details on using Jupyter.

apetkau/genomics-data-index-examples
