Documentation and example code for an index of SNVs, kmers, and MLST (genomics-data-index).

apetkau/genomics-data-index-examples

Genomics data index examples

This repository contains examples for the genomics-data-index project, a system that indexes large amounts of genomics data and enables rapid querying of that data.

Indexing breaks genomes up into individual features (nucleotide mutations, kmers, or genes/MLST) and stores the index in a directory that can easily be shared with other people. Indexes can be generated directly from sequence data or loaded from existing intermediate files (e.g., VCF files).

# Index features in VCF files listed in vcf-files.txt
gdi load vcf vcf-files.txt
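As a conceptual illustration of what "breaking genomes up into individual features" can look like, the toy sketch below compares pre-aligned sample sequences to a reference, records each single-nucleotide difference as a `sequence:position:ref:alt` string, and maps every feature to the set of samples carrying it. This is not how genomics-data-index is implemented; the function names `snv_features` and `build_index`, the reference name `ref`, and all sequences are hypothetical.

```python
from collections import defaultdict


def snv_features(ref_name, reference, sample_seq):
    """List SNVs as 'sequence:position:ref:alt' strings (1-based positions)."""
    if len(reference) != len(sample_seq):
        raise ValueError("toy example expects pre-aligned, equal-length sequences")
    return [
        f"{ref_name}:{pos}:{ref}:{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample_seq), start=1)
        if ref != alt
    ]


def build_index(ref_name, reference, samples):
    """Map each feature to the set of sample names that carry it."""
    index = defaultdict(set)
    for name, seq in samples.items():
        for feature in snv_features(ref_name, reference, seq):
            index[feature].add(name)
    return dict(index)


# Hypothetical toy data, not taken from the tutorials
reference = "ACGTACGT"
samples = {
    "SampleA": "ACGAACGT",
    "SampleB": "ACGAACGA",
    "SampleC": "ACGTACGT",
}

index = build_index("ref", reference, samples)
for feature, names in sorted(index.items()):
    print(feature, sorted(names))
```

Once features are stored this way, finding every sample with a given mutation is a single dictionary lookup rather than a scan over all genomes, which is the general idea behind querying an index quickly.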

Querying is available through both a Python API and a command-line interface. Either one selects sets of samples using this index or attached external data (e.g., phylogenetic trees or DataFrames of metadata).

# Select samples with a 26568 C > A mutation
# (s is assumed to be an existing samples query built from the index)
r = s.hasa('MN996528.1:26568:C:A')
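The feature string in this query appears to follow a `sequence:position:ref:alt` layout, judging from the comment that describes it as a 26568 C > A mutation. The sketch below parses such a string and narrows a set of samples with plain Python sets. It is a stand-in to show the idea of selecting samples by feature, not the genomics-data-index API; `Mutation`, `parse_mutation`, `select_with_feature`, the toy index, and the sample names are all hypothetical.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    sequence: str
    position: int
    ref: str
    alt: str


def parse_mutation(identifier: str) -> Mutation:
    """Split a 'sequence:position:ref:alt' string into typed fields."""
    # rsplit keeps any extra colons inside the sequence name intact
    sequence, position, ref, alt = identifier.rsplit(":", 3)
    return Mutation(sequence, int(position), ref, alt)


def select_with_feature(selected: set, index: dict, identifier: str) -> set:
    """Keep only the selected samples that carry the given feature."""
    parse_mutation(identifier)  # raises ValueError on malformed identifiers
    return selected & index.get(identifier, set())


# Hypothetical toy index: feature identifier -> names of samples that carry it
toy_index = {
    "MN996528.1:26568:C:A": {"SampleA", "SampleB"},
    "MN996528.1:100:G:T": {"SampleB", "SampleC"},
}
all_samples = {"SampleA", "SampleB", "SampleC"}

mutation = parse_mutation("MN996528.1:26568:C:A")
first = select_with_feature(all_samples, toy_index, "MN996528.1:26568:C:A")
both = select_with_feature(first, toy_index, "MN996528.1:100:G:T")
print(mutation.position, sorted(first), sorted(both))
```

Because each selection returns another set of samples, selections can be applied one after another to narrow the result step by step.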

Tutorial

Tutorials and a demonstration of the genomics-data-index software are listed below. Select the [launch | binder] badge next to a tutorial to open it in an interactive Jupyter environment in the cloud using Binder.

  1. Tutorial 1: Querying (Salmonella) - Binder
    • If the GitHub link does not render, try here
  2. Tutorial 2: Indexing assemblies (SARS-CoV-2) - Binder
    • If the GitHub link does not render, try here
  3. Tutorial 3: Querying overview - Binder
    • If the GitHub link does not render, try here

Alternatively, you can run these tutorials on your local machine. To do so, first install the genomics-data-index software (see the Installation section for details), then install Jupyter Lab. If you have already installed the genomics-data-index software with conda, you can install Jupyter Lab as follows:

conda activate gdi
conda install jupyterlab

To start Jupyter Lab, run the following:

jupyter lab

Please see the instructions for Jupyter Lab for details on using Jupyter.
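Before launching the tutorials locally, it can help to confirm that the active environment has both pieces installed. The snippet below is a minimal check using only the Python standard library. The import names `genomics_data_index` and `jupyterlab` are assumptions about how the two packages are exposed, and `missing_modules` is a hypothetical helper; adjust the names if your installation differs.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two packages; adjust if yours differ
required = ["genomics_data_index", "jupyterlab"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("Environment looks ready; start the tutorials with `jupyter lab`.")
```

Run it from inside the activated `gdi` environment so that the check reflects the environment Jupyter Lab will use.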
