Sphetcher

A software package for sampling massive singe-cell RNA secquencing datasets based on the spherical thresholding algorithm. It selects a small subset of cells referred to as sketch that evenly cover the transcriptomic space occupied by the original dataset. Such a sketch can accelerate downstream analyses and highlight rare cell types.

Installation

A compiler that supports C++11 is needed to build sphetcher. You can download and compile the latest code from github as follows:

git clone  https://github.com/canzarlab/Sphetcher
cd src
make

Running Sphetcher

To begin

First you need to provide a matrix where rows are samples (cells) and columns are features (genes, transcripts, principal components PCs).

Additionally you can provide the prior information (e.g, cell label, collection time point) in case you want to preserve certain number of samples from each category.

An example of inputs is provided in the directory /data.

Usage

Once you have compiled Sphetcher it can be run easily with one of the following two options:

sphetcher expression_matrix.csv sketch_size sketch_indicator_output.csv

or

sphetcher expression_matrix.csv sketch_size class_labels.csv l_min sketch_indicator_output.csv

For an example provided in /data

sphetcher zeisel_pca.csv 1000 sketch_indicator_output.csv
or 
sphetcher zeisel_pca.csv 1000 zeisel_pca_labels.csv 3 sketch_indicator_output.csv

Input/Output formats

Input:

expression_matrix.csv : expression matrix in comma-separated values (CSV) format: rows are cells, columns are features.
sketch_size : number of samples to obtain from the data set.
class_labels.csv : prior information stored in a column vector, each class is presented by an integer between 1 and K, where K is the number of classes.
l_min : minimum number of representatives we want to sample from each class

Output:

sketch_indicator_output.csv : an indicator vector of n samples where 1 indicates the sample is in the sketch, 0 otherwise (there are sketch_size 1's in the vector).

Sketches of large single cell datasets

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
data		data
img		img
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sphetcher

Installation

Running Sphetcher

To begin

Usage

Input/Output formats

Sketches of large single cell datasets

Adult mouse brain cells from Saunders et al. (2018) (665,858 cells)

Mouse organogenesis cell atlas (MOCA) from Cao et al. (2019) (2,026,641 cells)

Mouse nervous system from Zeisel et al. (2018) (465,281 cells)

About

Releases

Packages

Contributors 2

Languages

canzarlab/Sphetcher

Folders and files

Latest commit

History

Repository files navigation

Sphetcher

Installation

Running Sphetcher

To begin

Usage

Input/Output formats

Sketches of large single cell datasets

Adult mouse brain cells from Saunders et al. (2018) (665,858 cells)

Mouse organogenesis cell atlas (MOCA) from Cao et al. (2019) (2,026,641 cells)

Mouse nervous system from Zeisel et al. (2018) (465,281 cells)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages