Skip to content
cziegenhain edited this page Apr 22, 2019 · 12 revisions

zUMIs' parent output directory contains following files:

This is an intermediate file where all the cell barcodes with total number of reads are listed

<Project_name>.BCstats.txt
  • Log files generated by STAR and modified yaml file with parameters inferred from the input
<Project_name>.*.out
<Project_name>.postmap.yaml
  • BAM files generated by every module (Filtering, Mapping and Counting)
# filtered, unmapped BAM file tagged by cell barcode (BC) and UMI (UB)
will be found temporarily in zUMIs_output/.tmpMerge/

# filtered, STAR mapped BAM file with retained BC and UB tags
<Project_name>.filtered.tagged.Aligned.out.bam

# filtered, STAR mapped BAM file tagged by featureCounts with assignment to exonic regions
<Project_name>.filtered.tagged.Aligned.out.bam.ex.featureCounts.bam

# filtered, STAR mapped BAM file tagged by featureCounts with assignment to exonic regions
<Project_name>.filtered.tagged.Aligned.out.bam.in.featureCounts.bam

The final output is structured in two subdirectories:

zUMIs_output/expression
zUMIs_output/stats

The retained cell barcodes with total number of reads are listed in zUMIs_output/<Project_name>kept_barcodes.txt

  • "expression" contains .rds files
    • list of count matrices as sparseMatrix (*.dgecounts.rds)
  • "stats" contains plots and data files with descriptive statistics

Structure of the output dgecounts object in <project_name>.dgecounts.rds

zUMIs produces dge output in .rds format that can be read in R with the following command.

AllCounts <- readRDS("zUMIs_output/expression/example.dgecounts.rds")
names(AllCounts)
[1] "umicount"  "readcount"

names(AllCounts$umicount)
[1] "exon"   "inex"   "intron"

names(AllCounts$umicount$exon)
[1] "all"          "downsampling"

AllCounts is a list of lists with all the count matrices as sparseMatrix. The parent list contains UMI and read count quantification. In each of these counting types, you will find the three feature types (introns,exons and intron+exon). Each of those contain a sparseMatrix generating using all reads observed and a list for the downsampling sizes requested. Each of the downsampling list elements is also a sparseMatrix.

The sparseMatrix can be converted into a conventional count table using "as.matrix" function in R and saved as a text file using the code below.

#Feature exons umicounts table
dge <- as.matrix(AllCounts$umicount$exon$all)
write.table(dge,"exons.umicounts.txt",quote=F,sep="\t")

NOTE: It is highly recommended to retain sparseMatrix format for the downstream analysis, especially when you have >20,000 cells to save time and space.

Loading zUMIs RDS output in Python

To load zUMIs output as a pandas object, please find code below Thanks to @adamgayoso for this!

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
from rpy2.robjects.packages import STAP

def zumis_output2pandas(filename):
    mfunc = 'to_df <- function(dobj){return(as.data.frame(as.matrix(dobj)))}'
    rsparse2pandas = STAP(mfunc, "to_df")

    readRDS = robjects.r['readRDS']
    zumis_data = readRDS(filename)
    zd = dict(zip(zumis_data.names, list(zumis_data)))
    for key in ['umicount', 'readcount']:
        zd[key] = dict(zip(zd[key].names, list(zd[key])))
        zd[key]['exon'] = dict(zip(zd[key]['exon'].names, list(zd[key]['exon'])))
        zd[key]['exon']['all'] = rsparse2pandas.to_df(zd[key]['exon']['all']).transpose()
        zd[key]['exon']['downsampling'] = dict(zip(zd[key]['exon']['downsampling'].names, list(zd[key]['exon']['downsampling'])))
        for k in zd[key]['exon']['downsampling'].keys():
            zd[key]['exon']['downsampling'][k] = rsparse2pandas.to_df(zd[key]['exon']['downsampling'][k]).transpose()
    return zd
Clone this wiki locally