Dev - GEX QC and Aggregate, BAM, and Clean-Up #2

Merged
merged 18 commits into from
Nov 20, 2023
Commits
66c6872
Minor updates to summary file scripts changing python version and add…
chenv3 Oct 11, 2023
964166c
Add script to create CSV file to run GEX aggregate
chenv3 Oct 11, 2023
ea9dedb
Add filter flag to pipeline and handle it in Snakefile
chenv3 Oct 11, 2023
0e82724
Add aggregate and initial QC to GEX pipeline; add R script for initia…
chenv3 Oct 11, 2023
e5dda37
Add report for initial GEX sample QC
chenv3 Oct 12, 2023
383fbcd
Add cell filter summary for GEX samples
chenv3 Oct 12, 2023
c61ddab
Add cluster configuration for GEX seuratQC rule
chenv3 Oct 18, 2023
4d434d0
Extract tmpdir config parameter in Snakefile; Change how the GEX Seur…
chenv3 Oct 24, 2023
0993ee0
Add aggregate and create-bam flags to pipeline wrapper; add support f…
chenv3 Oct 25, 2023
debd415
Add cleanup rules for GEX pipeline
chenv3 Oct 25, 2023
3da4d7c
Add example of filter flag in help documentation
chenv3 Oct 25, 2023
241dad4
Add sample cleanup rules to CITE and VDJ pipelines; Change CITE pipel…
chenv3 Oct 26, 2023
4e8e51a
Edit generate summary file scripts to not check for intermediate cell…
chenv3 Oct 26, 2023
62441bb
Add cleanup functionality to ATAC pipeline
chenv3 Oct 27, 2023
8265e30
Add cleanup functionality and change BAM file generation to off by de…
chenv3 Oct 27, 2023
e387da2
Add intermediate folder cleanup to multiome pipeline
chenv3 Nov 16, 2023
bb46f15
seuratSampleQC.R - fix typo and add additional resolutions for cluste…
chenv3 Nov 16, 2023
b8ed5df
Update run documentation and organizing flags by version
chenv3 Nov 16, 2023
95 changes: 85 additions & 10 deletions cell-seek
@@ -245,9 +245,10 @@ def parsed_arguments(name, description):
[--dry-run] [--job-name JOB_NAME] [--mode {{slurm,local}}] \\
[--sif-cache SIF_CACHE] [--singularity-cache SINGULARITY_CACHE] \\
[--silent] [--threads THREADS] [--tmp-dir TMP_DIR] \\
- [--libraries LIBRARIES] [--features FEATURES] \\
- [--cmo-reference CMOREFERENCE] [--cmo-sample CMOSAMPLE] \\
- [--exclude-introns] \\
+ [--aggregate {{mapped,none}}] [--libraries LIBRARIES] \\
+ [--features FEATURES] [--cmo-reference CMOREFERENCE] \\
+ [--cmo-sample CMOSAMPLE] [--exclude-introns] [--filter FILTER] \\
+ [--create-bam] \\
--input INPUT [INPUT ...] \\
--output OUTPUT \\
--version {{gex, ...}} \\
@@ -288,10 +289,21 @@ def parsed_arguments(name, description):
options: hg38, mm10.
Example: --genome hg38
{3}{4}Analysis options:{5}
--aggregate {{mapped,none}}
Cell Ranger aggregate. This option defines the
normalization mode that should be used. Mapped is what
Cell Ranger would run by default, which subsamples reads
from higher depth samples until each library type has an
equal number of reads per cell that are confidently mapped.
None means to not normalize at all. If this flag is not
used then aggregate will not be run. To run Cell Ranger
aggregate, please select one of the following options:
mapped, none.
Example: --aggregate mapped
--libraries LIBRARIES
Libraries file. A CSV file containing information about
each library. This file is used in feature barcode (cite),
multi, and multiome analysis. It contains each sample's
name, flowcell, demultiplexed name, and library type.
Here is an example libraries.csv file:
Name,Flowcell,Sample,Type
@@ -359,7 +371,7 @@ def parsed_arguments(name, description):
not contain whitespace.
• sequence: Nucleotide barcode sequence associated
with this hashtag.
• feature_type: Type of the feature. This should always be
multiplexing capture.
• read: Specifies which RNA sequencing read contains
the Feature Barcode sequence. Must be R1 or R2, but
@@ -386,15 +398,50 @@ def parsed_arguments(name, description):
• sample_id: Unique sample ID for this hashtagged sample.
Must not contain whitespace, quote, or comma characters.
Each sample ID must be unique.
• cmo_ids: Unique CMO ID(s) that the sample is hashtagged
with. Must match either entries in cmo_reference.csv file
or 10x CMO IDs.
Example: --cmo-sample cmo_sample.csv
--exclude-introns
Exclude introns from the count alignment. This flag is
only applicable when dealing with gene expression related
data.
Example: --exclude-introns
--filter FILTER
Filter threshold file. A CSV file containing the
thresholds to apply to individual samples within the
project during the QC analysis. The file should contain a
header row with Sample as the column name for the sample IDs,
followed by one column per threshold, named for the metric
being filtered and whether it is the high or low threshold.
Each subsequent row lists the manual thresholds to apply to
one sample. If no file is provided then the default
thresholds will be used. If a cell is left blank for a sample
then that sample will not be filtered on that criterion.
This flag is currently only applicable when dealing with GEX
projects.
Here is an example filter.csv file:
Sample,nFeature_RNA_low,nFeature_RNA_high,percent.mito_high
sample1,500,6000,15
sample2,500,6000,5
sample4,500,6000,5
where:
• Sample: Unique sample ID that should match the sample name
used for Cell Ranger count.
• nFeature_RNA_low,nFeature_RNA_high,percent.mito_high: Example
entries that can be used for manual thresholding. The column
names need to be formatted as metadataname_high/low. Entries
that end with high will be treated as the upper threshold.
Entries that end with low will be treated as the lower
threshold. Valid metadata names include nCount_RNA,
nFeature_RNA, and percent.mito.
Example: --filter filter.csv
--create-bam
Create BAM files. By default the no-bam flag is used when running
Cell Ranger. Use this flag to ensure that a BAM file is created for
each sample during analysis. This flag is only applicable when
dealing with gene expression related data.
Example: --create-bam
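
The filter.csv layout described in the help text above can be read into per-sample thresholds along the following lines. This is a minimal Python sketch for illustration only, not the pipeline's implementation (the commit list suggests the QC itself runs in seuratSampleQC.R). The function name `parse_filter_csv` and the nested-dictionary result are assumptions introduced here.

```python
import csv
import io


def parse_filter_csv(handle):
    """Return {sample: {metric: {"low": x, "high": y}}} from a filter CSV.

    Hypothetical helper: column names follow the metadataname_low /
    metadataname_high convention described in the --filter help text.
    """
    thresholds = {}
    for row in csv.DictReader(handle):
        sample = row.pop("Sample")
        per_sample = {}
        for column, value in row.items():
            if value is None or value.strip() == "":
                # Blank cell: this sample is not filtered on this criterion.
                continue
            # Split on the last underscore so "nFeature_RNA_low" becomes
            # metric "nFeature_RNA" with bound "low".
            metric, _, bound = column.rpartition("_")
            if bound not in ("low", "high"):
                raise ValueError(f"Column {column!r} must end in _low or _high")
            per_sample.setdefault(metric, {})[bound] = float(value)
        thresholds[sample] = per_sample
    return thresholds


# Example input; sample2 leaves nFeature_RNA_high blank on purpose.
example = io.StringIO(
    "Sample,nFeature_RNA_low,nFeature_RNA_high,percent.mito_high\n"
    "sample1,500,6000,15\n"
    "sample2,500,,5\n"
)
parsed = parse_filter_csv(example)
print(parsed["sample2"])
```

Splitting on the last underscore keeps metric names that themselves contain underscores, such as nFeature_RNA, intact.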

{3}{4}Orchestration options:{5}
--mode {{slurm,local}}
@@ -537,7 +584,7 @@ def parsed_arguments(name, description):
'--version',
type = str.lower,
required = True,
- default = "slurm",
+ default = "gex",
choices = ['gex', 'cite', 'multi', 'vdj', 'atac', 'multiome'],
help = argparse.SUPPRESS
)
@@ -608,6 +655,34 @@ def parsed_arguments(name, description):
help = argparse.SUPPRESS
)

# How to run Cell Ranger aggregate
subparser_run.add_argument(
'--aggregate',
type = str.lower,
required = False,
default = "",
choices = ['none', 'mapped'],
help = argparse.SUPPRESS
)

# Thresholds to use for filtering in QC Analysis
subparser_run.add_argument(
'--filter',
# Check if the file exists and if it is readable
type = lambda file: permissions(parser, file, os.R_OK),
required = False,
help = argparse.SUPPRESS
)

# Create BAM file during run
subparser_run.add_argument(
'--create-bam',
action = 'store_true',
required = False,
default = False,
help = argparse.SUPPRESS
)
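
The behaviour of the three new options can be checked in isolation with a standalone parser. The sketch below is an assumption-laden approximation: `readable_file` is a hypothetical stand-in for the `permissions(parser, file, os.R_OK)` helper used in the diff (whose body is not shown here), and the parser is a plain `ArgumentParser` rather than the `run` subparser.

```python
import argparse
import os
import tempfile


def readable_file(path):
    """Hypothetical stand-in for the permissions() helper in the diff."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise argparse.ArgumentTypeError(f"Cannot read file: {path}")
    return os.path.abspath(path)


def build_parser():
    parser = argparse.ArgumentParser(prog="cell-seek-sketch")
    # type is applied before the choices check, so "MAPPED" is accepted.
    parser.add_argument("--aggregate", type=str.lower, default="",
                        choices=["none", "mapped"])
    parser.add_argument("--filter", type=readable_file, required=False)
    parser.add_argument("--create-bam", action="store_true", default=False)
    return parser


parser = build_parser()

# With no flags: aggregate is "", filter is None, create_bam is False.
defaults = parser.parse_args([])

# With all three flags, using a temporary CSV as the filter file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("Sample,nFeature_RNA_low\nsample1,500\n")
try:
    args = parser.parse_args(
        ["--aggregate", "MAPPED", "--filter", handle.name, "--create-bam"]
    )
finally:
    os.remove(handle.name)

print(defaults.aggregate == "", args.aggregate, args.create_bam)
```

An empty-string default lets downstream code treat "flag not given" as "do not run aggregate", which matches the help text for `--aggregate`.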

# Orchestration Options
# Execution Method, run locally
# on a compute node or submit to
5 changes: 5 additions & 0 deletions config/cluster.json
@@ -15,5 +15,10 @@
"threads": "16",
"mem": "150g",
"time": "2-00:00:00"
},
"seuratQC": {
"threads": "8",
"mem": "150g",
"time": "1-00:00:00"
}
}
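
As a rough illustration of how a per-rule entry like `seuratQC` might be consumed, the Python sketch below turns one entry into Slurm `sbatch` options. How the pipeline actually builds its submission command is not shown in this diff, so the function name `sbatch_options` and the mapping of `threads` to `--cpus-per-task` are assumptions.

```python
import json

# Only the entry added in this diff; the real file holds other rules too.
cluster_json = """
{
  "seuratQC": {
    "threads": "8",
    "mem": "150g",
    "time": "1-00:00:00"
  }
}
"""


def sbatch_options(cluster, rule):
    """Hypothetical helper: map one rule's resources to sbatch options."""
    entry = cluster[rule]  # raises KeyError if the rule has no entry
    return [
        "--cpus-per-task=" + str(int(entry["threads"])),
        "--mem=" + entry["mem"],
        "--time=" + entry["time"],
    ]


cluster = json.loads(cluster_json)
options = sbatch_options(cluster, "seuratQC")
print(" ".join(options))
```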