Generating Single VCF file for Each scRNA-Seq Sample #125

Open
hkarakurt8742 opened this issue May 30, 2024 · 3 comments
@hkarakurt8742

Hello,
I am relatively new to variant calling from scRNA-seq data. I have 17 datasets from 17 patients, and I want to call variants for each patient. I only need the list of variants in each sample.
Can I use the cellranger output BAM file "possorted_genome_bam.bam" as pseudo-bulk input, as suggested in the manual:

# 10x scRNA-seq sample in a pseudo-bulk manner
cellsnp-lite -s $BAM -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag UB --gzip

Thank you in advance

@hxj5
Collaborator

hxj5 commented May 31, 2024

Hi, thanks for the question.

Yes, you can call variants in a pseudo-bulk manner on the cellranger BAM file. However, it is recommended to subset the BAM file first, to filter out reads from invalid cells with poor sequencing quality. To quote the manual:

"To genotype 10x scRNA-seq data in a pseudo-bulk manner with cellsnp-lite mode 1b (or mode 2b), it is recommended to subset the BAM file first, by extracting the alignment records with valid cell barcodes only. Here the valid cell barcodes are typically the cell barcodes stored in the cellranger output folder filtered_gene_bc_matrices, which are the cells with high-quality sequencing data."
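
One way to do the subsetting described in the quote is sketched below. This is only an illustration, not the method prescribed by the cellsnp-lite manual: the helper name filter_sam_by_barcodes and the file names are made up here, and it assumes the barcode list has been decompressed to plain text (one barcode per line, non-empty) and that reads carry the barcode in the CB tag, as cellranger BAM files normally do.

```bash
# Hypothetical helper: keep SAM header lines plus alignment records whose
# CB tag value appears in a whitelist file (one barcode per line).
# Assumed usage, with illustrative file names:
#   samtools view -h possorted_genome_bam.bam \
#     | filter_sam_by_barcodes barcodes.tsv \
#     | samtools view -b -o filtered.bam -
filter_sam_by_barcodes() {
    local barcodes="$1"
    awk -F'\t' '
        NR == FNR { keep[$1] = 1; next }   # first input: load the whitelist
        /^@/      { print; next }          # SAM header lines pass through
        {
            # optional tags start at column 12 of an alignment record
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/ && (substr($i, 6) in keep)) { print; break }
            }
        }
    ' "$barcodes" -
}
```

Reads without a CB tag are dropped by this sketch. Recent samtools releases also provide a --tag-file option on samtools view that can do the same filtering directly; check the documentation of your installed version before relying on it.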

@hkarakurt8742
Author

Thank you for your reply. I will filter the barcodes.
I have another question: I want to use a reference FASTA (with a faidx index) with cellsnp-lite. Is the FASTA file enough by itself, or is a specific version required? I will use the same FASTA that I used as the CellRanger reference, but because of the "--refseq" option I wanted to be sure.

Thank you

@hxj5
Collaborator

hxj5 commented Jun 4, 2024

The same FASTA file that was used as the cellranger reference works for the --refseq option. In general, the genome build of the FASTA file should match that of the BAM file, e.g., both hg38 or both hg19.
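
As a rough sanity check of that build match, the contig names and lengths in the BAM header can be compared against the FASTA index (.fai). The sketch below is only a heuristic and not part of cellsnp-lite: the helper name check_contigs and the file names are hypothetical, and matching names and lengths do not strictly prove that two files share a build, although a mismatch is a strong hint that they do not.

```bash
# Hypothetical helper: read a SAM header on stdin and report every @SQ contig
# that is missing from the .fai index or has a different length there.
# Assumed usage, with illustrative file names:
#   samtools faidx genome.fa        # writes genome.fa.fai
#   samtools view -H possorted_genome_bam.bam | check_contigs genome.fa.fai
# Exit status is 0 when all BAM contigs match, 1 otherwise.
check_contigs() {
    local fai="$1"
    awk -F'\t' '
        NR == FNR { len[$1] = $2; next }   # .fai: column 1 = name, column 2 = length
        $1 == "@SQ" {
            name = ""; ln = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            if (!(name in len)) {
                print "MISSING\t" name; bad = 1
            } else if (len[name] + 0 != ln + 0) {
                print "LENGTH_MISMATCH\t" name "\t" ln "\t" len[name]; bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$fai" -
}
```

No output and a zero exit status mean that every contig in the BAM header was found in the index with the same length.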
