Large number of unassigned #110

Run2309 · 2024-11-14T09:18:57Z

Hi!
Thanks for the great tools.
I have single-cell RNA sequencing data (Becton, Dickinson and Company) for CD45+ cells from decidua (1 BAM file), placenta (1 BAM file), peripheral blood (1 BAM file) and cord blood (1 BAM file) from one pregnant donor.

Since cells in the decidua and placenta come from either the mother or the fetus, I want to identify the origin of each cell.

My strategy is to treat the decidua and placenta data as pooled scRNA-seq data, and the peripheral blood (mother) and cord blood (fetus) data as non-pooled scRNA-seq data. I call variants for the non-pooled BAM files using cellsnp-lite and then use them as the donor VCF input for vireo in order to demultiplex the pooled data.

However, the result shows a large number of unassigned cells. The “n_vars” of the “unassigned” cells ranges from 1 to 1259 (decidua) and from 5 to 1080 (placenta). The “prob_max” of some “unassigned” cells reaches 0.9. (The result figures are attached to this post.)
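For reference, the numbers above can be reproduced from donor_ids.tsv with a small helper. This is only a minimal sketch: `summarize_donor_ids` is a hypothetical function name, and it assumes the file is tab-separated with a header row containing the columns `donor_id`, `prob_max` and `n_vars` (`donor_id` is an assumption; the other two names are quoted above).

```shell
# Summarize a vireo donor_ids.tsv: number of cells per donor_id, plus the
# n_vars range and the highest prob_max among cells labelled "unassigned".
# Columns are looked up by header name, so their order does not matter.
# Assumed columns (not confirmed in this thread): donor_id, prob_max, n_vars.
summarize_donor_ids() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            id = $(col["donor_id"])
            count[id]++
            if (id == "unassigned") {
                nv = $(col["n_vars"]) + 0
                pm = $(col["prob_max"]) + 0
                if (!seen || nv < min_nv) min_nv = nv
                if (!seen || nv > max_nv) max_nv = nv
                if (!seen || pm > max_pm) max_pm = pm
                seen = 1
            }
        }
        END {
            for (id in count) printf "%s\t%d\n", id, count[id]
            if (seen) printf "unassigned n_vars range: %d-%d, max prob_max: %.3f\n", min_nv, max_nv, max_pm
        }
    ' "$1"
}

# Usage (path is illustrative):
#   summarize_donor_ids /Output_pseudo_mode2b_f/viero_result_DCD_update/donor_ids.tsv
```

The per-donor count lines are printed in no particular order, so pipe the output through `sort` if a stable order is needed.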

Could you give me some advice on how I can improve this? Thanks in advance! Looking forward to your reply.
a. How can I improve the result?
b. What do "best_singlet" and "best_doublet" mean in the vireo output file donor_ids.tsv?

Step 1: subset the BAM files by extracting the alignment records with valid cell barcodes (PB, CB, DCD, PLCT)
Step 2: call variants for the non-pooled BAM files using cellsnp-lite (mode2b; PB, CB)
Step 3: merge the VCF files to get all_samples.vcf.gz, the donor VCF input for vireo
Step 4: call variants for the decidua and placenta scRNA-seq data using cellsnp-lite (mode2b + mode1a = mode2a)
Step 5: demultiplex the pooled data using vireo (mode2)

Code:
# Step 1: subset the BAM file by extracting the alignment records with valid cell barcodes
# for peripheral blood
samtools index PB-CD45.bam
subset-bam_linux --bam PB-CD45.bam --cell-barcodes PB_barcodes.tsv --out-bam filtered_PB-CD45.bam
samtools index filtered_PB-CD45.bam

# for cord blood
samtools index CB-CD45.bam
subset-bam_linux --bam CB-CD45.bam --cell-barcodes CB_barcodes.tsv --out-bam filtered_CB-CD45.bam
samtools index filtered_CB-CD45.bam

# Step 2: call variants for the non-pooled BAM files using cellsnp-lite (mode2b)
# for peripheral blood
BAM=filtered_PB-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/PB
cellsnp-lite -s $BAM -I maternal -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

# for cord blood
BAM=filtered_CB-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/CB
cellsnp-lite -s $BAM -I fetal -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

# Step 3: merge the VCF files to get the all_samples.vcf.gz file
bcftools index Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/PB_cellSNP.cells.vcf.gz
bcftools index Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/CB_cellSNP.cells.vcf.gz
bcftools merge Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/*.vcf.gz -O z -o Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz

# Step 4: call variants for the decidua and placenta scRNA-seq data using cellsnp-lite
# mode2b + mode1a = mode2a
# mode2b for decidua
BAM=filtered_DCD-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/DCD
cellsnp-lite -s $BAM -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

# mode2b for placenta
BAM=filtered_PLCT-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/PLCT
cellsnp-lite -s $BAM -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

# mode1a for decidua
BAM=filtered_DCD-CD45.bam
BARCODE=DCD_barcodes.tsv
OUT_DIR=/Output_pseudo_mode2b_f/mode1a_after_mode2b/DCD
REGION_VCF=/Output_pseudo_mode2b_f/DCD/cellSNP.cells.vcf.gz
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 10 --minMAF 0.1 --cellTAG CB --minCOUNT 20 --UMItag MA --gzip --genotype

# mode1a for placenta
BAM=filtered_PLCT-CD45.bam
BARCODE=PLCT_barcodes.tsv
OUT_DIR=/Output_pseudo_mode2b_f/mode1a_after_mode2b/PLCT
REGION_VCF=/Output_pseudo_mode2b_f/PLCT/cellSNP.cells.vcf.gz
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 10 --minMAF 0.1 --minCOUNT 20 --UMItag MA --gzip --genotype

# Step 5: demultiplex the pooled data using vireo
# mode2 for decidua
CELL_FILE=/Output_pseudo_mode2b_f/mode1a_after_mode2b/DCD/cellSNP.cells.vcf.gz
DONOR_FILE=/Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz
OUT_DIR=/Output_pseudo_mode2b_f/viero_result_DCD_update
nohup vireo -c $CELL_FILE -d $DONOR_FILE -o $OUT_DIR -N 2 --randSeed 2 > viero_result_DCD_update_output.log 2>&1 &

# mode2 for placenta
CELL_FILE=/Output_pseudo_mode2b_f/mode1a_after_mode2b/PLCT/cellSNP.cells.vcf.gz
DONOR_FILE=/Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz
OUT_DIR=/Output_pseudo_mode2b_f/viero_result_PLCT_update
nohup vireo -c $CELL_FILE -d $DONOR_FILE -o $OUT_DIR -N 2 --randSeed 2 > viero_result_PLCT_update_output.log 2>&1 &
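
One diagnostic that may be worth running before vireo is a count of how many variant sites the donor VCF and the cell VCF actually share, since the two were called independently. The following is a minimal sketch, not part of the pipeline above: `count_shared_sites` is a hypothetical helper, it assumes both inputs are gzip- or bgzip-compressed VCFs, and it compares sites by CHROM and POS only (alleles are ignored).

```shell
# Count variant sites (CHROM:POS) that occur in both a donor VCF and a cell VCF.
# Assumption: both files are gzip/bgzip-compressed; alleles are not compared.
count_shared_sites() {
    donor_sites=$(mktemp)
    cell_sites=$(mktemp)
    # Extract CHROM:POS from every non-header record and de-duplicate.
    gzip -dc "$1" | awk -F'\t' '!/^#/ { print $1 ":" $2 }' | sort -u > "$donor_sites"
    gzip -dc "$2" | awk -F'\t' '!/^#/ { print $1 ":" $2 }' | sort -u > "$cell_sites"
    # comm -12 keeps only the lines common to both sorted lists.
    comm -12 "$donor_sites" "$cell_sites" | wc -l | tr -d ' '
    rm -f "$donor_sites" "$cell_sites"
}

# Usage, with the variables defined in Step 5:
#   count_shared_sites "$DONOR_FILE" "$CELL_FILE"
```

A shared-site count that is small compared with the number of sites in the cell VCF would suggest that few variants can inform the assignment, though this alone does not establish why cells end up unassigned.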

For decidua: [result figure]

For placenta: [result figure]

Run2309 mentioned this issue Nov 14, 2024

scRNA-seq from different donors as a genotype-vcf input for vireo single-cell-genetics/cellsnp-lite#100

Open
