Large number of unassigned #110

Open
Run2309 opened this issue Nov 14, 2024 · 0 comments

Run2309 commented Nov 14, 2024

Hi!
Thanks for the great tools.
I have single-cell RNA sequencing data (Becton, Dickinson and Company) for CD45+ cells from decidua (1 BAM file), placenta (1 BAM file), peripheral blood (1 BAM file) and cord blood (1 BAM file), all from one pregnant donor.

Since cells in the decidua and placenta come from either the mother or the fetus, I want to identify the origin of each cell.

My strategy is to treat the decidua and placenta data as pooled scRNA-seq data, and the peripheral blood (mother) and cord blood (fetus) data as non-pooled scRNA-seq data. I call variants for the non-pooled BAM files using cellsnp-lite and then use them as the donor VCF input for vireo in order to demultiplex the pooled data.

However, the results show a large number of unassigned cells. The “n_vars” of “unassigned” cells ranges from 1 to 1259 (decidua) and from 5 to 1080 (placenta). The “prob_max” of some “unassigned” cells reaches 0.9. (The result figures are attached at the end.)
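
For reference, the numbers above can be tabulated per label with a short script. This is a minimal sketch, assuming donor_ids.tsv is tab-separated with a header row that contains the columns "donor_id" and "n_vars"; the function name `summarise_donor_ids` is made up for illustration. It prints, for each donor_id label, the number of cells and the minimum and maximum n_vars.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-label summary of a vireo donor_ids.tsv.
# Assumes a tab-separated file with header columns "donor_id" and "n_vars".
# Output columns: label, cell count, min n_vars, max n_vars.
summarise_donor_ids() {
    awk -F'\t' '
        NR == 1 {
            # Map header names to column positions, so column order is not assumed.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            id = $(col["donor_id"])
            n  = $(col["n_vars"]) + 0
            count[id]++
            if (!(id in lo) || n < lo[id]) lo[id] = n
            if (!(id in hi) || n > hi[id]) hi[id] = n
        }
        END {
            for (id in count)
                printf "%s\t%d\t%d\t%d\n", id, count[id], lo[id], hi[id]
        }
    ' "$1" | sort
}
```

It would be called as `summarise_donor_ids <vireo output directory>/donor_ids.tsv`.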

Could you give me some advice on how I can improve it? Thanks in advance! Looking forward to your reply.
a. How can I improve the result?
b. What do "best_singlet" and "best_doublet" mean in the vireo output file donor_ids.tsv?
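
Related to question b, the columns can at least be looked at side by side for the puzzling cells. The following is a rough sketch, not part of vireo: it lists the "unassigned" cells whose prob_max is at or above a cutoff, together with their n_vars, best_singlet and best_doublet values. The function name `high_prob_unassigned` is hypothetical, the default cutoff of 0.9 simply mirrors the value mentioned above, and the column names "cell", "donor_id", "prob_max", "n_vars", "best_singlet" and "best_doublet" are assumed to be present in the header row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list unassigned cells with prob_max >= cutoff.
# Usage: high_prob_unassigned donor_ids.tsv [cutoff]   (cutoff defaults to 0.9)
high_prob_unassigned() {
    local file=$1 cutoff=${2:-0.9}
    awk -F'\t' -v cutoff="$cutoff" '
        BEGIN { OFS = "\t" }
        NR == 1 {
            # Map header names to column positions.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["donor_id"]) == "unassigned" && $(col["prob_max"]) + 0 >= cutoff + 0 {
            print $(col["cell"]), $(col["prob_max"]), $(col["n_vars"]), \
                  $(col["best_singlet"]), $(col["best_doublet"])
        }
    ' "$file"
}
```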

Step1: subset the BAM files by extracting the alignment records with valid cell barcodes (PB, CB, DCD, PLCT)
Step2: call variants for the non-pooled BAM files using cellsnp-lite (mode2b; PB, CB)
Step3: merge the VCF files to get all_samples.vcf.gz, the donor VCF input for vireo
Step4: call variants for the decidua and placenta scRNA-seq data using cellsnp-lite (mode2b + mode1a = mode2a)
Step5: demultiplex the pooled data using vireo (mode2)
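
Because the donor variants (Step3) and the pooled-sample variants (Step4) are called independently, one thing that may be worth checking is how many sites the two VCFs actually share. The sketch below is only an illustration under that assumption: it compares CHROM:POS pairs of two gzip-compressed VCFs using standard tools (gzip, awk, sort, comm). The function names `vcf_sites` and `site_overlap` are made up here, and alleles are not compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sorted, unique CHROM:POS keys of a gzipped VCF.
vcf_sites() {
    gzip -cd "$1" | awk -F'\t' '!/^#/ { print $1 ":" $2 }' | LC_ALL=C sort -u
}

# Hypothetical helper: count sites in a donor VCF, a cell VCF, and in both.
# Usage: site_overlap donor.vcf.gz cells.vcf.gz
site_overlap() {
    local donor_sites cell_sites
    donor_sites=$(mktemp)
    cell_sites=$(mktemp)
    vcf_sites "$1" > "$donor_sites"
    vcf_sites "$2" > "$cell_sites"
    printf 'donor_sites\t%d\n'  "$(wc -l < "$donor_sites")"
    printf 'cell_sites\t%d\n'   "$(wc -l < "$cell_sites")"
    # comm -12 keeps only the lines common to both sorted inputs.
    printf 'shared_sites\t%d\n' "$(LC_ALL=C comm -12 "$donor_sites" "$cell_sites" | wc -l)"
    rm -f "$donor_sites" "$cell_sites"
}
```

With the files from the steps above, this would be run as `site_overlap all_samples.vcf.gz cellSNP.cells.vcf.gz`, using the paths given to vireo via `-d` and `-c`.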

Code:
#Step1: subset the BAM file by extracting the alignment records with valid cell barcodes
#for peripheral blood
samtools index PB-CD45.bam
subset-bam_linux --bam PB-CD45.bam --cell-barcodes PB_barcodes.tsv --out-bam filtered_PB-CD45.bam
samtools index filtered_PB-CD45.bam

#for cord blood
samtools index CB-CD45.bam
subset-bam_linux --bam CB-CD45.bam --cell-barcodes CB_barcodes.tsv --out-bam filtered_CB-CD45.bam
samtools index filtered_CB-CD45.bam

#Step2: call variants for the non-pooled bam files using cellsnp-lite (mode2b)
#for peripheral blood
BAM=filtered_PB-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/PB
cellsnp-lite -s $BAM -I maternal -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

#for cord blood
BAM=filtered_CB-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/CB
cellsnp-lite -s $BAM -I fetal -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

#Step3: merge vcf files and get the all_samples.vcf.gz file
bcftools index Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/PB_cellSNP.cells.vcf.gz
bcftools index Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/CB_cellSNP.cells.vcf.gz
bcftools merge Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/*.vcf.gz -O z -o Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz

#Step4: call variants for decidua and placenta scRNA-seq data by using cellsnp-lite
#mode2b + mode1a = mode2a
#mode2b_for_decidua
BAM=filtered_DCD-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/DCD
cellsnp-lite -s $BAM -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

#mode2b_for_placenta
BAM=filtered_PLCT-CD45.bam
OUT_DIR=Output_pseudo_mode2b_f/PLCT
cellsnp-lite -s $BAM -f genome.fa -O $OUT_DIR -p 10 --minMAF 0.1 --minCOUNT 20 --cellTAG None --UMItag MA --gzip --genotype

#mode1a_for_decidua
BAM=filtered_DCD-CD45.bam
BARCODE=DCD_barcodes.tsv
OUT_DIR=/Output_pseudo_mode2b_f/mode1a_after_mode2b/DCD
REGION_VCF=/Output_pseudo_mode2b_f/DCD/cellSNP.cells.vcf.gz
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 10 --minMAF 0.1 --cellTAG CB --minCOUNT 20 --UMItag MA --gzip --genotype

#mode1a_for_placenta
BAM=filtered_PLCT-CD45.bam
BARCODE=PLCT_barcodes.tsv
OUT_DIR=/Output_pseudo_mode2b_f/mode1a_after_mode2b/PLCT
REGION_VCF=/Output_pseudo_mode2b_f/PLCT/cellSNP.cells.vcf.gz
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 10 --minMAF 0.1 --minCOUNT 20 --UMItag MA --gzip --genotype

#Step5: demultiplex the pooled data by using vireo
#mode2_for_decidua
CELL_FILE=/Output_pseudo_mode2b_f/mode1a_after_mode2b/DCD/cellSNP.cells.vcf.gz
DONOR_FILE=/Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz
OUT_DIR=/Output_pseudo_mode2b_f/viero_result_DCD_update
nohup vireo -c $CELL_FILE -d $DONOR_FILE -o $OUT_DIR -N 2 --randSeed 2 > viero_result_DCD_update_output.log 2>&1 &

#mode2_for_placenta
CELL_FILE=/Output_pseudo_mode2b_f/mode1a_after_mode2b/PLCT/cellSNP.cells.vcf.gz
DONOR_FILE=/Output_pseudo_mode2b_f/merge_CB_PB_CELL_VCF/all_samples.vcf.gz
OUT_DIR=/Output_pseudo_mode2b_f/viero_result_PLCT_update
nohup vireo -c $CELL_FILE -d $DONOR_FILE -o $OUT_DIR -N 2 --randSeed 2 > viero_result_PLCT_update_output.log 2>&1 &

For decidua:
(image)
For placenta:
(image)
