HiScanner is a Python package for high-resolution single-cell copy number analysis.
- Prerequisites
- Pipeline Overview
- Running the Pipeline
- Output Structure
- Required External Files
- Troubleshooting
- Citation
HiScanner requires bcftools, which must be included in PATH. All other dependencies should be installed automatically with instructions below.
# Create new conda environment with all dependencies
conda create -n hiscanner_test python=3.8
conda activate hiscanner_test
conda install -c conda-forge r-base
# Quote the version spec so the shell does not treat ">" as an output redirect
conda install -c conda-forge "r-mgcv>=1.8"
conda install bioconda::snakemake
# conda install -c bioconda "samtools>=1.9" "bcftools>=1.9" tabix py-bgzip
# conda install -c conda-forge graphviz
# Install HiScanner
pip install .
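Because bcftools must be on PATH, it can be worth confirming that before running the pipeline. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_tools` is hypothetical and not part of HiScanner.

```python
import shutil


def find_missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # bcftools is the only external prerequisite named in this README.
    missing = find_missing_tools(["bcftools"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run the check inside the activated conda environment, since PATH differs between environments.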
HiScanner works in a modular fashion with five main steps:
- SNP Calling (via SCAN2, requires separate environment)
- Heterozygous SNP Selection & BAF Computation
- ADO Pattern Analysis
- Normalization & Segmentation
- CNV Calling
SCAN2 needs to be run separately before using HiScanner. If you have already run SCAN2, ensure you have:
- VCF file with raw variants (gatk/hc_raw.mmq60.vcf.gz)
- Phased heterozygous variants (shapeit/phased_hets.vcf)
Additionally, we note that the phased genotype field in phased_hets.vcf should be named phasedgt. This is the expected output from the SCAN2 pipeline that we have tested with. If your VCF file has a different field name, please manually rename it to phasedgt in the VCF.
The expected location is scan2_out/ in your project directory.
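The SCAN2 requirements above can be checked before starting HiScanner. This is a rough sketch, not HiScanner's own validation: the function name `check_scan2_output` is hypothetical, and the check for the phasedgt field is a plain substring search, which assumes the name does not appear elsewhere in the file for unrelated reasons.

```python
from pathlib import Path

# Relative paths named in this README.
RAW_VCF = Path("gatk") / "hc_raw.mmq60.vcf.gz"
PHASED_VCF = Path("shapeit") / "phased_hets.vcf"


def check_scan2_output(scan2_dir="scan2_out", field="phasedgt"):
    """Return a list of problems found; an empty list means none were found."""
    scan2_dir = Path(scan2_dir)
    problems = []
    for rel in (RAW_VCF, PHASED_VCF):
        if not (scan2_dir / rel).is_file():
            problems.append(f"missing file: {scan2_dir / rel}")

    phased = scan2_dir / PHASED_VCF
    if phased.is_file():
        # Assumption: a substring match is enough to detect the field name.
        with open(phased, "rt") as handle:
            if not any(field in line for line in handle):
                problems.append(f"'{field}' not found in {phased}")
    return problems


if __name__ == "__main__":
    for problem in check_scan2_output():
        print(problem)
```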
- Initialize project:
hiscanner init --output ./my_project
cd my_project
- Edit config.yaml with your paths:
outdir: "./hiscanner_output"
metadata_path: "./metadata.txt" # Path to your metadata file
scan2_output: "./scan2_out" # Path to your SCAN2 results
# External tools and reference files
fasta_folder: "/path/to/reference/split"
mappability_folder_stem: "/path/to/mappability/hg19.CRC.100mer."
bicseq_norm: "/path/to/NBICseq-norm.pl"
bicseq_seg: "/path/to/NBICseq-seg.pl"
# Analysis parameters
binsize: 500000
max_wgd: 1
batch_size: 5
depth_filter: 0
ado_threshold: 0.2
- Prepare metadata file (metadata.txt):
bamID bam singlecell
bulk1 /path/to/bulk.bam N
cell1 /path/to/cell1.bam Y
cell2 /path/to/cell2.bam Y
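A malformed metadata file is easy to catch early. The sketch below parses the three-column layout shown above. It assumes whitespace-delimited columns, BAM paths without spaces, unique bamID values, and Y/N as the only valid singlecell flags; none of these rules are confirmed by HiScanner itself, and `parse_metadata` is a hypothetical helper.

```python
EXPECTED_HEADER = ["bamID", "bam", "singlecell"]


def parse_metadata(lines):
    """Parse metadata lines into dicts; raise ValueError on malformed input."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError("header must be: " + " ".join(EXPECTED_HEADER))

    records, seen = [], set()
    for fields in rows[1:]:
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {fields}")
        bam_id, bam, flag = fields
        if flag not in ("Y", "N"):
            raise ValueError(f"singlecell must be Y or N for {bam_id}, got {flag}")
        if bam_id in seen:
            raise ValueError(f"duplicate bamID: {bam_id}")
        seen.add(bam_id)
        # Store the flag as a boolean for easier filtering later.
        records.append({"bamID": bam_id, "bam": bam, "singlecell": flag == "Y"})
    return records


if __name__ == "__main__":
    with open("metadata.txt") as handle:
        records = parse_metadata(handle)
    cells = [r["bamID"] for r in records if r["singlecell"]]
    print(f"{len(records)} samples, single cells: {cells}")
```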
- Run the pipeline:
# Run individual steps
hiscanner run --step phase # Process SCAN2 results
hiscanner run --step ado # ADO analysis
hiscanner run --step segment # Normalization and segmentation
hiscanner run --step cnv # CNV calling
# Or run all steps after SCAN2
hiscanner run --step all
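When the steps are run individually, a small wrapper can run them in order and stop at the first failure. This is a sketch built only from the commands listed above; the function names and the `dry_run` option are illustrative and not part of the HiScanner CLI.

```python
import subprocess

# Step names taken from the commands shown above, in the order listed.
STEPS = ["phase", "ado", "segment", "cnv"]


def build_commands(steps=STEPS):
    """Return one `hiscanner run --step <name>` argument list per step."""
    return [["hiscanner", "run", "--step", step] for step in steps]


def run_steps(steps=STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(steps):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_steps(dry_run=True)  # print the commands without executing them
```

Call `run_steps(dry_run=False)` from the project directory to actually execute the steps.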
hiscanner_output/
├── phased_hets/ # Processed heterozygous SNPs
├── ado/ # ADO analysis results
├── bins/ # Binned read depth
├── segs/ # Segmentation results
└── final_calls/ # Final CNV calls
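To see at a glance which parts of the output tree have been populated, one could list the subdirectories that are absent or empty. This sketch uses the directory names from the tree above; `missing_outputs` is a hypothetical helper, and an empty directory is treated as missing by assumption.

```python
from pathlib import Path

# Subdirectory names copied from the tree above.
EXPECTED_SUBDIRS = ["phased_hets", "ado", "bins", "segs", "final_calls"]


def missing_outputs(outdir="hiscanner_output"):
    """Return expected subdirectories that are absent or empty under `outdir`."""
    outdir = Path(outdir)
    missing = []
    for name in EXPECTED_SUBDIRS:
        sub = outdir / name
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing or empty:", missing_outputs())
```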
- Reference genome (hg19/v37 recommended)
- Mappability files
- SCAN2 output files
- NBICseq tools (for segmentation)
Common issues:
- Missing SCAN2 results: Ensure scan2_output directory is correctly specified
- File permissions: Check access to BAM files and reference data
- Memory issues: Adjust batch_size in config.yaml
For more detailed information, check the log files in hiscanner_output/logs/.
HiScanner is currently under active development. For support or questions, please open an issue on our GitHub repository.
If you use HiScanner in your research, please cite:
@article{zhao2024high,
  title={High-resolution detection of copy number alterations in single cells with HiScanner},
  author={Zhao, Yifan and Luquette, Lovelace J and Veit, Alexander D and Wang, Xiaochen and Xi, Ruibin and Viswanadham, Vinayak V and Shao, Diane D and Walsh, Christopher A and Yang, Hong Wei and Johnson, Mark D and Park, Peter J},
  journal={Under revision at Nature Communications},
  year={2024}
}