HiScanner (HIgh-resolution Single-Cell Allelic copy Number callER)

HiScanner is a Python package for high-resolution single-cell copy number analysis.

Prerequisites

HiScanner requires bcftools, which must be available on your PATH. All other dependencies are installed by following the instructions below.
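A quick way to confirm the PATH requirement before installing is sketched below. The helper name `find_missing_tools` is hypothetical and not part of HiScanner; it only uses Python's standard `shutil.which`.

```python
import shutil


def find_missing_tools(tools=("bcftools",)):
    """Return the names of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it inside the environment you intend to use for HiScanner, since PATH can differ between conda environments.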

Installation

# Create new conda environment with all dependencies

conda create -n hiscanner_test python=3.8
conda activate hiscanner_test

conda install -c conda-forge r-base
conda install -c conda-forge "r-mgcv>=1.8"
conda install bioconda::snakemake
# Optional: uncomment if these tools are not already installed
# conda install -c bioconda "samtools>=1.9" "bcftools>=1.9" tabix py-bgzip
# conda install -c conda-forge graphviz

# Install HiScanner
pip install .

Pipeline Overview

HiScanner works in a modular fashion with five main steps:

1. SNP Calling (via SCAN2, requires separate environment)
2. Heterozygous SNP Selection & BAF Computation
3. ADO Pattern Analysis
4. Normalization & Segmentation
5. CNV Calling

Running the Pipeline

Step 1: SNP Calling (Prerequisites)

SCAN2 needs to be run separately before using HiScanner. If you have already run SCAN2, ensure you have:

VCF file with raw variants (gatk/hc_raw.mmq60.vcf.gz)
Phased heterozygous variants (shapeit/phased_hets.vcf)

The phased genotype field in phased_hets.vcf must be named phasedgt. This is the output of the SCAN2 pipeline version we tested with. If your VCF file uses a different field name, rename it to phasedgt manually.

The expected location is scan2_out/ in your project directory.
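The checks described above can be sketched in a few lines of Python. This is an illustrative helper, not part of HiScanner: the function names are hypothetical, and the `phasedgt` test assumes only that the name appears somewhere in the VCF header (as a declared field or as a column name), which should hold for a well-formed VCF.

```python
import gzip
from pathlib import Path

# Relative paths taken from the file list above.
EXPECTED_FILES = ("gatk/hc_raw.mmq60.vcf.gz", "shapeit/phased_hets.vcf")


def missing_scan2_files(scan2_dir):
    """Return the expected SCAN2 files that are absent under scan2_dir."""
    root = Path(scan2_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def header_mentions_phasedgt(lines):
    """Return True if any VCF header line (starting with '#') contains 'phasedgt'."""
    for line in lines:
        if not line.startswith("#"):
            break  # header is over; records follow
        if "phasedgt" in line:
            return True
    return False


def check_phased_vcf(vcf_path):
    """Open a plain or gzip-compressed VCF and run the header check."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return header_mentions_phasedgt(handle)


if __name__ == "__main__":
    print("Missing files:", missing_scan2_files("scan2_out"))
```

If `missing_scan2_files` returns an empty list, `check_phased_vcf("scan2_out/shapeit/phased_hets.vcf")` reports whether the expected name is present.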

Steps 2-5: HiScanner Analysis

Initialize project:

hiscanner init --output ./my_project
cd my_project

Edit config.yaml with your paths:

outdir: "./hiscanner_output"
metadata_path: "./metadata.txt"  # Path to your metadata file
scan2_output: "./scan2_out"      # Path to your SCAN2 results

# External tools and reference files
fasta_folder: "/path/to/reference/split"
mappability_folder_stem: "/path/to/mappability/hg19.CRC.100mer."
bicseq_norm: "/path/to/NBICseq-norm.pl"
bicseq_seg: "/path/to/NBICseq-seg.pl"

# Analysis parameters
binsize: 500000
max_wgd: 1
batch_size: 5
depth_filter: 0
ado_threshold: 0.2
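
Before running the pipeline, it can help to catch template placeholders and malformed numbers in config.yaml. The sketch below is a rough sanity check under stated assumptions: it parses only flat `key: value` lines like the ones shown above (it is not a YAML parser and not how HiScanner reads its config), and the list of numeric keys is simply taken from the example. All function names are hypothetical.

```python
# Numeric keys as they appear in the example config above (an assumption:
# HiScanner may accept other keys or supply defaults).
NUMERIC_KEYS = {
    "binsize": int,
    "max_wgd": int,
    "batch_size": int,
    "depth_filter": int,
    "ado_threshold": float,
}


def parse_flat_config(text):
    """Parse simple `key: value` lines, skipping blanks and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip().strip('"')
    return config


def config_problems(config):
    """Return human-readable descriptions of likely configuration mistakes."""
    problems = []
    for key, cast in NUMERIC_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        try:
            cast(config[key])
        except ValueError:
            problems.append(f"{key} is not a valid {cast.__name__}: {config[key]!r}")
    for key, value in config.items():
        if value.startswith("/path/to"):
            problems.append(f"{key} still holds the template placeholder {value!r}")
    return problems


if __name__ == "__main__":
    try:
        with open("config.yaml") as handle:
            for problem in config_problems(parse_flat_config(handle.read())):
                print(problem)
    except FileNotFoundError:
        print("config.yaml not found in the current directory")
```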

Prepare metadata file (metadata.txt):

bamID    bam    singlecell
bulk1    /path/to/bulk.bam    N
cell1    /path/to/cell1.bam   Y
cell2    /path/to/cell2.bam   Y
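
A malformed metadata file is easy to catch up front. The following sketch checks the three-column layout shown above; it assumes whitespace-delimited fields (so BAM paths containing spaces would be rejected) and that `singlecell` takes only `Y` or `N`, as in the example. The function name `check_metadata` is hypothetical.

```python
REQUIRED_COLUMNS = ["bamID", "bam", "singlecell"]


def check_metadata(text):
    """Return a list of problems found in the metadata table text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != REQUIRED_COLUMNS:
        problems.append(f"header should be {REQUIRED_COLUMNS}, found {rows[0]}")
    seen_ids = set()
    # Row numbers count non-blank lines, with the header as row 1.
    for number, fields in enumerate(rows[1:], start=2):
        if len(fields) != 3:
            problems.append(f"row {number}: expected 3 fields, found {len(fields)}")
            continue
        bam_id, _bam_path, flag = fields
        if bam_id in seen_ids:
            problems.append(f"row {number}: duplicate bamID {bam_id!r}")
        seen_ids.add(bam_id)
        if flag not in ("Y", "N"):
            problems.append(f"row {number}: singlecell must be Y or N, found {flag!r}")
    return problems


if __name__ == "__main__":
    example = (
        "bamID\tbam\tsinglecell\n"
        "bulk1\t/path/to/bulk.bam\tN\n"
        "cell1\t/path/to/cell1.bam\tY\n"
    )
    print(check_metadata(example) or "metadata looks consistent")
```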

Run the pipeline:

# Run individual steps
hiscanner run --step phase    # Process SCAN2 results
hiscanner run --step ado      # ADO analysis
hiscanner run --step segment  # Normalization and segmentation
hiscanner run --step cnv      # CNV calling

# Or run all steps after SCAN2
hiscanner run --step all
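
If one step fails partway through, you may want to rerun from that step onward rather than start over. The wrapper below is a minimal sketch that shells out to the same `hiscanner run --step` commands listed above; it assumes the steps run in the order shown and that each can be rerun on its own. The names `build_commands` and `run_from` are hypothetical.

```python
import subprocess

# Step names in the order they are listed above.
STEPS = ("phase", "ado", "segment", "cnv")


def build_commands(start="phase"):
    """Return the hiscanner commands from the given step through the last step."""
    first = STEPS.index(start)  # raises ValueError for an unknown step name
    return [["hiscanner", "run", "--step", step] for step in STEPS[first:]]


def run_from(start="phase"):
    """Run each remaining step in order, stopping at the first failure."""
    for command in build_commands(start):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

For example, `run_from("segment")` would run the segment and cnv steps from inside the project directory.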

Output Structure

hiscanner_output/
├── phased_hets/       # Processed heterozygous SNPs
├── ado/               # ADO analysis results
├── bins/              # Binned read depth
├── segs/              # Segmentation results
└── final_calls/       # Final CNV calls

Required External Files

Reference genome (hg19/v37 recommended)
Mappability files
SCAN2 output files
NBICseq tools (for segmentation)

Troubleshooting

Common issues:

Missing SCAN2 results: Ensure scan2_output directory is correctly specified
File permissions: Check access to BAM files and reference data
Memory issues: Adjust batch_size in config.yaml

For more detailed information, check the log files in hiscanner_output/logs/
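
For the file-permission issue in particular, a short script can report which input files the current user cannot read. This is a sketch using only the standard library; `unreadable_paths` is a hypothetical helper, and the paths in the example are placeholders.

```python
import os


def unreadable_paths(paths):
    """Return the paths that are not existing files readable by the current user."""
    return [
        path for path in paths
        if not (os.path.isfile(path) and os.access(path, os.R_OK))
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the BAM paths from your metadata file.
    for path in unreadable_paths(["/path/to/bulk.bam", "/path/to/cell1.bam"]):
        print("Cannot read:", path)
```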

Support

HiScanner is currently under active development. For support or questions, please open an issue on our GitHub repository.

Citation

If you use HiScanner in your research, please cite:

@article{zhao2024high,
  title={High-resolution detection of copy number alterations in single cells with HiScanner},
  author={Zhao, Yifan and Luquette, Lovelace J and Veit, Alexander D and Wang, Xiaochen and Xi, Ruibin and Viswanadham, Vinayak V and Shao, Diane D and Walsh, Christopher A and Yang, Hong Wei and Johnson, Mark D and Park, Peter J},
  journal={Under revision at Nature Communications},
  year={2024}
}
