CFBI (Calculating Frequencies of Base-substitutions and Indels) is the custome Perl and Shell scripts for calculating frequencies of base subsutitution and indels.
Version: 1.0.0
Calculate base substitution and indel frequencies of target gene for CRISPR genome editing.
BAM file was originally mapped form BWA-MEM.
Software / Package
To calculate base substitution and indel frequencies, BAM format file was generated firstly.
All Perl and Shell scripts were marked 'bold italic'.
-
- Please export the CFBI directory and BioPerl to your $PATH.
export PATH="~/CFBI-master:$PATH";
export PATH="~/Bio:$PATH";
-
- BWA index target sequences. '01_bwa_mem_index.sh'.
sh 01_bwa_mem_index.sh target.fa
-
- BWA-MEM mapping with DNA-seq reads (FASTQ). For paired-end sequencing, only R1 reads were used. '02_bwa_mem_mapping.sh'.
sh 02_bwa_mem_mapping.sh target.fa sample_R1.fq sample_R1
-
- Calculate base substitution with mapped reads (BAM). '03_base_substitution.sh'.
sh 03_base_substitution.sh sample_R1.bam target.fa sample_R1
Output file sample_R1.xls is an example result of base substitution.
-
- Calculate indel frequencies with target gene InDel location. '04_indel_frequencies.sh'.
sh 04_indel_frequencies.sh sample_R1.bam EXM1 250 300 ascii.txt
Output file EMX1_indel_frequencies.txt is an example for the number of indel frequencies of target EMX1.
- Target sequences with FASTQ format. (target.fa)
- DNA-seq R1 reads with FASTQ format. (sample_R1.fq)
- Name of BAM file. (sample_R1)
- Target gene symbol. (EMX1)
- Start location for indel frequencies calculation of target gene. (250, cutting site -25 bp for EMX1)
- End location for indel frequencies calculation of target gene. (300, cutting site +25 bp for EMX1)
- The sequence quality of reads by ASCII code. ascii.txt
See details in sample_R1.xls, the example output file for base substitution.
Field | Description |
---|---|
Gene symbol | Name of gene |
Mut location | Location of mutant site |
Ref base | Sequence of base |
Flanking base | ± 1 bp flanking sequence of base |
# of mapped reads | Number of mapped reads for target gene |
# of total mutant reads | Number of total mutant reads for target gene |
% of total mutant reads | Ratio of total mutant reads for target gene |
# of mutant A | Number of reads for mutant A |
% of mutant A | Ratio of reads for mutant A |
# of mutant C | Number of reads for mutant C |
% of mutant C | Ratio of reads for mutant C |
# of mutant G | Number of reads for mutant G |
% of mutant G | Ratio of reads for mutant G |
# of mutant N | Number of reads for mutant N |
% of mutant N | Ratio of reads for mutant N |
# of mutant T | Number of reads for mutant T |
% of mutant T | Ratio of reads for mutant T |
See details in EMX1_indel_frequencies.txt, the example output file for indel frequencies.
Field | Description |
---|---|
Gene symbol | Name of gene |
Type of reads | Type of all reads or indel reads |
# of reads | Number of all reads or indel reads |
The related paper about CFBI is submitted.
Copyright (C) 2021 YangLab. Licensed GPLv3 for open source use or contact YangLab ([email protected]) for commercial use.