xiezhq · Apr 3, 2021
Showing with 26 additions and 17 deletions.

+26 −17 README.md
diff --git a/README.md b/README.md
@@ -1,9 +1,19 @@
-# ISEScan [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/isescan/README.html)
-A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome 
+# ISEScan [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://bioconda.github.io/recipes/isescan/README.html) [![install with docker](https://img.shields.io/badge/install%20with-docker-important.svg?style=flat-square&logo=docker)](https://quay.io/repository/biocontainers/isescan)
+
+## A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
+- ISEScan can be used to identify/annotate full-length or non-full-length IS elements in any DNA sequence, but it has only been tested on prokaryotic genomes, including draft genomes and meta-genomes.
+- Among the existing tools identifying IS elements, ISEScan might be the only one that gives TIR (Terminal Inverted Repeat) sequences.
+- The input sequence file (namely, genome or meta-genome) of ISEScan can contain one or more sequences and there is no limit on the length of each sequence, though ISEScan has only been tested on complete genomes with one or more sequences, draft genomes with many contigs, and assembled meta-genomes with many contigs.
+- The only requirement for the input sequence file is that it must be in **FASTA** format. When ISEScan starts, it first scans the sequences in the FASTA file one by one, then identifies/annotates the IS elements in each sequence independently, and finally outputs all identified/annotated IS elements for each sequence together with statistics on the IS elements identified/annotated across all sequences in the input FASTA file.
+- Unknown bases are allowed in the sequences, e.g. ACACGCCCGTTGTTTT**NNNNNNNNN**, GGGTCAGGTCATCAACTTTAGCGTAACGC**NNNNN**GGG.
+- If you only want to identify potential transposases (not full or partial IS elements) in your sequences and don't want to install ISEScan, you can do so in two steps: 1) download the transposase models (clusters.faa.hmm and clusters.single.faa) from the ISEScan subdirectory [pHMMs](https://github.com/xiezhq/ISEScan/tree/master/pHMMs), 2) install HMMER (version 3.1b2 or later) and use it to search for transposases in your sequences.
+- ISEScan users asked many good questions (see [issues](https://github.com/xiezhq/ISEScan/issues)) which have been answered by the developer of ISEScan. If you didn't find the answers you want at [issues](https://github.com/xiezhq/ISEScan/issues), you can open a new issue at [issues](https://github.com/xiezhq/ISEScan/issues).  
+- If you want to replace some (or all) of the genes/proteins that ISEScan predicts (actually predicted by FragGeneScan, which ISEScan calls) and uses to predict transposases and IS elements, you can try manually replacing the gene boundaries and protein sequences in the `.faa` file under the directory `results/proteome` after you run ISEScan on your genome sequences. For how to do so, please check [my comments](https://github.com/xiezhq/ISEScan/issues/45) from May 2022.
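
The two-step transposase search described in the list above might look roughly like the sketch below. It is not taken from ISEScan itself: it assumes the two model files have already been downloaded into the current directory, that `clusters.single.faa` (file name as given in the Overview) holds plain protein sequences usable as `phmmer` queries, and that you already have protein sequences predicted from your genome in a hypothetical file `proteins.faa`. The output file names are also illustrative.

```shell
# Model files from the pHMMs subdirectory (file names as given in the Overview).
HMM_FILE=clusters.faa.hmm
SINGLE_FILE=clusters.single.faa
# Hypothetical input: protein sequences predicted from your genome.
PROTEINS=proteins.faa

if command -v hmmsearch >/dev/null 2>&1 && command -v phmmer >/dev/null 2>&1 \
   && [ -f "$HMM_FILE" ] && [ -f "$SINGLE_FILE" ] && [ -f "$PROTEINS" ]; then
  # Search the profile HMMs against the proteins; --tblout writes one line per hit.
  hmmsearch --tblout hmmsearch_hits.tbl "$HMM_FILE" "$PROTEINS" > hmmsearch.log
  # Use the sequences in the second file as queries against the same proteins.
  phmmer --tblout phmmer_hits.tbl "$SINGLE_FILE" "$PROTEINS" > phmmer.log
else
  echo "HMMER or input files not found; nothing was searched" >&2
fi
```

No score or E-value cutoffs are set here; which hits count as transposases is left to you.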
 
 ## Table of Contents
 - [Overview](#Overview)
 - [Citation](#Citation)
+- [Contact](#Contact)
 - [Installation](#Installation)
 	- [ISEScan on linux](#install-on-linux)
 	- [ISEScan on mac](#install-on-mac)
@@ -15,20 +25,23 @@ A python pipeline to identify IS (Insertion Sequence) elements in genome and met
 	- [How to run a set of genomes in a row](#lots-of-genomes)
 	- [Re-run ISEScan without gene/protein prediction and HMMER searching](#Re-run)
 - [Release History](#Release)
-- [Contact](#Contact)
 
 <a name="Overview"></a>
 ## Overview
 ISEScan is a python pipeline to identify IS (Insertion Sequence) elements in genomes. It includes an option to report either complete IS elements or both complete and partial IS elements. It might be a good idea to report both complete and partial IS elements when identifying IS elements in metagenome assemblies. ISEScan reports both complete and partial IS elements by default.
 
-ISEScan was developed using Python3. It 1) scanes genome (or metagenome) in fasta format; 2) predicts/translates (using FragGeneScan) genome into proteome; 3) searches the pre-built pHMMs (profile Hidden Markov Models) of transposases (two files shipped with ISEScan; clusters.faa.hmm and clusters.single.faa) against the proteome and identifies the transposase gene in genome; 4) then extends the identified transposase gene into the complete IS (Insertion Sequence) elements based on the common characteristics shared by the known IS elements reported by literatures and database; 5) finally reports the identified IS elements in a few result files (e.g. a file containing a list of IS elements, a file containing sequences of IS elements in fasta format, an annotation file in GFF3 format).
+ISEScan was developed using Python3. It 1) scans the genome (or metagenome) in fasta format; 2) predicts/translates (using FragGeneScan) the genome into a proteome; 3) searches the pre-built pHMMs (profile Hidden Markov Models) of transposases (two files shipped with ISEScan: clusters.faa.hmm and clusters.single.faa) against the proteome and identifies the transposase genes in the genome; 4) then extends each identified transposase gene into a complete IS (Insertion Sequence) element based on the common characteristics shared by the known IS elements reported in the literature and databases; 5) finally reports the identified IS elements in a few result files (e.g. a file containing a list of IS elements, a file containing sequences of IS elements in fasta format, an annotation file in GFF3 format).
 
 <a name="Citation"></a>
 ## Citation
 Zhiqun Xie, Haixu Tang. ISEScan: automated identification of Insertion Sequence Elements in prokaryotic genomes. *Bioinformatics*, 2017, 33(21): 3340-3347. 
 
 Download: [full text](https://doi.org/10.1093/bioinformatics/btx433), [SupplementaryMaterials.docx](publication/SupplementaryMaterials.docx), [SupplementaryMaterials.xlsx](publication/SupplementaryMaterials.xlsx).
 
+<a name="Contact"></a>
+## Contact
+Zhiqun Xie: `xiezhq@hotmail.com`
+
 <a name="Installation"></a>
 ## Installation
 <a name="install-on-linux"></a>
@@ -70,14 +83,14 @@ The steps below will install ISEScan package via bioconda to /apps/inst/minicond
 ```
 conda install isescan
 ```
-- Try ISEScan (You can find the available command options `isescan.py -h`).
+- Try ISEScan (You can find the available command options by running `isescan.py -h`).
 ```
 cp /apps/inst/miniconda3/test/NC_012624.fna ./
 isescan.py --seqfile NC_012624.fna --output results --nthread 2
 ```
 Note: replace `/apps/inst/miniconda3` in commands with your conda install path.
 
-If system reports `isescan.py: command not found...`, please add ISEScan package to your `PATH` (replace `/apps/inst/miniconda3` in commands with your conda install path):
+If the system reports `isescan.py: command not found...`, please add the ISEScan package to your `PATH` (replace `/apps/inst/miniconda3` in the command below with your conda install path):
 ```
 export PATH=/apps/inst/miniconda3/bin/:$PATH
 ```
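
As a rough sketch of the same fix with a check attached, the snippet below prepends the conda `bin` directory only when it is missing from `PATH` and then reports whether the shell can find `isescan.py`. The variable name `CONDA_DIR` is made up for this example, and its value is the example install path used above, so replace it with your own.

```shell
# Hypothetical variable; set it to your own conda install path.
CONDA_DIR=/apps/inst/miniconda3

# Prepend the conda bin directory only if it is not already on PATH.
case ":$PATH:" in
  *":$CONDA_DIR/bin:"*) ;;
  *) export PATH="$CONDA_DIR/bin:$PATH" ;;
esac

# Report whether the shell can now find isescan.py.
if command -v isescan.py >/dev/null 2>&1; then
  echo "isescan.py found at: $(command -v isescan.py)"
else
  echo "isescan.py not found; check the conda install path" >&2
fi
```

The change only lasts for the current shell session; to keep it, the `export` line would have to go into your shell startup file.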
@@ -137,13 +150,13 @@ isescan.py --seqfile NC_012624.fna --output results --nthread 2
 
 <a name="Upgrade"></a>
 ## Upgrade ISEScan to the latest version
-### Automated upgrade from Bioconda (easy and recommended)
-You can run the command below to upgrade the existing ISEScan if the existing ISEScan was installed by Bioconda.
+### Automated upgrade from Bioconda
+The latest version becomes available on Bioconda a few hours or days after it is released on https://github.com/xiezhq/ISEScan. You can run the command below to upgrade the existing ISEScan if it was installed by Bioconda.
 ```
 conda update isescan
 ```
-### Manual upgrade from existing ISEScan (easy)
-It is quite easy to upgrade the existing ISEScan to the latest version: copy all .py files from the latest version to the ISEScan install directory. 
+### Manual upgrade from existing ISEScan
+With a manual upgrade, you can get the latest version immediately from https://github.com/xiezhq/ISEScan. It is quite easy to upgrade the existing ISEScan to the latest version: copy all .py files from the latest version to the ISEScan install directory.
 - Locate the existing ISEScan (the ISEScan install directory). If you don't know where isescan.py is installed, you can run `which isescan.py` to find it on your system.
 	```
 	which isescan.py
@@ -173,15 +186,15 @@ It is quite easy to upgrade the existing ISEScan to the latest version: copy all
 ## Usage example
 Let's try an example, NC_012624.fna.
 
-- The command below scans NC_012624.fna (genome sequence of Sulfolobus_islandicus_Y_N_15_51, ~42 kb), and outputs all results in `prediction` directory:   
+- The command below scans NC_012624.fna (genome sequence of Sulfolobus_islandicus_Y_N_15_51, ~42 kb), and outputs all results in `results` directory:   
 	```
 	cp /apps/inst/miniconda3/test/NC_012624.fna ./
 	isescan.py --seqfile NC_012624.fna --output results --nthread 2
 	```
-
+  Note: run `isescan.py -h` or `isescan.py --help` to get help.
 - Wait for it to finish. It may take a while (~40 seconds) because ISEScan uses HMMER to scan the genome sequences, searching 621 profile HMM models against each protein sequence (predicted by FragGeneScan) in the genome sequence. HMMER searching is usually more sensitive but slower than regular BLAST searching for remote homologs. The running time increases quickly for larger genomes, e.g. about 20 minutes for NC_000913.fna (genome sequence of Escherichia coli str. K-12 substr. MG1655, ~4.6 Mb) with two CPU cores on my virtual machine.
 
-- After ISEScan finish running, you can find the output files in prediction directory: 
+- After ISEScan finishes running, you can find the output files in the `results` directory:
   - NC_012624.fna.sum: the summarization of IS copies for each IS family
   - NC_012624.fna.csv: details about IS copies in NC_012624, one copy per line, comma-separated tabular table
   - NC_012624.fna.tsv: details about IS copies in NC_012624, one copy per line, tab-separated tabular table
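
Because the `.tsv` report lists one IS copy per line, a quick count can be sketched as below. Two things here are assumptions rather than documented behaviour: that the first line of the file is a column header, and that the report sits directly at `results/NC_012624.fna.tsv`. The function name `count_is_copies` is made up for this example.

```shell
# Count IS copies in an ISEScan .tsv report.
# Assumption: line 1 is a column header; every later non-empty line is one IS copy.
count_is_copies() {
  tail -n +2 "$1" | grep -c . || true
}

# Assumed report location for the example run; adjust if your layout differs.
report=results/NC_012624.fna.tsv
if [ -f "$report" ]; then
  echo "IS copies: $(count_is_copies "$report")"
fi
```

If the report has no header line, the count will be one too low.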
@@ -333,7 +346,3 @@ Let's try an example, NC_012624.fna.
   - Add option in `constants.py` to report either complete IS elements or both complete and partial IS elements
 - 1.0
   - The first proper release
-
-<a name="Contact"></a>
-## Contact
-`xiezhq@hotmail.com`