
Comparing changes

base repository: xiezhq/ISEScan
base: v1.7.2.3
head repository: xiezhq/ISEScan
compare: master
Able to merge. These branches can be automatically merged.

Commits on Apr 3, 2021

  1. Update README.md (xiezhq, 3fd173c)
  2. Update README.md (xiezhq, 5b21bcd)
  3. Update README.md (xiezhq, 740a19c)
  4. Update README.md (xiezhq, db726fa)
  5. Update README.md (xiezhq, 3f5f616)
  6. Update README.md (xiezhq, d181ef1)
  7. Update README.md (xiezhq, 52bcd4c)
  8. Update README.md (xiezhq, 9f1b97f)
  9. Update README.md (xiezhq, 8beadbc)
  10. Update README.md (xiezhq, 3e61d6a)
  11. Update README.md (xiezhq, 02af002)

Commits on Jun 23, 2021

  1. Update README.md (xiezhq, 407c65d)
  2. Update README.md (xiezhq, cb54f4e)
  3. Update README.md (xiezhq, d6a41fc)
  4. Update README.md (xiezhq, eeeada1)
  5. Update README.md (xiezhq, 8085c4c)
  6. Update README.md (xiezhq, b381070)
  7. Update README.md (xiezhq, cc1de09)
  8. Update README.md (xiezhq, 2100edb)
  9. Update README.md (xiezhq, 5d10b87)
  10. Update README.md (xiezhq, 369f18b)
  11. Update README.md (xiezhq, 0b50e1d)
  12. Update README.md (xiezhq, 8f5d471)
  13. Update README.md (xiezhq, d1a2d89)
  14. Update README.md (xiezhq, 71fc1be)
  15. Update README.md (xiezhq, a7b6fe7)
  16. Update README.md (xiezhq, e27feda)
  17. Update README.md (xiezhq, 6ac7e9c)
  18. Update README.md (xiezhq, b2c5fe0)

Commits on Oct 16, 2021

  1. Update README.md (xiezhq, 601ebde)

Commits on Oct 25, 2021

  1. Update README.md (xiezhq, d577749)
  2. Update README.md (xiezhq, 1c28079)

Commits on Nov 3, 2021

  1. Update README.md (xiezhq, c743f72)
  2. Update README.md (xiezhq, f7e11f2)
  3. Update README.md (xiezhq, 4f9a269)
  4. Update README.md (xiezhq, 5bc37e4)
  5. Update README.md (xiezhq, 416f65d)
  6. Update README.md (xiezhq, 3e9a85e)
  7. Update README.md (xiezhq, 732e76c)
  8. Update README.md (xiezhq, eacea05)

Commits on Jan 13, 2022

  1. Update README.md (xiezhq, 4823e2c)

Commits on Feb 25, 2022

  1. Update README.md (xiezhq, 5bfb6c8)
  2. Update README.md (xiezhq, 4983e3d)
  3. Update README.md (xiezhq, 2c5a886)

Commits on Mar 11, 2022

  1. Update README.md (xiezhq, 78c74bc)
  2. Update README.md (xiezhq, 5afb4c2)

Commits on Mar 24, 2022

  1. Update README.md (xiezhq, 98b6ec0)
  2. Update README.md (xiezhq, 3794f07)
  3. Update README.md (xiezhq, 394c734)
  4. Update README.md (xiezhq, 942bc72)

Showing 1 changed file with 26 additions and 17 deletions.
  1. +26 −17 README.md

````diff
@@ -1,9 +1,19 @@
-# ISEScan [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/isescan/README.html)
-A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
+# ISEScan [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://bioconda.github.io/recipes/isescan/README.html) [![install with docker](https://img.shields.io/badge/install%20with-docker-important.svg?style=flat-square&logo=docker)](https://quay.io/repository/biocontainers/isescan)
+
+## A Python pipeline to identify IS (Insertion Sequence) elements in genomes and metagenomes
+- ISEScan can be used to identify/annotate full-length or non-full-length IS elements in any DNA sequence, but it has been tested only on prokaryotic genomes, including draft genomes and metagenomes.
+- Among the existing tools for identifying IS elements, ISEScan might be the only one that reports TIR (Terminal Inverted Repeat) sequences.
+- The input sequence file (genome or metagenome) can contain one or more sequences, and there is no limit on the length of each sequence, though ISEScan has been tested only on complete genomes with one or more sequences, draft genomes with many contigs, and assembled metagenomes with many contigs.
+- The only requirement for the input sequence file is that it must be in **FASTA** format. When started, ISEScan first scans the sequences in the FASTA file one by one, then identifies/annotates the IS elements in each sequence independently, and finally outputs all identified/annotated IS elements for each sequence together with statistics on the IS elements from all sequences in the input FASTA file.
+- Unknown bases are allowed in the sequences, e.g. ACACGCCCGTTGTTTT**NNNNNNNNN**, GGGTCAGGTCATCAACTTTAGCGTAACGC**NNNNN**GGG.
+- If you only want to identify potential transposases (not full or partial IS elements) in your sequences and would rather not install ISEScan, you can do so in two steps: 1) download the transposase models (clusters.faa.hmm and clusters.single.faa) from the ISEScan subdirectory [pHMMs](https://github.com/xiezhq/ISEScan/tree/master/pHMMs); 2) install HMMER (version 3.1b2 or later) and use it to search for transposases in your sequences.
+- ISEScan users have asked many good questions (see [issues](https://github.com/xiezhq/ISEScan/issues)), which the developer of ISEScan has answered. If you don't find the answer you want at [issues](https://github.com/xiezhq/ISEScan/issues), you can open a new issue at [issues](https://github.com/xiezhq/ISEScan/issues).
+- If you want to replace some (or all) of the genes/proteins predicted by ISEScan (actually by FragGeneScan, which ISEScan calls) that are used to predict transposases and IS elements, you can try manually replacing gene boundaries and protein sequences in the `.faa` file under directory `results/proteome` after running ISEScan on your genome sequences. For how to do so, please see [my comments](https://github.com/xiezhq/ISEScan/issues/45) from May 2022.

 ## Table of Contents
 - [Overview](#Overview)
 - [Citation](#Citation)
+- [Contact](#Contact)
 - [Installation](#Installation)
 - [ISEScan on linux](#install-on-linux)
 - [ISEScan on mac](#install-on-mac)
````
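
The README lines above say ISEScan reads a FASTA file one sequence at a time and accepts unknown bases such as runs of N. The Python sketch below illustrates that kind of input handling under stated assumptions: it is not ISEScan's own parser, the helper names `read_fasta` and `unknown_base_runs` are hypothetical, and it assumes a standard multi-record FASTA file whose record ID is the first word of each header line.

```python
import io
import re
from typing import Iterator, List, Tuple, TextIO


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record ID, sequence) pairs one record at a time (hypothetical helper)."""
    seq_id, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            # Record ID = first word of the header; empty string for a bare ">".
            seq_id, chunks = (words[0] if words else ""), []
        elif seq_id is not None:
            # Sequence lines may be wrapped; collect them and join per record.
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def unknown_base_runs(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of runs of unknown bases (N)."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", seq)]


if __name__ == "__main__":
    # The two example sequences from the README, wrapped as a tiny FASTA file.
    demo = io.StringIO(
        ">contig1\nACACGCCCGTTGTTTTNNNNNNNNN\n"
        ">contig2\nGGGTCAGGTCATCAACTTTAGCGTAACGCNNNNNGGG\n"
    )
    for seq_id, seq in read_fasta(demo):
        print(seq_id, len(seq), unknown_base_runs(seq))
```

Because the reader is a generator, each record can be processed independently without loading the whole file, which fits the per-sequence annotation the README describes.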

````diff
@@ -15,20 +25,23 @@ A python pipeline to identify IS (Insertion Sequence) elements in genome and met
 - [How to run a set of genomes in a row](#lots-of-genomes)
 - [Re-run ISEScan without gene/protein prediction and HMMER searching](#Re-run)
 - [Release History](#Release)
-- [Contact](#Contact)

 <a name="Overview"></a>
 ## Overview
 ISEScan is a Python pipeline to identify IS (Insertion Sequence) elements in genomes. It includes an option to report either complete IS elements only or both complete and partial IS elements. Reporting both complete and partial IS elements may be a good choice when identifying IS elements in metagenome assemblies. ISEScan reports both complete and partial IS elements by default.

-ISEScan was developed using Python3. It 1) scanes genome (or metagenome) in fasta format; 2) predicts/translates (using FragGeneScan) genome into proteome; 3) searches the pre-built pHMMs (profile Hidden Markov Models) of transposases (two files shipped with ISEScan; clusters.faa.hmm and clusters.single.faa) against the proteome and identifies the transposase gene in genome; 4) then extends the identified transposase gene into the complete IS (Insertion Sequence) elements based on the common characteristics shared by the known IS elements reported by literatures and database; 5) finally reports the identified IS elements in a few result files (e.g. a file containing a list of IS elements, a file containing sequences of IS elements in fasta format, an annotation file in GFF3 format).
+ISEScan was developed using Python3. It 1) scans the genome (or metagenome) in FASTA format; 2) predicts/translates (using FragGeneScan) the genome into a proteome; 3) searches the pre-built pHMMs (profile Hidden Markov Models) of transposases (two files shipped with ISEScan: clusters.faa.hmm and clusters.single.faa) against the proteome and identifies the transposase genes in the genome; 4) extends each identified transposase gene into a complete IS (Insertion Sequence) element based on the common characteristics shared by known IS elements reported in the literature and databases; 5) finally reports the identified IS elements in several result files (e.g. a list of IS elements, the IS element sequences in FASTA format, and an annotation file in GFF3 format).

 <a name="Citation"></a>
 ## Citation
 Zhiqun Xie, Haixu Tang. ISEScan: automated identification of Insertion Sequence Elements in prokaryotic genomes. *Bioinformatics*, 2017, 33(21): 3340-3347.

 Download: [full text](https://doi.org/10.1093/bioinformatics/btx433), [SupplementaryMaterials.docx](publication/SupplementaryMaterials.docx), [SupplementaryMaterials.xlsx](publication/SupplementaryMaterials.xlsx).

+<a name="Contact"></a>
+## Contact
+Zhiqun Xie: `xiezhq@hotmail.com`
+
 <a name="Installation"></a>
 ## Installation
 <a name="install-on-linux"></a>
````
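
The Overview describes step 3 as searching the two transposase model files shipped with ISEScan, clusters.faa.hmm (profile HMMs) and clusters.single.faa (single protein sequences), against the predicted proteome; the README also suggests running HMMER directly on these files. Below is a minimal Python sketch of such stand-alone HMMER 3 runs. It assumes `hmmsearch` and `phmmer` are on your `PATH` and that your DNA has already been translated into a protein FASTA file, since these are protein models and search protein targets. Pairing clusters.single.faa with `phmmer`, the E-value cutoff, and the output file names are illustrative assumptions, not ISEScan's own settings.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def hmmer_commands(proteome: str, hmm_file: str, single_faa: str,
                   outdir: str, cpus: int = 2, evalue: float = 1e-5) -> List[List[str]]:
    """Build HMMER 3 command lines (hypothetical helper; cutoff and names are illustrative)."""
    out = Path(outdir)
    common = ["--cpu", str(cpus), "-E", str(evalue)]
    return [
        # Query: transposase profile HMMs; target database: predicted proteins.
        ["hmmsearch", *common, "--tblout", str(out / "hmmsearch.tbl"), hmm_file, proteome],
        # Query: single-sequence transposase clusters; target: the same proteins.
        ["phmmer", *common, "--tblout", str(out / "phmmer.tbl"), single_faa, proteome],
    ]


def run_commands(cmds: List[List[str]]) -> None:
    """Run each command, failing early if a HMMER program is not installed."""
    for cmd in cmds:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        # Make sure the directory for the --tblout hit table exists.
        Path(cmd[cmd.index("--tblout") + 1]).parent.mkdir(parents=True, exist_ok=True)
        # Hits go to the --tblout table; the verbose alignment report is discarded.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


if __name__ == "__main__":
    for cmd in hmmer_commands("proteome.faa", "clusters.faa.hmm",
                              "clusters.single.faa", "hmmer_out"):
        print(" ".join(cmd))
```

The two hit tables would still need filtering and merging afterwards; how ISEScan itself scores and combines hits is not reproduced here.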

````diff
@@ -70,14 +83,14 @@ The steps below will install ISEScan package via bioconda to /apps/inst/minicond
 ```
 conda install isescan
 ```
-- Try ISEScan (You can find the available command options `isescan.py -h`).
+- Try ISEScan (You can find the available command options by running `isescan.py -h`).
 ```
 cp /apps/inst/miniconda3/test/NC_012624.fna ./
 isescan.py --seqfile NC_012624.fna --output results --nthread 2
 ```
 Note: replace `/apps/inst/miniconda3` in these commands with your conda install path.

-If system reports `isescan.py: command not found...`, please add ISEScan package to your `PATH` (replace `/apps/inst/miniconda3` in commands with your conda install path):
+If the system reports `isescan.py: command not found...`, please add the ISEScan package to your `PATH` (replace `/apps/inst/miniconda3` in the command below with your conda install path):
 ```
 export PATH=/apps/inst/miniconda3/bin/:$PATH
 ```
````
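
The installation steps above run `isescan.py` from a shell and, if it is not found, prepend the conda `bin` directory to `PATH`. A Python sketch of the same check for scripted runs follows. It uses only the command-line options shown in the README (`--seqfile`, `--output`, `--nthread`); the helper names are hypothetical, and `/apps/inst/miniconda3` is the README's example prefix, to be replaced with your own.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Mapping, Optional


def env_with_conda_bin(conda_prefix: str,
                       base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and prepend <conda_prefix>/bin to PATH if it is missing."""
    env = dict(os.environ if base is None else base)
    bindir = os.path.join(conda_prefix, "bin")
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bindir not in parts:
        env["PATH"] = os.pathsep.join([bindir] + parts)
    return env


def isescan_command(seqfile: str, output: str = "results", nthread: int = 2) -> List[str]:
    """Build the README's example command line as an argument list."""
    return ["isescan.py", "--seqfile", seqfile, "--output", output, "--nthread", str(nthread)]


if __name__ == "__main__":
    env = env_with_conda_bin("/apps/inst/miniconda3")  # replace with your conda install path
    exe = shutil.which("isescan.py", path=env["PATH"])
    if exe is None:
        raise SystemExit("isescan.py not found; check the conda install path")
    cmd = isescan_command("NC_012624.fna")
    # Run the resolved executable with the modified environment.
    subprocess.run([exe] + cmd[1:], env=env, check=True)
```

Modifying a copy of the environment, rather than `os.environ` itself, keeps the `PATH` change local to the ISEScan run, much like `export` inside a single shell session.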

````diff
@@ -137,13 +150,13 @@ isescan.py --seqfile NC_012624.fna --output results --nthread 2

 <a name="Upgrade"></a>
 ## Upgrade ISEScan to the latest version
-### Automated upgrade from Bioconda (easy and recommended)
-You can run the command below to upgrade the existing ISEScan if the existing ISEScan was installed by Bioconda.
+### Automated upgrade from Bioconda
+The latest version usually becomes available on Bioconda a few hours or days after it is released on https://github.com/xiezhq/ISEScan. If the existing ISEScan was installed from Bioconda, you can run the command below to upgrade it.
 ```
 conda update isescan
 ```
-### Manual upgrade from existing ISEScan (easy)
-It is quite easy to upgrade the existing ISEScan to the latest version: copy all .py files from the latest version to the ISEScan install directory.
+### Manual upgrade from existing ISEScan
+With a manual upgrade, you can get the latest version immediately from https://github.com/xiezhq/ISEScan. Upgrading the existing ISEScan to the latest version is easy: copy all .py files from the latest version into the ISEScan install directory.
 - Locate the existing ISEScan (the ISEScan install directory). If you don't know where isescan.py is installed, run `which isescan.py` to find it on your system.
 ```
 which isescan.py
````
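
The manual upgrade described above amounts to locating the directory that holds `isescan.py` and copying every `.py` file from the newer release into it. A Python sketch of those two steps follows, with hypothetical helper names. It assumes the new release has already been downloaded and unpacked to a local directory (the `ISEScan-master` path is a placeholder), and it resolves symlinks as a precaution in case the `isescan.py` found on `PATH` is a link into the real install directory.

```python
import shutil
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def isescan_install_dir(executable: str = "isescan.py") -> Optional[Path]:
    """Python counterpart of `which isescan.py`, returning the directory (symlinks resolved)."""
    found = shutil.which(executable)
    return Path(found).resolve().parent if found else None


def copy_py_files(src_dir: PathLike, dest_dir: PathLike) -> List[str]:
    """Copy every top-level *.py file from src_dir into dest_dir, overwriting old copies."""
    copied = []
    for src in sorted(Path(src_dir).glob("*.py")):
        shutil.copy2(src, Path(dest_dir) / src.name)  # copy2 also keeps timestamps
        copied.append(src.name)
    return copied


if __name__ == "__main__":
    target = isescan_install_dir()
    if target is None:
        raise SystemExit("isescan.py is not on PATH")
    # "ISEScan-master" is a hypothetical path to the unpacked latest release.
    print(copy_py_files("ISEScan-master", target))
```

Backing up the install directory before copying makes it easy to roll back if the new version misbehaves.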

````diff
@@ -173,15 +186,15 @@ It is quite easy to upgrade the existing ISEScan to the latest version: copy all
 ## Usage example
 Let's try an example, NC_012624.fna.

-- The command below scans NC_012624.fna (genome sequence of Sulfolobus_islandicus_Y_N_15_51, ~42 kb), and outputs all results in `prediction` directory:
+- The command below scans NC_012624.fna (genome sequence of Sulfolobus_islandicus_Y_N_15_51, ~42 kb) and outputs all results in the `results` directory:
 ```
 cp /apps/inst/miniconda3/test/NC_012624.fna ./
 isescan.py --seqfile NC_012624.fna --output results --nthread 2
 ```

 Note: run `isescan.py -h` or `isescan.py --help` to get help.
 - Wait for it to finish. It may take a while (~40 seconds) because ISEScan uses HMMER to scan the genome sequence, searching each protein sequence predicted by FragGeneScan with 621 profile HMMs. HMMER searching is usually more sensitive but slower than regular BLAST searching for remote homologs. The running time increases quickly for larger genomes, e.g. about 20 minutes for NC_000913.fna (genome sequence of Escherichia coli str. K-12 substr. MG1655, ~4.6 Mb) with two CPU cores on my virtual machine.

-- After ISEScan finish running, you can find the output files in prediction directory:
+- After ISEScan finishes running, you can find the output files in the results directory:
 - NC_012624.fna.sum: summary of IS copies for each IS family
 - NC_012624.fna.csv: details about IS copies in NC_012624, one copy per line, as a comma-separated table
 - NC_012624.fna.tsv: details about IS copies in NC_012624, one copy per line, as a tab-separated table
````
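
The output list above describes `NC_012624.fna.tsv` as a tab-separated table with one IS copy per line. A sketch of tallying copies per IS family from such a file with Python's `csv` module follows. It assumes the file has a header row containing a column named `family`, which you should check against your own output (the `.sum` file already reports per-family summaries). The helper name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def copies_per_family(handle: TextIO, family_column: str = "family") -> Counter:
    """Count table rows (one IS copy each) per value of the family column."""
    reader = csv.DictReader(handle, delimiter="\t")
    # fieldnames reads the header row; fail clearly if the assumed column is absent.
    if reader.fieldnames is None or family_column not in reader.fieldnames:
        raise ValueError(f"column {family_column!r} not found in the header")
    return Counter(row[family_column] for row in reader)


if __name__ == "__main__":
    # Hypothetical rows for illustration only; real ISEScan tables have more columns.
    sample = io.StringIO("seqID\tfamily\tisBegin\n"
                         "chr1\tIS3\t100\n"
                         "chr1\tIS3\t5000\n"
                         "chr1\tIS5\t9000\n")
    print(copies_per_family(sample).most_common())
    # For a real file, open it with newline="" and adjust the path to your output location:
    # with open("results/NC_012624.fna.tsv", newline="") as fh:
    #     print(copies_per_family(fh))
```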

````diff
@@ -333,7 +346,3 @@ Let's try an example, NC_012624.fna.
 - Add option in `constants.py` to report either complete IS elements or both complete and partial IS elements
 - 1.0
 - The first proper release
-
-<a name="Contact"></a>
-## Contact
-`xiezhq@hotmail.com`
````