Merge pull request #92 from gbouras13/1.0.1
v1.1.0
gbouras13 authored Jan 13, 2025
2 parents 97f2ad0 + cb9631c commit 02bc62f
Showing 16 changed files with 348,243 additions and 204 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -13,7 +13,7 @@ jobs:

strategy:
matrix:
os: [macos-12, ubuntu-latest]
os: [macos-13, ubuntu-latest]
python-version: ["3.9"]

steps:
8 changes: 7 additions & 1 deletion HISTORY.md
@@ -1,9 +1,15 @@
# History

# 1.1.0 (2025-01-13)

* Adds support for reorienting contigs where the gene of interest spans the contig ends - [fixes this issue](https://github.com/gbouras13/dnaapler/issues/90). Thanks @marade @oschwengers.
* Specifically, this is done by rotating each contig in the input by half the genome length, then running `MMseqs2` for both the original and rotated contigs. The `MMseqs2` hit with the highest bitscore across the original and rotated contigs will be chosen as the top hit to rotate by, therefore enabling detection of partial hits (on the original contig) that span the contig ends.
* This has only been implemented for `dnaapler all` (this should be the command used by 99% of users).
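
The half-length rotation described above can be illustrated with a minimal sketch. This is not the code in `dnaapler.utils.processing.rotate_input`; the function names `rotate_by_half` and `with_rotated_copies`, the `_rotated` naming of the extra records, and the toy sequence are all assumptions made for illustration. The point it shows is that a gene split across the two ends of a circular contig becomes contiguous once the contig is rotated to start at its midpoint.

```python
from typing import Dict


def rotate_by_half(seq: str) -> str:
    """Rotate a circular sequence so that it starts at its midpoint."""
    mid = len(seq) // 2
    return seq[mid:] + seq[:mid]


def with_rotated_copies(records: Dict[str, str], suffix: str = "_rotated") -> Dict[str, str]:
    """Return the original records plus one half-rotated copy of each.

    The suffix used to name the rotated copies is hypothetical.
    """
    combined = dict(records)
    for name, seq in records.items():
        combined[name + suffix] = rotate_by_half(seq)
    return combined


# Toy example: the 9 bp "gene" ATGAAACCC is split across the contig ends,
# so it is not found as a contiguous match in the original orientation.
gene = "ATGAAACCC"
contig = "CCCGGGGGGGGGATGAAA"
rotated = rotate_by_half(contig)

print(gene in contig)   # False
print(gene in rotated)  # True
```

Searching both orientations means a hit that is only partial on the original contig can still be recovered in full on the rotated copy.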

# 1.0.1 (2024-11-22)

* Thanks to the inimitable @[rrwick](https://github.com/rrwick), v1.0.1 is a patch fixing a string-parsing bug.
* If your contig headers were integers, `dnaapler` did not rotate the found `BLAST/MMseqs2` hits. This was pre-existing (not introduced by v1.0.0).
* If your contig headers were integers, `dnaapler` did not rotate the found `BLAST/MMseqs2` hits. This was a pre-existing issue (not introduced by v1.0.0).

# 1.0.0 (2024-11-21)

27 changes: 23 additions & 4 deletions README.md
@@ -32,9 +32,16 @@ conda install -c bioconda dnaapler
# runs dnaapler all
dnaapler all -i input_mixed_contigs.fasta -o output_directory_path -p my_bacteria_name -t 8
# runs dnaapler chromosome
dnaapler chromosome -i input_chromosome.fasta -o output_directory_path -p my_bacteria_name -t 8
```

* If you have a macOS machine with Apple Silicon (M1/M2/M3/M4), please try:

```
conda create --platform osx-64 -n dnaapler_env dnaapler
conda activate dnaapler_env
dnaapler all -i input_mixed_contigs.fasta -o output_directory_path -p my_bacteria_name -t 8
```

## Paper
@@ -59,7 +66,15 @@ Larralde, M., (2022). Pyrodigal: Python bindings and interface to Prodigal, an e
Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119.
```

## v1
## v1 and other recent changes

# 1.1.0

* Adds support for reorienting contigs where the gene of interest spans the contig ends - [fixes this issue](https://github.com/gbouras13/dnaapler/issues/90). Thanks @marade @oschwengers.
* Specifically, this is done by rotating each contig in the input by half the genome length, then running `MMseqs2` for both the original and rotated contigs. The `MMseqs2` hit with the highest bitscore across the original and rotated contigs will be chosen as the top hit to rotate by, therefore enabling detection of partial hits (on the original contig) that span the contig ends.
* This has only been implemented for `dnaapler all` (this should be the command used by 99% of users).
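
The "highest bitscore wins" selection can be sketched as follows, under stated assumptions. The `Hit` dataclass, its fields, and the helpers `best_hit` and `to_original_start` are hypothetical and are not taken from the `dnaapler` source. The sketch assumes 0-based start coordinates and that the rotated copy begins at the midpoint of the original contig, so a position on the rotated copy maps back by adding half the contig length modulo the full length.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    contig: str      # name of the original contig the hit belongs to
    start: int       # 0-based start of the hit on the searched sequence
    bitscore: float  # alignment bitscore reported by the search tool
    rotated: bool    # True if the hit was found on the half-rotated copy


def best_hit(hits: Iterable[Hit]) -> Hit:
    """Pick the hit with the highest bitscore across both orientations."""
    return max(hits, key=lambda h: h.bitscore)


def to_original_start(hit: Hit, contig_length: int) -> int:
    """Map a hit start back to coordinates on the original contig."""
    if not hit.rotated:
        return hit.start
    # The rotated copy starts at the midpoint of the original contig.
    return (hit.start + contig_length // 2) % contig_length


# Toy values: a partial hit on the original contig and a full-length
# hit on the rotated copy of the same 18 bp contig.
hits = [
    Hit(contig="c1", start=12, bitscore=11.0, rotated=False),
    Hit(contig="c1", start=3, bitscore=18.0, rotated=True),
]
top = best_hit(hits)
print(top.rotated, to_original_start(top, contig_length=18))
```

In this toy case the rotated hit scores higher because it covers the whole gene, and its start maps back to the same position where the partial hit begins on the original contig.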

# v1.0

* **BREAKING CHANGE** - `dnaapler` now uses `MMSeqs2 v13.45111` rather than `BLAST`. You will need to install [MMSeqs2](https://github.com/soedinglab/MMseqs2) if you upgrade (if you use conda, it should be handled for you). The CLI is identical.
* There are 2 reasons for this:
@@ -78,7 +93,9 @@ If you don't want to install `dnaapler` locally, you can run `dnaapler all` with
- [dnaapler](#dnaapler)
- [Quick Start](#quick-start)
- [Paper](#paper)
- [v1](#v1)
- [v1 and other recent changes](#v1-and-other-recent-changes)
- [1.1.0](#110)
- [v1.0](#v10)
- [Google Colab Notebooks](#google-colab-notebooks)
- [Table of Contents](#table-of-contents)
- [Description](#description)
@@ -110,6 +127,8 @@ Additionally, you can also reorient multiple bacterial chromosomes/plasmids/phag

If your input FASTA is mixed (e.g. has chromosome and plasmids), you can also use `dnaapler all`, with the option to ignore some contigs with the `--ignore` parameter.

**As of v1, in practice, `dnaapler all` is the only command you will likely need, as it contains all the functionality of `bulk`, `chromosome`, `plasmid` and `phage`, but with much more flexibility and user-friendliness.**

## Documentation

The full documentation for `dnaapler` can be found [here](https://dnaapler.readthedocs.io).
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "dnaapler"
version = "1.0.1" # change VERSION too
version = "1.1.0" # change VERSION too
description = "Reorients assembled microbial sequences"
authors = ["George Bouras <[email protected]>"]
license = "MIT"
15 changes: 13 additions & 2 deletions src/dnaapler/__init__.py
@@ -21,12 +21,14 @@
)
from dnaapler.utils.constants import DNAAPLER_DB
from dnaapler.utils.external_tools import ExternalTool
from dnaapler.utils.processing import rotate_input
from dnaapler.utils.util import (
begin_dnaapler,
check_duplicate_headers,
end_dnaapler,
get_version,
print_citation,
remove_file,
run_autocomplete,
)
from dnaapler.utils.validation import (
@@ -841,7 +843,7 @@ def bulk(
default="all",
type=click.STRING,
callback=validate_choice_db,
help="Lets you choose a subset of databases rather than all 3. Must be one of: 'all', 'dnaa', 'repa', terl', 'dnaa,repa', 'dnaa,terl' or 'repa,terl' ",
help="Lets you choose a subset of databases rather than all 4. Must be one of: 'all', 'dnaa', 'repa', 'terl', 'cog1474', 'dnaa,repa', 'dnaa,terl', 'repa,terl', 'dnaA,cog1474', 'cog1474,terl', 'cog1474,repa', 'dnaa,cog1474,repa', 'dnaa,cog1474,terl' or 'cog1474,repa,terl'",
show_default=True,
)
@click.option(
@@ -968,9 +970,14 @@ def all(
else:
custom_db = None

# rotate all replicons by half the length of the contig
# the rotated input for MMSeqs2 will have the original contigs + the rotated ones with suffix "rotated_"
rotated_input = os.path.join(output, "rotated_input.fasta")
rotate_input(input, rotated_input)

# runs bulk MMseqs2
run_bulk_MMseqs2(
ctx, input, output, prefix, gene, evalue, threads, custom_db=custom_db
ctx, rotated_input, output, prefix, gene, evalue, threads, custom_db=custom_db
)

# reorients MMseqs2
@@ -998,8 +1005,12 @@ def all(
autocomplete,
seed_value,
custom_db=custom_db,
gene=gene,
)

# remove the rotated input
remove_file(Path(rotated_input))

# end dnaapler
end_dnaapler(start_time)

2 changes: 1 addition & 1 deletion src/dnaapler/utils/VERSION
@@ -1 +1 @@
1.0.1
1.1.0