Commit

fix user provided pseudo genes in GenBank files #94
oschwengers committed Jan 31, 2022
1 parent 69b3103 commit 0f0165d
Showing 2 changed files with 16 additions and 1 deletion.
2 changes: 1 addition & 1 deletion bakta/expert/protein_sequences.py
@@ -104,7 +104,7 @@ def write_user_protein_sequences(aa_fasta_path):
         with xopen(str(cfg.user_proteins), threads=0) as fh_in:
             for record in SeqIO.parse(fh_in, 'genbank'):
                 for feature in record.features:
-                    if(feature.type.lower() == 'cds'):
+                    if(feature.type.lower() == 'cds' and 'pseudo' not in feature.qualifiers):
                         user_proteins.append(parse_user_protein_sequences_genbank(feature))
     except Exception as e:
         log.error('provided user proteins file GenBank format not valid!', exc_info=True)
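The changed condition can be illustrated in isolation. The following is a minimal sketch, not the Bakta implementation: `MockFeature` is a hypothetical stand-in for a Biopython `SeqFeature` (only `type` and `qualifiers`), `select_coding_features` is a name invented for this example, and the locus tags are made up. It assumes that a value-less `/pseudo` flag shows up as a `pseudo` key in the feature's qualifiers, which is what the new check relies on.

```python
from dataclasses import dataclass, field


@dataclass
class MockFeature:
    """Hypothetical stand-in for a parsed GenBank feature (not a Biopython class)."""
    type: str
    qualifiers: dict = field(default_factory=dict)


def select_coding_features(features):
    """Keep CDS features only, skipping any that carry a 'pseudo' qualifier."""
    return [
        f for f in features
        if f.type.lower() == 'cds' and 'pseudo' not in f.qualifiers
    ]


# Illustrative data: one pseudo gene/CDS pair and one intact CDS.
features = [
    MockFeature('gene', {'locus_tag': ['pseudo-example'], 'pseudo': ['']}),
    MockFeature('CDS', {'locus_tag': ['pseudo-example'], 'pseudo': ['']}),
    MockFeature('CDS', {'locus_tag': ['intact-example'], 'product': ['mock user protein']}),
]

selected = select_coding_features(features)
print([f.qualifiers['locus_tag'][0] for f in selected])
```

The membership test checks only for the presence of the key, so the filter does not depend on what value the parser stores for the flag.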
15 changes: 15 additions & 0 deletions test/data/user-proteins.gbff
@@ -54,6 +54,21 @@ FEATURES             Location/Qualifiers
                      /transl_table=11
                      /protein_id="gnl|Bakta|hypo-mock-test"
                      /inference="ab initio prediction:Prodigal:2.6"
+     gene            1001..1038
+                     /locus_tag="pseudo-mock-test"
+                     /gene="mock2"
+                     /pseudo
+     CDS             1001..1038
+                     /locus_tag="pseudo-mock-test"
+                     /gene="mock2"
+                     /db_xref="USERDB:MOCK2"
+                     /EC_number="0.0.0.0"
+                     /product="mock pseudo user protein 2"
+                     /pseudo
+                     /codon_start=1
+                     /transl_table=11
+                     /protein_id="gnl|Bakta|hypo-mock-test"
+                     /inference="ab initio prediction:Prodigal:2.6"
 ORIGIN
         1 ttcttctgcg agttcgtgca gcttctcaca catggtggcc tgctcgtcag catcgagtgc
        61 gtccagtttt tcgagcagcg tcaggctctg gctttttatg aatcccgcca tgttgagtgc
