Ecoli no 1 profile warning #163

Closed
karinlag opened this issue Mar 15, 2017 · 1 comment

Comments

@karinlag

Hi!
I get these warnings when I fetch the "Escherichia coli#1" profile with pubmlstget. Could you expand on what they really mean?

(ariba)[karinlag@abel mlst]$ python3 /work/projects/nn9305k/bin/virtenv/ariba/bin/ariba pubmlstget "Escherichia coli#1" get_mlst1
WARNING: Same profile found twice in input file, but two different STs. Going to use the ST with the smaller number (7066)
... STs are 7066 7067 and alleles are adk:10, fumC:957, gyrB:4, icd:8, mdh:601, purA:8, recA:2
WARNING: Median sequence length is 469 but fumC.798 has length 382 which is too long or short. Removing

@andrewjpage
Member

Hi,
Some of the MLST databases contain poor-quality data, and ARIBA warns you about it. The first warning means that identical allele profiles have been assigned different STs (here 7066 and 7067), which should never happen; ARIBA keeps the ST with the smaller number. The second warning is about an allele sequence whose length is far from that of the rest: fumC.798 has length 382 against a median of 469, so it is removed. Normally the allele sequences are very similar in length, and one that's well outside that range usually indicates poor-quality data (it's very rare that it's real). In databases where manual curation is lax, this occurs quite a bit (e.g. truncated sequences, or sequences which don't translate to proteins).
Andrew
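
For anyone wanting to see the two checks concretely, here is a rough Python sketch of the logic described above. It is not ARIBA's actual code: the function names, the `max_fraction` threshold of 10%, and the lengths given for `fumC.1` and `fumC.2` are all assumptions made up for illustration. Only the STs, the allele profile, and the `fumC.798` length come from the warnings in this issue.

```python
from collections import defaultdict
from statistics import median


def resolve_duplicate_profiles(rows):
    """Keep the smallest ST for each allele profile.

    rows: iterable of (st, profile), where profile is a tuple of allele numbers.
    Returns (kept, clashes): kept maps profile -> smallest ST, and clashes maps
    profile -> sorted list of STs for profiles that had more than one ST.
    """
    sts_by_profile = defaultdict(list)
    for st, profile in rows:
        sts_by_profile[tuple(profile)].append(st)

    kept, clashes = {}, {}
    for profile, sts in sts_by_profile.items():
        kept[profile] = min(sts)
        if len(set(sts)) > 1:
            clashes[profile] = sorted(set(sts))
    return kept, clashes


def filter_by_length(lengths, max_fraction=0.1):
    """Drop alleles whose length is far from the median length.

    lengths: dict of allele name -> sequence length.
    max_fraction: hypothetical cutoff (NOT ARIBA's real threshold); an allele
    is removed if it differs from the median by more than this fraction.
    Returns (median_length, kept, removed_names).
    """
    med = median(lengths.values())
    kept = {name: n for name, n in lengths.items()
            if abs(n - med) <= max_fraction * med}
    removed = sorted(set(lengths) - set(kept))
    return med, kept, removed


# Profile order: adk, fumC, gyrB, icd, mdh, purA, recA (from the warning above).
profile = (10, 957, 4, 8, 601, 8, 2)
kept_sts, clashes = resolve_duplicate_profiles([(7067, profile), (7066, profile)])

# fumC.1 and fumC.2 lengths are invented; only fumC.798 = 382 is from the log.
med, kept_alleles, removed = filter_by_length(
    {"fumC.1": 469, "fumC.2": 469, "fumC.798": 382}
)
```

With this toy input, the duplicated profile resolves to ST 7066 and `fumC.798` is the only allele removed, which matches the two warnings in the original report.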
