CARD 3.0.6 introduces white space in seq id causing fasta/metadata mismatch #284

Closed
rpetit3 opened this issue Oct 17, 2019 · 2 comments

Comments

@rpetit3
Contributor

rpetit3 commented Oct 17, 2019

ariba getref card card
Getting available CARD versions
Downloading "https://card.mcmaster.ca/download" and saving as "download.html" ... done
Found versions:
1.0.0   https://card.mcmaster.ca/download/0/broadstreet-v1.0.0.tar.bz2
1.0.1   https://card.mcmaster.ca/download/0/broadstreet-v1.0.1.tar.bz2
1.0.2   https://card.mcmaster.ca/download/0/broadstreet-v1.0.2.tar.bz2
1.0.3   https://card.mcmaster.ca/download/0/broadstreet-v1.0.3.tar.bz2
1.0.4   https://card.mcmaster.ca/download/0/broadstreet-v1.0.4.tar.bz2
1.0.5   https://card.mcmaster.ca/download/0/broadstreet-v1.0.5.tar.bz2
1.0.6   https://card.mcmaster.ca/download/0/broadstreet-v1.0.6.tar.bz2
1.0.7   https://card.mcmaster.ca/download/0/broadstreet-v1.0.7.tar.bz2
1.0.8   https://card.mcmaster.ca/download/0/broadstreet-v1.0.8.tar.bz2
1.0.9   https://card.mcmaster.ca/download/0/broadstreet-v1.0.9.tar.bz2
1.1.0   https://card.mcmaster.ca/download/0/broadstreet-v1.1.0.tar.bz2
1.1.1   https://card.mcmaster.ca/download/0/broadstreet-v1.1.1.tar.bz2
1.1.2   https://card.mcmaster.ca/download/0/broadstreet-v1.1.2.tar.bz2
1.1.3   https://card.mcmaster.ca/download/0/broadstreet-v1.1.3.tar.bz2
1.1.4   https://card.mcmaster.ca/download/0/broadstreet-v1.1.4.tar.bz2
1.1.5   https://card.mcmaster.ca/download/0/broadstreet-v1.1.5.tar.bz2
1.1.6   https://card.mcmaster.ca/download/0/broadstreet-v1.1.6.tar.bz2
1.1.7   https://card.mcmaster.ca/download/0/broadstreet-v1.1.7.tar.bz2
1.1.8   https://card.mcmaster.ca/download/0/broadstreet-v1.1.8.tar.bz2
1.1.9   https://card.mcmaster.ca/download/0/broadstreet-v1.1.9.tar.bz2
1.2.0   https://card.mcmaster.ca/download/0/broadstreet-v1.2.0.tar.bz2
1.2.1   https://card.mcmaster.ca/download/0/broadstreet-v1.2.1.tar.bz2
2.0.0   https://card.mcmaster.ca/download/0/broadstreet-v2.0.0.tar.gz
2.0.1   https://card.mcmaster.ca/download/0/broadstreet-v2.0.1.tar.gz
2.0.2   https://card.mcmaster.ca/download/0/broadstreet-v2.0.2.tar.gz
2.0.3   https://card.mcmaster.ca/download/0/broadstreet-v2.0.3.tar.gz
3.0.0   https://card.mcmaster.ca/download/0/broadstreet-v3.0.0.tar.gz
3.0.1   https://card.mcmaster.ca/download/0/broadstreet-v3.0.1.tar.gz
3.0.2   https://card.mcmaster.ca/download/0/broadstreet-v3.0.2.tar.gz
3.0.3   https://card.mcmaster.ca/download/0/broadstreet-v3.0.3.tar.gz
3.0.4   https://card.mcmaster.ca/download/0/broadstreet-v3.0.4.tar.gz
3.0.5   https://card.mcmaster.ca/download/0/broadstreet-v3.0.5.tar.gz
3.0.6   https://card.mcmaster.ca/download/0/broadstreet-v3.0.6.tar.gz
Getting version 3.0.6
Working in temporary directory /home/rpetit3/test-grounds/slurm-test/card.download
Downloading data from card: https://card.mcmaster.ca/download/0/broadstreet-v3.0.6.tar.gz
syscall: wget -O card.tar.bz2 https://card.mcmaster.ca/download/0/broadstreet-v3.0.6.tar.gz
...finished downloading
Extracted json data file ./card.json. Reading its contents...
Found 2913 records in the json file. Analysing...
Extracted data and written ARIBA input files

Finished. Final files are:
        /home/rpetit3/test-grounds/slurm-test/card.fa
        /home/rpetit3/test-grounds/slurm-test/card.tsv

You can use them with ARIBA like this:
ariba prepareref -f /home/rpetit3/test-grounds/slurm-test/card.fa -m /home/rpetit3/test-grounds/slurm-test/card.tsv output_directory

If you use this downloaded data, please cite:
"The Comprehensive Antibiotic Resistance Database", McArthur et al 2013, PMID: 23650175
and in your methods say that version 3.0.6 of the database was used


ariba prepareref -f /home/rpetit3/test-grounds/slurm-test/card.fa -m /home/rpetit3/test-grounds/slurm-test/card.tsv output_directory
Traceback (most recent call last):
  File "/home/rpetit3/miniconda3/envs/bactopia-1.2.1/bin/ariba", line 312, in <module>
    args.func(args)
  File "/home/rpetit3/miniconda3/envs/bactopia-1.2.1/lib/python3.6/site-packages/ariba/tasks/prepareref.py", line 34, in run
    preparer.run(options.outdir)
  File "/home/rpetit3/miniconda3/envs/bactopia-1.2.1/lib/python3.6/site-packages/ariba/ref_preparer.py", line 186, in run
    genetic_code=self.genetic_code,
  File "/home/rpetit3/miniconda3/envs/bactopia-1.2.1/lib/python3.6/site-packages/ariba/reference_data.py", line 34, in __init__
    self.sequences, self.metadata = ReferenceData._load_input_files_and_check_seq_names(fasta_files, metadata_tsv_files)
  File "/home/rpetit3/miniconda3/envs/bactopia-1.2.1/lib/python3.6/site-packages/ariba/reference_data.py", line 143, in _load_input_files_and_check_seq_names
    raise Error('Sequence "' + seq_name + '" found in input fasta file but not in metadata file. Cannot continue')
ariba.reference_data.Error: Sequence "AAC(6')-29b.3002584.NG_048576.1" found in input fasta file but not in metadata file. Cannot continue

I added a line to print what's in the metadata dict and the FASTA seq_names
(near https://github.com/sanger-pathogens/ariba/blob/master/ariba/reference_data.py#L139). Here's what it returns:

METADATA    AAC(6')-29b.3002584.NG_048576.1 101-502 (+).100-502.5869       
SEQ_NAME    AAC(6')-29b.3002584.NG_048576.1 False

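Which leads me back to the seq.id = seq.id.split()[0] at https://github.com/sanger-pathogens/ariba/blob/master/ariba/reference_data.py#L139: the FASTA sequence name is cut at the first whitespace, while the metadata name keeps the full string, so the two no longer match.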
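
The mismatch can be reproduced in isolation. This is only a minimal sketch, not ARIBA's code: it assumes the FASTA record starts out with the same full name as the metadata entry, and the `metadata` dict here is a hypothetical stand-in for the one ARIBA builds.

```python
# Name copied from the METADATA line of the debug output above.
metadata_name = "AAC(6')-29b.3002584.NG_048576.1 101-502 (+).100-502.5869"

# Hypothetical stand-in for the metadata dict, keyed by the full name.
metadata = {metadata_name: "metadata row"}

# The effect of seq.id.split()[0] on that name:
# everything after the first whitespace is dropped.
seq_name = metadata_name.split()[0]

print(seq_name)              # AAC(6')-29b.3002584.NG_048576.1
print(seq_name in metadata)  # False
```

The truncated name is no longer a key in the dict, which matches the SEQ_NAME line and the "found in input fasta file but not in metadata file" error.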

Looks like whitespace was introduced in v3.0.6. Here's the diff for the sequence header:

# v3.0.6
>gb|NG_048576.1 101-502 (+)|+|100-502|ARO:3002584|AAC(6')-29b [Pseudomonas aeruginosa]

# v3.0.5
>gb|EU118148|+|2006-2402|ARO:3002584|AAC(6')-29b [Pseudomonas aeruginosa]

Might be a typo on CARD's end. It definitely looks like one, considering the 101-502 (+) is repeated.
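
One way to spot headers like this across a whole release is sketched below. It assumes the pipe-delimited layout seen in the two headers above, where only the last field (gene name plus species) is expected to contain spaces; `fields_with_whitespace` is a hypothetical helper, not part of ARIBA or CARD.

```python
def fields_with_whitespace(header):
    """Return the pipe-delimited fields (except the last) that contain whitespace."""
    fields = header.lstrip(">").split("|")
    # The last field ("gene [species]") legitimately contains spaces, so skip it.
    return [f for f in fields[:-1] if any(ch.isspace() for ch in f)]


# Headers copied from the v3.0.6 / v3.0.5 comparison above.
v306 = ">gb|NG_048576.1 101-502 (+)|+|100-502|ARO:3002584|AAC(6')-29b [Pseudomonas aeruginosa]"
v305 = ">gb|EU118148|+|2006-2402|ARO:3002584|AAC(6')-29b [Pseudomonas aeruginosa]"

print(fields_with_whitespace(v306))  # ['NG_048576.1 101-502 (+)']
print(fields_with_whitespace(v305))  # []
```

Only the v3.0.6 header is flagged, and the offending field is the accession.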

If you think I should take this up with CARD, feel free to close this issue, and I'll follow up with them.

Cheers! Thanks for all the great work you're doing!

@kpepper
Member

kpepper commented Oct 18, 2019

Hi @rpetit3. Thanks for the analysis on this. Looks like CARD have it in hand.

@rpetit3
Contributor Author

rpetit3 commented Oct 18, 2019

Agreed! Going to close.

rpetit3 closed this as completed Oct 18, 2019