Erroneous sequence reported in insert fasta file #55

Closed
Nifaste opened this issue Jul 18, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Nifaste

Nifaste commented Jul 18, 2024

Ask away!

I have observed an error in the insert sequences written to the FASTA file {{ alias }}.insert.fasta. I investigated the issue and it appears to originate from the following code in find_insert.py:

    rev_comp = reverse_complement(whole_seq)
    strand_seq = {'-': rev_comp, '+': whole_seq}
    parse_seq = strand_seq[str(df['strand'][0])]
    final_seq = parse_seq[df['start'][0]::] + parse_seq[:df['end'][0]:]
    df['sequence'][0] = final_seq

The start and stop coordinates are not being adjusted when taking the reverse complement of the sequence.

[Image: mapping_inserts_to_ref]

I fixed the issue by updating the code as follows:

    insert_seq = whole_seq[df['start'][0]::] + whole_seq[:df['end'][0]:]
    rev_comp = reverse_complement(insert_seq)
    strand_seq = {'-': rev_comp, '+': insert_seq}
    final_seq = strand_seq[str(df['strand'][0])]
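For anyone who wants to check the reordering on a toy example, here is a minimal sketch. It assumes the start and end values are 0-based positions on the forward strand, and that `reverse_complement` behaves like the stand-in below. The functions `extract_insert` and `extract_insert_old` are hypothetical names used only for illustration; they are not part of find_insert.py.

```python
def reverse_complement(seq: str) -> str:
    """Stand-in for the workflow's helper (assumed behaviour)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_insert(whole_seq: str, start: int, end: int, strand: str) -> str:
    """Hypothetical helper: slice with forward coordinates, then orient."""
    insert_seq = whole_seq[start:] + whole_seq[:end]
    return reverse_complement(insert_seq) if strand == "-" else insert_seq


def extract_insert_old(whole_seq: str, start: int, end: int, strand: str) -> str:
    """Original order, for comparison: orient first, then slice."""
    parse_seq = reverse_complement(whole_seq) if strand == "-" else whole_seq
    return parse_seq[start:] + parse_seq[:end]


whole_seq = "CCATTTTTTGGA"  # toy sequence, not real data
start, end = 8, 3

print(extract_insert(whole_seq, start, end, "+"))      # TGGACCA
print(extract_insert(whole_seq, start, end, "-"))      # TGGTCCA
print(extract_insert_old(whole_seq, start, end, "-"))  # ATGGTCC
```

On the minus strand the two orderings disagree: only the first one returns the reverse complement of the plus-strand insert.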

Has anyone else experienced this? Is my bug fix correct?

Nifaste added the question label on Jul 18, 2024
@sarahjeeeze
Contributor

Hi, thanks for reporting this. We will investigate and amend if required and let you know once implemented.

@sarahjeeeze
Contributor

sarahjeeeze commented Jul 22, 2024

Hi, after some investigation I see that seqkit amplicon, which we use to get this sequence, changed (fixed) the way it reports start and end positions in the BED file as of version 2.4.0, and that change caused this bug in the workflow. Previously it reported the start and end points relative to the reverse complement. See shenwei356/seqkit#367. We will amend with your fix. Thanks again for drawing our attention to this!
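To make the two coordinate conventions concrete, here is a small sketch of how an interval maps between a sequence and its reverse complement. It assumes BED-style 0-based, half-open coordinates; the exact seqkit output before and after 2.4.0 is not reproduced here, and `to_reverse_strand_coords` is a hypothetical helper, not a seqkit or workflow function.

```python
def reverse_complement(seq: str) -> str:
    """Stand-in helper (assumed behaviour)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def to_reverse_strand_coords(start: int, end: int, seq_len: int) -> tuple:
    """Map a 0-based, half-open [start, end) interval onto the reverse complement."""
    return seq_len - end, seq_len - start


whole_seq = "CCATTTTTTGGA"  # toy sequence, not real data
start, end = 2, 5

rc_start, rc_end = to_reverse_strand_coords(start, end, len(whole_seq))

# Slicing the reverse complement with converted coordinates gives the same
# result as slicing the forward strand first and reverse-complementing after.
print(reverse_complement(whole_seq)[rc_start:rc_end])  # AAT
print(reverse_complement(whole_seq[start:end]))        # AAT
```

Under these assumptions, code that slices the reverse-complemented sequence is only correct when it is given the converted coordinates, which is consistent with the behaviour change described above.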

@sarahjeeeze
Contributor

Hi, we have now released the fix for this in the latest release.

@sarahjeeeze
Contributor

Closing as this is now fixed. Let us know if you have any further trouble.
