[BUG] Possibly improper MD tag generation when running ATAC data. #162

Open
LinearParadox opened this issue May 8, 2024 · 3 comments

@LinearParadox

Describe the bug
The MD tag generated by Chromap can sometimes incorrectly begin with a letter. As far as I understand, this happens when the tag should start with a 0, that is, when the leading match count is zero. More info can be found here:

macs3-project/MACS#643 (comment)

@LinearParadox LinearParadox added the bug Something isn't working label May 8, 2024
@taoliu

taoliu commented May 8, 2024

@mourisl I tested on chromap 0.2.6.

Here is example output from samtools calmd run on the SAM/BAM file generated by Chromap. It shows several types of invalid MD tags, each of which samtools corrected.

[bam_fillmd1] different MD for read 'SRR1822137.228451': '13T8GA4GT20' -> '13T8G0A4G0T20'
[bam_fillmd1] different MD for read 'SRR1822137.108273': 'T9T39' -> '0T9T39'
[bam_fillmd1] different MD for read 'SRR1822137.430621': '21T19A7C' -> '21T19A7C0'
[bam_fillmd1] different MD for read 'SRR1822137.374239': '6G8GC24^T5C3' -> '6G8G0C24^T5C3'
[bam_fillmd1] different MD for read 'SRR1822137.7023': '49G' -> '49G0'
[bam_fillmd1] different MD for read 'SRR1822137.153844': 'A7A15A25' -> '0A7A15A25'
[bam_fillmd1] different MD for read 'SRR1822137.153844': '49A' -> '49A0'
[bam_fillmd1] different MD for read 'SRR1822137.188115': 'A46' -> '0A46'
[bam_fillmd1] different MD for read 'SRR1822137.147298': 'T49' -> '0T49'
[bam_fillmd1] different MD for read 'SRR1822137.438430': '22TT3G20A1' -> '22T0T3G20A1'
[bam_fillmd1] different MD for read 'SRR1822137.3039': '32ATA1GA3T6G1' -> '32A0T0A1G0A3T6G1'
[bam_fillmd1] different MD for read 'SRR1822137.325144': '42C6G' -> '42C6G0'
[bam_fillmd1] different MD for read 'SRR1822137.254577': 'A49' -> '0A49'
[bam_fillmd1] different MD for read 'SRR1822137.68007': 'A23G1A23' -> '0A23G1A23'
[bam_fillmd1] different MD for read 'SRR1822137.435278': '14CA34' -> '14C0A34'
[bam_fillmd1] different MD for read 'SRR1822137.123068': 'A2G4A15C25' -> '0A2G4A15C25'
[bam_fillmd1] different MD for read 'SRR1822137.303383': '49A' -> '49A0'
[bam_fillmd1] different MD for read 'SRR1822137.155971': 'A31T17' -> '0A31T17'
[bam_fillmd1] different MD for read 'SRR1822137.100949': '49A' -> '49A0'
[bam_fillmd1] different MD for read 'SRR1822137.145484': 'G6C25G15' -> '0G6C25G15'
[bam_fillmd1] different MD for read 'SRR1822137.145484': '42C6T' -> '42C6T0'
[bam_fillmd1] different MD for read 'SRR1822137.310296': '22A8G7T3C3T1G' -> '22A8G7T3C3T1G0'
[bam_fillmd1] different MD for read 'SRR1822137.453342': 'T48' -> '0T48'
[bam_fillmd1] different MD for read 'SRR1822137.453342': '34C2A9^A2T' -> '34C2A9^A2T0'
...

The problem is not limited to a missing leading '0': the trailing '0' and the '0' between two or more consecutive mismatched bases are missing as well. For example, '14CA34' should be '14C0A34'. According to the Sequence Alignment/Map Optional Fields Specification, the MD string must follow this format: MD:Z:[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*, where + means one or more.
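To make the three cases concrete, here is a minimal Python sketch that checks an MD string against the pattern quoted above and re-inserts the missing zeros. The helper names is_valid_md and insert_missing_zeros are hypothetical, made up for illustration; they are not part of Chromap or samtools. The repair is purely textual, whereas samtools calmd recomputes the tag from the reference. In particular, a deletion immediately followed by a mismatch (for example '^T0C' written as '^TC') cannot be told apart from a two-base deletion without the reference, so this sketch leaves such a run as a deletion.

```python
import re

# Pattern quoted from the SAM optional fields specification (see above).
MD_PATTERN = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")

# Tokens of an MD string: a match count, a deletion (^ plus bases), or one mismatched base.
_TOKEN = re.compile(r"[0-9]+|\^[A-Z]+|[A-Z]")


def is_valid_md(md: str) -> bool:
    """Return True if the whole string matches the specification pattern."""
    return MD_PATTERN.fullmatch(md) is not None


def insert_missing_zeros(md: str) -> str:
    """Hypothetical textual repair: add the '0' counts that were left out.

    Assumption: the only defect is omitted zero counts at the start, at the
    end, or between adjacent mismatch/deletion tokens.
    """
    tokens = _TOKEN.findall(md)
    if not md or "".join(tokens) != md:
        raise ValueError(f"unexpected characters in MD string: {md!r}")
    out = []
    prev_is_number = False  # treat the start as "no count seen yet"
    for tok in tokens:
        is_number = tok[0].isdigit()
        if not is_number and not prev_is_number:
            out.append("0")  # a mismatch or deletion must be preceded by a count
        out.append(tok)
        prev_is_number = is_number
    if not prev_is_number:
        out.append("0")  # the string must end with a count
    return "".join(out)


if __name__ == "__main__":
    for bad in ["T9T39", "49G", "14CA34", "6G8GC24^T5C3"]:
        print(bad, is_valid_md(bad), "->", insert_missing_zeros(bad))
```

On the tags from the log above, this reproduces the corrections reported by samtools calmd, for example 'T9T39' becomes '0T9T39' and '49G' becomes '49G0'.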

@mourisl
Collaborator

mourisl commented May 8, 2024

Thanks for identifying this issue! We will fix it.

@mourisl mourisl self-assigned this Aug 15, 2024
@mourisl
Collaborator

mourisl commented Aug 15, 2024

Sorry for the (much) delayed response... I've pushed an update to the li_dev8 branch. Could you please check out this branch and give it a try? If it works, we will merge it into the master branch.
