Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Accept user-defined tags #3256

Merged
merged 2 commits into from
Aug 12, 2024
Merged

Conversation

eseiler
Copy link
Member

@eseiler eseiler commented Jun 3, 2024

Resolves #3251

Copy link

vercel bot commented Jun 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
seqan3 ✅ Ready (Inspect) Visit Preview Jun 3, 2024 2:03pm

@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 3, 2024
@eseiler eseiler force-pushed the feature/user_tags branch from 72f288d to 88e8642 Compare June 3, 2024 14:01
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 3, 2024
Copy link
Contributor

@rrahn rrahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good. I just have some general comments/questions to the general parsing.
Thank you!

include/seqan3/io/sam_file/input_options.hpp Show resolved Hide resolved
include/seqan3/io/sam_file/format_sam.hpp Show resolved Hide resolved
{
consume_unsupported_tag_and_print_warning("HD", raw_tag);
parse_and_append_unhandled_tag_to_string(hdr.user_tags, raw_tag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for me to understand the format here. We store all parsed line field tags tab separated within a string. Or are official field tags from the specification stored in a different format?

Copy link
Member Author

@eseiler eseiler Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the header stuff (SQ, HD, RG) have strutcs that store the members defined in the SAM spec. Any user defined tags are stored as string. For the alignment records, tags, including User-defined, are stored in the SAM tag dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but do I see it right that here we store the user defined field tags in a tab seperated string? If so, I feel like it would make more sense to put this into a vector of field tags instead.

Copy link
Member Author

@eseiler eseiler Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be better. Would be nice to store a struct with the name and value.

Anyway, I would do it as a followup, because it breaks api for the header tags where we already store the user tags in a string (SQ and RG).

If we do a vector, we should also throw on invalid formatting (it should be TAG:VALUE with some constraints on allowed characters) like samtools does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Would you mind tracking this on the issue board?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do when working on this issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eseiler eseiler marked this pull request as ready for review June 10, 2024 08:18
@eseiler eseiler requested a review from rrahn June 10, 2024 08:18
Copy link
Contributor

@rrahn rrahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Only updating the documentation of the Options struct which has currently no purpose. Thank you!!!

@eseiler eseiler force-pushed the feature/user_tags branch from 88e8642 to 32112bd Compare August 12, 2024 10:00
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Aug 12, 2024
@eseiler eseiler force-pushed the feature/user_tags branch from 697070f to d0eabcf Compare August 12, 2024 10:02
@eseiler eseiler enabled auto-merge August 12, 2024 10:03
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Aug 12, 2024
@seqan-actions
Copy link
Member

Documentation preview available at https://docs.seqan.de/preview/seqan/seqan3/3256

Copy link

codecov bot commented Aug 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.13%. Comparing base (5c917af) to head (d0eabcf).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3256      +/-   ##
==========================================
- Coverage   98.14%   98.13%   -0.01%     
==========================================
  Files         271      271              
  Lines       11955    11951       -4     
  Branches      104      104              
==========================================
- Hits        11733    11728       -5     
- Misses        222      223       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eseiler eseiler merged commit 1fcf9a6 into seqan:main Aug 12, 2024
40 checks passed
@eseiler eseiler deleted the feature/user_tags branch August 12, 2024 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unsupported sam headers get silently stripped
3 participants