-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small fixes for TRGT #59
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you kindly! Sorry for the state of the onion; we haven't had a contribution here in a while. Cool, we can try to clean up a bit in time to make it easier to piece in new stuff!
if not repeat_id: | ||
repeat_id = variant_info['info_dict'].get('TRID').split('_')[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like (its only two repetitions so far) a function like
def get_repeat_id(variant_info):
would be welcome, now that its a little bit more code. Besides it lowers apparent complexity with the less C-like and PEPpy.
if this:
do this thing
return this_thing
if that:
do that thing
return that_thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!
stranger/utils.py
Outdated
@@ -236,6 +249,10 @@ def get_trgt_repeat_res(variant_info, repeat_info): | |||
for allele in mc.split(","): | |||
mcs = allele.split('_') | |||
# GT would have the index of the MC in the ALT field list if we wanted to be specific... | |||
|
|||
# What should we do if MC is . ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I hadn't seen that one? Is that from TRGT or sth from STRdust? Do they mean completely missing from motifs, or is it a shorthand for as-reference? ExHu was kind enough to calculate and pass the reference size given the current locus definitions, but I guess for TRGT we would perhaps pass it from the repeat definition file.
We should check what the callers intended here I think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for TRGT. Probably when the repeat could not be called. Some examples:
chr16 17470908 . GCCGCCGCCGCCGCCC . 0 . TRID=XYLT1;END=17470923;MOTIFS=GCC;STRUC=(GCC)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
chr16 24613440 . TTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTAT . 0 . TRID=TNRC6A;END=24613530;MOTIFS=TTTTA,TTTCA;STRUC=(TTTTA)n(TTTCA)n(TTTTA)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
chr16 66490399 . TAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAATAAAATAAAAA . 0 . TRID=BEAN1;END=66490467;MOTIFS=TGGAA,TAAAA;STRUC=(TGGAA)n(TAAAA)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahoy! Yes, could be missing or reality saying hi with some unexpected alleles. I guess we also have to treat it as no-call rather than reference. It's unfortunate that TRGT does seem to set QUAL==0
and no FILTER=='.'
for everything. For ExHu we did set the size to 0 for .
so I guess we might as well here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run a test-dataset with reads that only covers a few of the pathogenic repeats.
I have made some updates the code @dnil, let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
repeat_id = variant_info['info_dict'].get('REPID') | ||
if not repeat_id: | ||
repeat_id = variant_info['info_dict'].get('TRID').split('_')[1] | ||
repeat_id = get_repeat_id(variant_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
|
||
# What should we do if MC is . ? | ||
if allele == ".": | ||
repeat_res.extend([0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Co-authored-by: Daniel Nilsson <[email protected]>
Let me know if it's ok to merge. Would be great if we could get a bioconda update as well, then I could add Stranger to Nallo (with TRGT for now). |
I think this is good to go, but I noticed we have no Changelog-missing warning automation! 😁 |
This PR adds | fixes:
STR_STATUS=no_call
or similar?How to prepare for test:
ssh
to ...bash servers/resources/SERVER.scilifelab.se/update-[THIS_TOOL]-stage.sh [THIS-BRANCH-NAME]
How to test:
Expected outcome:
Review:
This version is a: