as_symbolic_or_explicit_according_to_size - gives different results depending on whether the input starts as symbolic or not #1214

Open
davmlaw opened this issue Dec 17, 2024 · 2 comments
davmlaw commented Dec 17, 2024

seq = 'AGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGCACAATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCCGGGACTACAGGCGCCCACCACCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTAGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCACCCACCTCGGCCTCCCAAAGCACTGGGATTACAGGCATGAGCCACCGCGCCGAGCCCCAAGACCTTTCTTTATTACCAGGGCTTCCACAGACCTGACACATGGTAGTTCCTCAATAAATAATTGCAGAATTACTGAAAAATTTTACTGTTAACTTAGGCAGTGGTAAAACCATTGTTTGGTAGCTCAGAACTCAGCAAGTAAATAGCAACATTTGCTGGAAGAACAGATAGTTTTTCAAATCCAATTCAAGGACTGGGTATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGTATCCAGGAGTTCGAGACTAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAAAATACAAAATTAGCCAGGTGTGGTGGTGGGCACCTGTAATCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGTAGGCGGAGGTTGTAGTGAGCTGAGATTGTGCCATTGCTCTCCAGCCTGGGAAACAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAATCCAATTCAAATGATTATGGAAGTAGTGGAGAAATAAACAGGAAAATGATAAATAATTAAGATAATATATAATATGGCTATATTTTAATCTATTGTTGATATGATTTTCTCTTTTCCCCTTGGGATTAGTATCTATCTCTCTACTGGATATTAATTTGTTATATTTTCTCATTAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGG'
IPython history:

In [9]: vc = VariantCoordinate(chrom='NC_060927.1', position=37007425, ref='A', alt=seq)

In [10]: vc.as_internal_symbolic(GenomeBuild.t2tv2())
WARNING ClinGen does not support build T2T-CHM13v2.0
Out[10]: VariantCoordinate(chrom='NC_060927.1', position=37007425, ref='A', alt='<DUP>', svlen=999)

In [11]: vc.as_internal_symbolic(GenomeBuild.t2tv2()).as_symbolic_or_explicit_according_to_size(GenomeBuild.t2tv2())
WARNING ClinGen does not support build T2T-CHM13v2.0
Out[11]: VariantCoordinate(chrom='NC_060927.1', position=37007425, ref='A', alt='AGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGCACAATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCCGGGACTACAGGCGCCCACCACCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTAGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCACCCACCTCGGCCTCCCAAAGCACTGGGATTACAGGCATGAGCCACCGCGCCGAGCCCCAAGACCTTTCTTTATTACCAGGGCTTCCACAGACCTGACACATGGTAGTTCCTCAATAAATAATTGCAGAATTACTGAAAAATTTTACTGTTAACTTAGGCAGTGGTAAAACCATTGTTTGGTAGCTCAGAACTCAGCAAGTAAATAGCAACATTTGCTGGAAGAACAGATAGTTTTTCAAATCCAATTCAAGGACTGGGTATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGTATCCAGGAGTTCGAGACTAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAAAATACAAAATTAGCCAGGTGTGGTGGTGGGCACCTGTAATCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGTAGGCGGAGGTTGTAGTGAGCTGAGATTGTGCCATTGCTCTCCAGCCTGGGAAACAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAATCCAATTCAAATGATTATGGAAGTAGTGGAGAAATAAACAGGAAAATGATAAATAATTAAGATAATATATAATATGGCTATATTTTAATCTATTGTTGATATGATTTTCTCTTTTCCCCTTGGGATTAGTATCTATCTCTCTACTGGATATTAATTTGTTATATTTTCTCATTAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGG', svlen=None)

Discovered due to VCF import issue - long seq had not been imported:

'AGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGCACAATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCCGGGACTACAGGCGCCCACCACCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTAGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCACCCACCTCGGCCTCCCAAAGCACTGGGATTACAGGCATGAGCCACCGCGCCGAGCCCCAAGACCTTTCTTTATTACCAGGGCTTCCACAGACCTGACACATGGTAGTTCCTCAATAAATAATTGCAGAATTACTGAAAAATTTTACTGTTAACTTAGGCAGTGGTAAAACCATTGTTTGGTAGCTCAGAACTCAGCAAGTAAATAGCAACATTTGCTGGAAGAACAGATAGTTTTTCAAATCCAATTCAAGGACTGGGTATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGTATCCAGGAGTTCGAGACTAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAAAATACAAAATTAGCCAGGTGTGGTGGTGGGCACCTGTAATCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGTAGGCGGAGGTTGTAGTGAGCTGAGATTGTGCCATTGCTCTCCAGCCTGGGAAACAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAATCCAATTCAAATGATTATGGAAGTAGTGGAGAAATAAACAGGAAAATGATAAATAATTAAGATAATATATAATATGGCTATATTTTAATCTATTGTTGATATGATTTTCTCTTTTCCCCTTGGGATTAGTATCTATCTCTCTACTGGATATTAATTTGTTATATTTTCTCATTAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGG'
Traceback (most recent call last):
  File "/opt/variantgrid/upload/tasks/vcf/import_vcf_step_task.py", line 73, in run
    items_processed = self.process_items(upload_step)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/variantgrid/upload/tasks/vcf/unknown_variants_task.py", line 100, in process_items
    bulk_inserter.process_vcf_record(v)
  File "/opt/variantgrid/upload/tasks/vcf/unknown_variants_task.py", line 54, in process_vcf_record
    self.variant_pk_lookup.add(variant_coordinate)
  File "/opt/variantgrid/snpdb/variant_pk_lookup.py", line 132, in add
    variant_hash = self.get_variant_coordinate_hash(variant_coordinate)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/variantgrid/snpdb/variant_pk_lookup.py", line 123, in get_variant_coordinate_hash
    alt_id = self.sequence_pk_by_seq[variant_coordinate.alt]
             ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'AGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGCACAATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCCGGGACTACAGGCGCCCACCACCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTAGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCACCCACCTCGGCCTCCCAAAGCACTGGGATTACAGGCATGAGCCACCGCGCCGAGCCCCAAGACCTTTCTTTATTACCAGGGCTTCCACAGACCTGACACATGGTAGTTCCTCAATAAATAATTGCAGAATTACTGAAAAATTTTACTGTTAACTTAGGCAGTGGTAAAACCATTGTTTGGTAGCTCAGAACTCAGCAAGTAAATAGCAACATTTGCTGGAAGAACAGATAGTTTTTCAAATCCAATTCAAGGACTGGGTATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGTATCCAGGAGTTCGAGACTAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAAAATACAAAATTAGCCAGGTGTGGTGGTGGGCACCTGTAATCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGTAGGCGGAGGTTGTAGTGAGCTGAGATTGTGCCATTGCTCTCCAGCCTGGGAAACAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAATCCAATTCAAATGATTATGGAAGTAGTGGAGAAATAAACAGGAAAATGATAAATAATTAAGATAATATATAATATGGCTATATTTTAATCTATTGTTGATATGATTTTCTCTTTTCCCCTTGGGATTAGTATCTATCTCTCTACTGGATATTAATTTGTTATATTTTCTCATTAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGG'
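
The KeyError comes from a dictionary keyed by sequence, so the long alt was never added to that lookup. One way this could happen, sketched below purely as an assumption, is that the step which inserts sequences and the step which looks them up disagree on whether the allele is symbolic or explicit. Only the name `sequence_pk_by_seq` is taken from the traceback; `MAX_EXPLICIT_LEN`, `to_stored_alt` and `build_sequence_lookup` are hypothetical and are not VariantGrid code.

```python
# Hypothetical illustration only: apart from sequence_pk_by_seq (borrowed from
# the traceback), none of these names come from VariantGrid.
MAX_EXPLICIT_LEN = 10  # assumed toy threshold for "long" alleles


def to_stored_alt(alt: str) -> str:
    """Toy rule: long alts are stored symbolically, short ones explicitly."""
    return "<DUP>" if len(alt) > MAX_EXPLICIT_LEN else alt


def build_sequence_lookup(alts):
    """Assign a primary key to each sequence that was actually inserted."""
    stored = sorted({to_stored_alt(alt) for alt in alts})
    return {seq: pk for pk, seq in enumerate(stored, start=1)}


alts = ["T", "ACGTACGTACGTACGT"]
sequence_pk_by_seq = build_sequence_lookup(alts)

# Looking up with the explicit long alt fails, as in the traceback...
try:
    sequence_pk_by_seq[alts[1]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# ...while using the same representation as at insert time succeeds.
pk = sequence_pk_by_seq[to_stored_alt(alts[1])]
```

If this is indeed the mechanism, the insert and lookup steps need to agree on one representation for a given variant, which is what the inconsistent conversion above breaks.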

This was the VCF record:

3	37007425	1735212	A	AGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGCACAATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCCGGGACTACAGGCGCCCACCACCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTAGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCACCCACCTCGGCCTCCCAAAGCACTGGGATTACAGGCATGAGCCACCGCGCCGAGCCCCAAGACCTTTCTTTATTACCAGGGCTTCCACAGACCTGACACATGGTAGTTCCTCAATAAATAATTGCAGAATTACTGAAAAATTTTACTGTTAACTTAGGCAGTGGTAAAACCATTGTTTGGTAGCTCAGAACTCAGCAAGTAAATAGCAACATTTGCTGGAAGAACAGATAGTTTTTCAAATCCAATTCAAGGACTGGGTATGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGCGTATCCAGGAGTTCGAGACTAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAAAATACAAAATTAGCCAGGTGTGGTGGTGGGCACCTGTAATCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGTAGGCGGAGGTTGTAGTGAGCTGAGATTGTGCCATTGCTCTCCAGCCTGGGAAACAAGAGCAAAACTCCGTCTCAAAAAAAAAAAAAATCCAATTCAAATGATTATGGAAGTAGTGGAGAAATAAACAGGAAAATGATAAATAATTAAGATAATATATAATATGGCTATATTTTAATCTATTGTTGATATGATTTTCTCTTTTCCCCTTGGGATTAGTATCTATCTCTCTACTGGATATTAATTTGTTATATTTTCTCATTAGAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGG	.	.	ALLELEID=1793551;CLNDISDB=MONDO:MONDO:0015356,MeSH:D009386,MedGen:C0027672,Orphanet:140162;CLNDN=Hereditary_cancer-predisposing_syndrome;CLNHGVS=NC_000003.12:g.37006055_37007053dup;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=Duplication;CLNVCSO=SO:1000035;GENEINFO=MLH1:4292;MC=SO:0001574|splice_acceptor_variant;ORIGIN=1
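
As a sanity check on the record, the duplication length can be derived from the `CLNHGVS` value in the INFO column and compared with the `svlen=999` reported by `as_internal_symbolic` above. This is a small sketch with hypothetical helper names (`info_to_dict`, `dup_length_from_hgvs`), and the INFO string is shortened to a few of the record's fields. The HGVS coordinates are on NC_000003.12 rather than the T2T contig, but the length of the duplicated range is what matters here.

```python
import re


def info_to_dict(info: str) -> dict:
    """Split a VCF INFO string into key/value pairs (flags without '=' are skipped)."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def dup_length_from_hgvs(hgvs: str) -> int:
    """Length of a genomic 'g.START_ENDdup' range (coordinates are inclusive)."""
    match = re.search(r"g\.(\d+)_(\d+)dup$", hgvs)
    if match is None:
        raise ValueError(f"Not a ranged dup: {hgvs!r}")
    start, end = map(int, match.groups())
    return end - start + 1


# Shortened INFO column: a subset of the fields from the record above.
info = "ALLELEID=1793551;CLNHGVS=NC_000003.12:g.37006055_37007053dup;CLNVC=Duplication"
hgvs = info_to_dict(info)["CLNHGVS"]
length = dup_length_from_hgvs(hgvs)
print(hgvs, length)  # length is 999
```

The range spans 37007053 - 37006055 + 1 = 999 bases, which agrees with the symbolic `<DUP>` form shown earlier.
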
davmlaw added the bug label Dec 17, 2024
davmlaw added a commit that referenced this issue Dec 17, 2024
davmlaw commented Dec 17, 2024

I made a quick fix and patched VG test so I could re-import T2T ClinVar.

TODO: Make a unit test for this using the above example
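
A rough sketch of the shape such a test could take. It uses a toy stand-in class, not the real `VariantCoordinate` (whose methods take a genome build), so `ToyCoordinate`, `MAX_EXPLICIT_LEN` and the `dup_seq` argument (standing in for a reference lookup) are all assumptions. The property under test is the one from the title: the size-based conversion should give the same answer whether the input starts as explicit or symbolic.

```python
import unittest
from dataclasses import dataclass, replace
from typing import Optional

MAX_EXPLICIT_LEN = 10  # assumed toy threshold, not VariantGrid's setting


@dataclass(frozen=True)
class ToyCoordinate:
    """Toy stand-in for a variant coordinate; not the real class."""
    position: int
    ref: str
    alt: str
    svlen: Optional[int] = None

    def is_symbolic(self) -> bool:
        return self.alt.startswith("<")

    def as_symbolic(self) -> "ToyCoordinate":
        if self.is_symbolic():
            return self
        return replace(self, alt="<DUP>", svlen=len(self.alt) - len(self.ref))

    def as_explicit(self, dup_seq: str) -> "ToyCoordinate":
        # dup_seq stands in for fetching the duplicated bases from a reference.
        if not self.is_symbolic():
            return self
        return replace(self, alt=self.ref + dup_seq[: self.svlen], svlen=None)

    def according_to_size(self, dup_seq: str) -> "ToyCoordinate":
        # Normalise to explicit first so both starting forms take the same path.
        explicit = self.as_explicit(dup_seq)
        size = len(explicit.alt) - len(explicit.ref)
        return explicit.as_symbolic() if size > MAX_EXPLICIT_LEN else explicit


class TestAccordingToSize(unittest.TestCase):
    def test_same_result_from_either_start(self):
        dup_seq = "ACGT" * 5  # 20 bases, above the toy threshold
        explicit = ToyCoordinate(position=100, ref="A", alt="A" + dup_seq)
        symbolic = explicit.as_symbolic()
        self.assertEqual(
            explicit.according_to_size(dup_seq),
            symbolic.according_to_size(dup_seq),
        )
```

Saved as a module, this can be run with `python -m unittest`. A real test would use the example variant from this issue with `GenomeBuild.t2tv2()` in place of the toy pieces.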

davmlaw commented Dec 23, 2024

Made a unit test and, what do you know, found a bug
