Benchmark results in NoneType error #131

Closed
archmageirvine opened this issue Aug 17, 2022 · 1 comment

Version : Truvari v3.4.0

Describe the bug :

Running a benchmark comparison between the NIST NA24385 standard and the output of a depth-based CNV caller fails with a TypeError ('NoneType' object is not subscriptable). The following example uses output from the DRAGEN depth-based caller; the same exception occurs with RTG segment outputs, but not with CNVkit outputs. The exception occurs whether or not --pctsim=0 is used.

To Reproduce :

truvari bench -b hg002-truth/HG002_GRCh38_CMRG_SV_v1.00.vcf.gz -c dragen-cnv/NA24385.cnv.vcf.gz -f ../ref/38/Homo_sapiens_assembly38.fasta -o truvari-out2 --pctsim=0   
2022-08-17 02:30:12,354 [INFO] Running /home/sean/BIO-11269-xg-cnv-comparison/truvari/bin/truvari bench -b hg002-truth/HG002_GRCh38_CMRG_SV_v1.00.vcf.gz -c dragen-cnv/NA24385.cnv.vcf.gz -f ../ref/38/Homo_sapiens_assembly38.fasta -o truvari-out2 --pctsim=0
2022-08-17 02:30:12,354 [INFO] Params:
{
    "base": "hg002-truth/HG002_GRCh38_CMRG_SV_v1.00.vcf.gz",
    "comp": "dragen-cnv/NA24385.cnv.vcf.gz",
    "output": "truvari-out2",
    "reference": "../ref/38/Homo_sapiens_assembly38.fasta",
    "giabreport": false,
    "debug": false,
    "prog": false,
    "refdist": 500,
    "pctsim": 0.0,
    "minhaplen": 50,
    "pctsize": 0.7,
    "pctovl": 0.0,
    "typeignore": false,
    "use_lev": false,
    "chunksize": 1000,
    "gtcomp": false,
    "bSample": null,
    "cSample": null,
    "sizemin": 50,
    "sizefilt": 30,
    "sizemax": 50000,
    "passonly": false,
    "no_ref": false,
    "includebed": null,
    "extend": 0,
    "multimatch": false
}
2022-08-17 02:30:12,354 [INFO] Truvari version: 3.4.0
Traceback (most recent call last):
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/bin/truvari", line 10, in <module>
    sys.exit(main())
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/lib/python3.7/site-packages/truvari/__main__.py", line 85, in main
    TOOLS[args.cmd](args.options)
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/lib/python3.7/site-packages/truvari/bench.py", line 799, in bench_main
    for call in itertools.chain.from_iterable(map(compare_chunk, chunks)):
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/lib/python3.7/site-packages/truvari/bench.py", line 365, in chunker
    if not matcher.filter_call(entry, key == 'base'):
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/lib/python3.7/site-packages/truvari/bench.py", line 157, in filter_call
    size = truvari.entry_size(entry)
  File "/home/sean/BIO-11269-xg-cnv-comparison/truvari/lib/python3.7/site-packages/truvari/comparisons.py", line 421, in entry_size
    elif entry.alts[0].count("<"):
TypeError: 'NoneType' object is not subscriptable

Expected behavior :

Maybe this kind of comparison is not supported by Truvari, but it would be nice to have a clearer message regarding the problem.
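
As a rough illustration of the clearer message requested here, the sketch below guards the ALT access before indexing it. The Record class and the first_alt function are hypothetical stand-ins written for this example; they are not part of Truvari or pysam, and the only assumption carried over from the traceback is that a record with ALT "." exposes its alts as None.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a parsed VCF record (not a pysam class)."""
    chrom: str
    pos: int
    ref: str
    alts: Optional[Tuple[str, ...]]  # None models an ALT column of "."


def first_alt(record: Record) -> str:
    """Return the first ALT allele, or raise a descriptive error if there is none."""
    if not record.alts:
        raise ValueError(
            f"{record.chrom}:{record.pos} has no ALT allele (ALT is '.'); "
            "remove monomorphic reference records before benchmarking"
        )
    return record.alts[0]


# Two records modelled on the DRAGEN example data in this issue.
ref_call = Record("chr1", 818023, "N", None)
gain_call = Record("chr1", 1325928, "N", ("<DUP>",))

print(first_alt(gain_call))
try:
    first_alt(ref_call)
except ValueError as err:
    print(f"error: {err}")
```

With a check like this, the reference-only record is reported by position instead of surfacing as a bare TypeError deep in the call stack.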

Example Data :

The data from the copy number caller are not sequence resolved. Initial records:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG002_NA24385
chr1    818023  DRAGEN:REF:chr1:818023-1325928  N       .       51      PASS    END=1325928;REFLEN=507906       GT:SM:CN:BC:PE  ./.:1.05255:2:407:4,7
chr1    1325928 DRAGEN:GAIN:chr1:1325929-1342978        N       <DUP>   10      PASS    SVLEN=17050;SVTYPE=CNV;END=1342978;REFLEN=17050 GT:SM:CN:BC:PE  ./1:1.29637:3:17:7,11
chr1    1342979 DRAGEN:REF:chr1:1342979-2653404 N       .       66      PASS    END=2653404;REFLEN=1310426      GT:SM:CN:BC:PE  ./.:1.02852:2:1031:11,207
...

The benchmark is sequence resolved:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG002
chr1    1041776 .       CCGGCCAGTGCCAGGGTCGAGGTGGGCGGCTCCCCCGGGGGAGGGCTG        C       30      .       REPTYPE=CONTRAC;BREAKSIMLENGTH=143;REFWIDENED=chr1:1041679-1041868      GT:AD   1|1:0,2
chr1    23884345        .       T       TTAGAGGCGTGAGCCACTACACCTGGCTATGTATACTTTAGTGATTATTTTTCCTCAAAACATTAAATATTAAAACACTAAAGTAGTTACAATTGAAAAATTGAAAGTTAAGTGTATTAAAACTTAAAATATTACTTTACTATTATTATTTAAAATAGAGGCCGGATATGGTAGCTCACACCTATAATCCCAGCACTTTGGGAGGCCAACGCTGGCAGATCACTTGAGGTCAGGAGTTTGAGATCAGCCTGGCCAACATGGTGAAACCCAGTCTCTACTAAAAGTACAAAAAAAGAAAAAACAATTAGCTGAGAATGGTGGCTCATGTCCGTATTCCCAGCTATTCCAGAGTCTGAGGCACGAGAATCACTGGAATGTGGAAGGGGGAGGTTGCAGTGAGCCAAGATCGAGCCACTGCACTCCAGCCTGGGCGACAAAAACTGTCTCAAAAACAAACAAACAAACAAACAAAAAACTGACATTTAAAATAGAGACAGGGTCTTGCTTGTTTGTTGCCCAGAAGGCTGGCCTCAAGCAGTCCTCCCTCCTCAGTCTCTGAAATTGCTAGTATTACAGGCATGAGCCACCATGCCTGGCCCACAGTATTATTTTATAAAGTATAAAGATATACGTATTTTTAAAACTTCTTTTAAAATAAAAAGAATATATATACAAATAGATTTTTTCTTTTGAAGACAGGGTGTGGCTGTCGCCTAGGCTGGAGTCCAGTGGTGCAATCATAGCTCACTGCAGCCTCGAACTCCTGGCCTCAAGTGATCCTCCTTCCTCGCCATCCCAAAGCTCTGGGATTACAAGTATGAACCACTTGCACCTGGTCTGATCTAGTTTTTTAAGCATGAGAACTGGGCTGTGCTGTAAATGTGGCTTCTCTGAATTGAGATGGGCTATAGGTAGAAAATACACACTGAATTTCAAAGGTTACGTGAGCAGAAGAATATAAAATATCTCAATTTTTTATATCGATGACAAGTTGAGATGACAATATTTTGGAATATATTGGGTTAAATAAAAATTTGTTTCACCAGTTTCTTTTTATTTTTATTTTATTTTTTTGAGATGGAGTCTCGCTCTTGTTGGCCAGGCTGGAGTACAGTGGCACTATCTCGGCTCACTGTAACCTCCGCCTCCTGGGTTTAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTGCAGGCATATACCACCATGCCCGGCTAATTTTTGTATTTTTAGTACAGACGGGGTTTCACCATGTTGGTCAGGCTGGTCTCGGACTCTTGACCTCAGGTGATCCGCCCACCTCAGCCTCCCAAAGTGCTGGGATTACAGGGGTGAGCCACCATGCCCAGCCTATTTTAAAATGTTTAGATGTGTCTATGGAAAATTTTATTTTATTTTATTTTATTTATTTATTTATTTTTGAGACAGAGTCTCGCTCTGTCACCCGGGCTGGAGTGCAGTGGCCTGATCTCAGCTCACTACAAGCTCTGCTTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACATGTGCCCGCCACCACGCCCGGCTAATTTTTTGTTTGTTTGTTTGTATTTTTAGTAGAGACAGGGTTTCACTGTGTTAGCCAGGATAGTCTTGATCTCCTGACCTCGTGATCCACCCGCCTCAGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACCGCACCGGCTGGAAAATTTTTAAATCATCCACGTGTCTCATAGTTTTGCGGGACATCTGTAGACCCCAGGAGAGCTGCCTGCTGATCAGAACTCATTCCCCAGCCTGGCTCAGCTGGATGCCTCACAGTTCCCAAACAGTCCTGTTTTCTTGCCTCCACGCCTTTGCACCAGGGGTCCTTTCCCTGGCGCACTTCCCTCCTGACCTTTCCTCTGGGCTCATCTCCTGCCCAGTGAAGCCTTCCCTGATCCT
GCAGTCACATCTGAAGCACCCGGCCCGCTCTGCGTGTGTGGTAACTGGTGCTGCACTGGTCTTTCCCCTCGACTAGATCACTAACCACGCTGACCAACATTTGAATGGAATTTAAATTCCACAAAATATTTTTCACAGCCACCATCTAGTTTTGTTGTTGTTGTTGTTTTGAGATAGGGTCTCCCTCTGTCA        30      .       REPTYPE=SIMPLEINS;BREAKSIMLENGTH=1;REFWIDENED=chr1:23884346-23884346    GT:AD   1|1:0,2
chr1    25405592        .       T       TGCAATGAGCTATGATTGTACCACTGGGAAGTGACAAAGGGCACCCTGGGGGATTTCAAATGGTGGTGGCCCTGGTTTGGTGTTGCTGCCAGGTGAGTCCTTAAGCTATA  30      .       REPTYPE=SIMPLEDEL;BREAKSIMLENGTH=8393;REFWIDENED=chr1:25415381-25405617 GT:AD   1|1:0,2
...

ACEnglish commented Aug 17, 2022

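This error occurs whenever pysam (a wrapper around htslib) parses a monomorphic reference call, i.e. a record whose ALT column is ".", and the code then tries to access its ALT alleles. For such records the alts attribute is None, so entry.alts[0] raises the TypeError shown above. You can filter these sites and create a new input for truvari with the commands: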

bcftools view -c 1 input.vcf -O z -o new_file.vcf.gz
tabix new_file.vcf.gz
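
For readers who want to see which records cause the problem, here is a minimal Python sketch that drops records whose ALT column is ".". It is an illustration only, not a replacement for the bcftools command above: bcftools' -c 1 selects on allele count, whereas this sketch looks only at the ALT column, and the function name drop_no_alt_records is made up for this example. The sample lines are the three DRAGEN records quoted in the issue.

```python
import io


def drop_no_alt_records(lines):
    """Yield header lines and records whose ALT column is not '.'."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns are CHROM POS ID REF ALT ...; ALT is the fifth column.
        if len(fields) > 4 and fields[4] != ".":
            yield line


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002_NA24385",
    "chr1\t818023\tDRAGEN:REF:chr1:818023-1325928\tN\t.\t51\tPASS\t"
    "END=1325928;REFLEN=507906\tGT:SM:CN:BC:PE\t./.:1.05255:2:407:4,7",
    "chr1\t1325928\tDRAGEN:GAIN:chr1:1325929-1342978\tN\t<DUP>\t10\tPASS\t"
    "SVLEN=17050;SVTYPE=CNV;END=1342978;REFLEN=17050\tGT:SM:CN:BC:PE\t./1:1.29637:3:17:7,11",
    "chr1\t1342979\tDRAGEN:REF:chr1:1342979-2653404\tN\t.\t66\tPASS\t"
    "END=2653404;REFLEN=1310426\tGT:SM:CN:BC:PE\t./.:1.02852:2:1031:11,207",
]
vcf_text = "\n".join(records) + "\n"

# Keep the header and the single record that carries an ALT allele.
kept = list(drop_no_alt_records(io.StringIO(vcf_text)))
for line in kept:
    print(line, end="")
```

On this sample, only the header and the DRAGEN:GAIN record survive; the two DRAGEN:REF records are the ones that trigger the exception.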

See 31e4a49
