Query regarding sclite_mode #22

feddybear · 2024-04-22T05:55:40Z

I don't know if I understand it correctly, but it seems like sclite_mode doesn't really do anything. I tried it with many reference-hypothesis pairs, and the results are always the same whether I set it to True or False.

For reference, here's a small script I tested:

```python
from kaldialign import align, bootstrap_wer_ci, edit_distance  # assumed: library this issue is filed against

refs = [('a', 'b', 'c'), ('d', 'e', 'f')]
hyps = [('a', 's', 'x', 'c'), ('e', 'f', 'f')]
EPS = '*'
for ref, hyp in zip(refs, hyps):
    print(align(ref, hyp, EPS))
    print(edit_distance(ref, hyp, sclite_mode=False))
    print(edit_distance(ref, hyp, sclite_mode=True))

# Whole-list calls: each utterance tuple counts as one token here (ref_len is 2 in the output)
print(edit_distance(refs, hyps, sclite_mode=False))
print(edit_distance(refs, hyps, sclite_mode=True))
ans = bootstrap_wer_ci(refs, hyps)
print({"wer": ans["wer"], "ci95": ans["ci95"], "ci95min": ans["ci95min"], "ci95max": ans["ci95max"]})
```

And this is what gets printed:

```
[('a', 'a'), ('b', 's'), ('*', 'x'), ('c', 'c')]
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}

[('d', '*'), ('e', 'e'), ('f', 'f'), ('*', 'f')]
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
```

In both cases above, the counts are identical with either setting. The corpus-level calls and the bootstrap estimate at the end of the script print:

```
{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'wer': 0.6666666666667462, 'ci95': 0.0, 'ci95min': 0.6666666666667462, 'ci95max': 0.6666666666667462}
```
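To see why the two pairs above come out the same, here are the costs of the competing alignments, assuming equal weights (ins 1, del 1, sub 1) for the default mode and SCLITE's weights of ins 3, del 3, sub 4 (the values given later in the thread) for sclite_mode. This is hand arithmetic, not library output.

```latex
\begin{array}{lcc}
 & \text{equal weights} & \text{SCLITE weights} \\
\text{Pair 1: } (a,b,c) \to (a,s,x,c) & & \\
\quad 1\ \text{sub} + 1\ \text{ins} & 1 + 1 = 2 & 4 + 3 = 7 \\
\quad 1\ \text{del} + 2\ \text{ins} & 1 + 2 = 3 & 3 + 2 \cdot 3 = 9 \\
\text{Pair 2: } (d,e,f) \to (e,f,f) & & \\
\quad 2\ \text{sub} & 2 & 2 \cdot 4 = 8 \\
\quad 1\ \text{del} + 1\ \text{ins} & 1 + 1 = 2 & 3 + 3 = 6
\end{array}
```

In pair 1 both weightings pick the same alignment. In pair 2 the two candidates tie under equal weights, so the reported breakdown there comes down to tie-breaking, while the SCLITE weights strictly prefer the deletion plus insertion. The total is 2 errors either way.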


pzelasko · 2024-04-23T00:30:05Z

@desh2608 would you happen to have any test cases from back when you added the feature? I'm not sure whether it's a regression or something else.

desh2608 · 2024-04-23T01:06:33Z

SCLITE weights insertions, deletions, and substitutions as 3, 3, and 4, respectively, instead of equally. In most cases, however, I think the resulting edit distance would be the same. I tried constructing some test cases where it differs but couldn't find one. I would be curious to see if someone can come up with examples where it makes a difference.
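The point about the weights can be made precise. With ins 3, del 3, sub 4, the weighted cost of any alignment is 4S + 3D + 3I = 3(S + D + I) + S, i.e. three times the error count plus one per substitution. So among alignments with the minimum number of errors the weights only favor the one with fewest substitutions (the breakdown can change, the total cannot), and they pick an alignment with more errors only if it removes more than three substitutions per extra error. The sketch below is a plain weighted Levenshtein DP, not the library's implementation; `weighted_edit_counts` is a hypothetical helper, and it assumes sclite_mode works by choosing the cheapest alignment under the weights and then reporting unweighted counts. The reference/hypothesis pair is constructed to hit the "more errors" case.

```python
from typing import Dict, List, Sequence, Tuple

# (weighted cost, ins, del, sub) for one pair of prefixes
Cell = Tuple[int, int, int, int]


def weighted_edit_counts(
    ref: Sequence[str],
    hyp: Sequence[str],
    ins_cost: int = 1,
    del_cost: int = 1,
    sub_cost: int = 1,
) -> Dict[str, int]:
    """Hypothetical helper: choose a minimum-cost alignment under the given
    weights, then return the plain (unweighted) counts of that alignment."""
    n, m = len(ref), len(hyp)
    table: List[List[Cell]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: delete every ref token
        table[i][0] = (i * del_cost, 0, i, 0)
    for j in range(1, m + 1):  # first row: insert every hyp token
        table[0][j] = (j * ins_cost, j, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, ins, dele, sub = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (cost, ins, dele, sub)  # match costs nothing
            else:
                diag = (cost + sub_cost, ins, dele, sub + 1)
            cost, ins, dele, sub = table[i - 1][j]
            down = (cost + del_cost, ins, dele + 1, sub)
            cost, ins, dele, sub = table[i][j - 1]
            right = (cost + ins_cost, ins + 1, dele, sub)
            # min() keeps the first of equal-cost candidates, so ties
            # prefer the diagonal, then deletion, then insertion
            table[i][j] = min(diag, down, right, key=lambda c: c[0])
    _, ins, dele, sub = table[n][m]
    return {"ins": ins, "del": dele, "sub": sub, "total": ins + dele + sub}


ref = "a b c d e".split()
hyp = "x y z a b".split()
print(weighted_edit_counts(ref, hyp))  # {'ins': 0, 'del': 0, 'sub': 5, 'total': 5}
print(weighted_edit_counts(ref, hyp, ins_cost=3, del_cost=3, sub_cost=4))
# {'ins': 3, 'del': 3, 'sub': 0, 'total': 6}
```

With equal weights the sketch reports 5 substitutions (5 errors). With 3/3/4 weights, 5 substitutions cost 20, while matching `a` and `b` via 3 deletions and 3 insertions costs 18, so it reports 6 errors and the error rate rises from 5/5 to 6/5. On the two pairs from the issue, the sketch gives a total of 2 under both weightings, consistent with what was observed.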
