Add first version of time-constrained ORC-WER (tcORC-WER) #52

Merged 19 commits on Jan 30, 2024

thequilo
Member

This PR adds code to compute the tcORC-WER.

Example:

python -m meeteval.wer tcorcwer -h hyp.stm -r ref.stm --collar 5

The current version is tested on Libri-CSS with a system that produces 8 streams. It finished computation within 10 minutes and used less than 2GB of RAM (which is a huge improvement over ORC-WER!). These requirements should drop further when the number of streams is smaller.

The code is not fully optimized yet and contains many TODOs. I'll work on some of these TODOs and update the PR during the next few days.

I'll merge main back into this PR once #50 is merged.

# Add a segment index to the reference so that we can later find words that
# come from the same segment
for i, s in enumerate(reference):
    s['segment_index'] = i
Member

Hmm. Should the assignment be before the filter?

Member Author

It shouldn't make a difference, because the segment index is only used here for grouping, not for sorting. But the code could be easier to understand if the assignment were moved before the filter operation.
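To illustrate the point about grouping, here is a minimal sketch. The word records and the helper `group_by_segment` are hypothetical and not taken from the PR; only the `segment_index` key mirrors the snippet under review. Grouping only needs the index to be the same for words from the same segment, so gaps left by filtered segments do not matter:

```python
from collections import defaultdict

# Hypothetical word-level records. Only the 'segment_index' key mirrors the
# snippet under review; the rest is made up for illustration.
words = [
    {'word': 'hello', 'segment_index': 0},
    {'word': 'world', 'segment_index': 0},
    # Index 1 is missing, e.g. because an empty segment was filtered out.
    {'word': 'good', 'segment_index': 2},
    {'word': 'morning', 'segment_index': 2},
]


def group_by_segment(words):
    """Collect the words of each segment, keyed by segment index."""
    groups = defaultdict(list)
    for w in words:
        groups[w['segment_index']].append(w['word'])
    return dict(groups)


grouped = group_by_segment(words)
```

The groups come out the same whether the indices are contiguous or not, as long as they are unique per segment.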

Member

Hmm. The filter will remove empty segments. So the assignment will not be valid for the input of this function.

Should the assignment also consider the empty segments? Or do we drop the empty segments in the apply assignment function?

We could add this to a TODO list and solve it later. It is not important for the start of the challenge.
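One possible reading of this concern, sketched under assumptions: if a result has one entry per kept (non-empty) segment, it no longer lines up with the unfiltered input. The segment dictionaries, the stream labels, and all function names below are hypothetical and do not reflect the actual MeetEval code. The sketch shows that indexing before filtering keeps enough information to map such a result back onto the original input, while indexing after filtering does not:

```python
# Hypothetical reference with one empty segment in the middle.
reference = [
    {'words': 'a b'},
    {'words': ''},   # empty segment, removed by the filter
    {'words': 'c'},
]


def index_then_filter(segments):
    """Assign indices on the unfiltered input, then drop empty segments."""
    indexed = [{**s, 'segment_index': i} for i, s in enumerate(segments)]
    return [s for s in indexed if s['words'].strip()]


def filter_then_index(segments):
    """Drop empty segments first, then assign indices to what is left."""
    kept = [s for s in segments if s['words'].strip()]
    return [{**s, 'segment_index': i} for i, s in enumerate(kept)]


def expand_assignment(kept, assignment, num_segments, fill=None):
    """Map one label per kept segment back to the original positions."""
    full = [fill] * num_segments
    for seg, label in zip(kept, assignment):
        full[seg['segment_index']] = label
    return full


# Hypothetical stream labels, one per kept segment.
assignment = (0, 1)

aligned = expand_assignment(
    index_then_filter(reference), assignment, len(reference))
misaligned = expand_assignment(
    filter_then_index(reference), assignment, len(reference))
```

With indices assigned before the filter, the label of the third segment ends up in the third position and the empty segment gets the fill value. With indices assigned after the filter, the second label lands on the position of the empty segment.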

@thequilo thequilo merged commit f3d92c7 into main Jan 30, 2024
6 checks passed
@thequilo thequilo deleted the tcorcwer branch September 3, 2024 11:52