whisperx 2.wav --model large-v2
Incorrect Output:
0 => array:4 [▼
"start" => 0.009
"end" => 0.469
"text" => " million dollars."
"words" => array:3 [▶]
]
Skipped (nothing is transcribed between 0.469 and 20.396)
1 => array:4 [▼
"start" => 20.396
"end" => 23.658
"text" => " You know, and a lot of rich people, there's nothing that I buy that brings me happiness."
"words" => array:17 [▶]
]
whisperx 2.wav --model large-v2 --chunk_size 20
Correct Output:
0 => array:4 [▼
"start" => 0.009
"end" => 0.55
"text" => " million dollars."
"words" => array:2 [▶]
]
1 => array:4 [▼
"start" => 0.59
"end" => 4.473
"text" => "You know, a lot of people equate money to happiness and it's not, it's more freedom."
"words" => array:16 [▶]
]
This is quite strange behaviour, and I can't understand why it happens. Notice that the sentence at the beginning of the audio and the one at 20 seconds are very similar. With the default "--chunk_size 30", the first 20 seconds are skipped. Setting "--chunk_size 20" or "--chunk_size 10" fixes it. Why does this happen? Is this normal whisperx behaviour? What do you think?
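In case it helps anyone reproduce this, here is a rough sketch of how a skipped region like the one above could be flagged automatically. It only looks at the "start" and "end" fields shown in the outputs and does not call whisperx itself. The function name `find_gaps` and the 2-second threshold are my own assumptions, and a large gap may of course be genuine silence rather than skipped speech.

```python
from typing import Dict, List, Tuple


def find_gaps(segments: List[Dict], max_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (previous_end, next_start) pairs for consecutive segments
    that are more than max_gap seconds apart.

    max_gap is an arbitrary threshold chosen for illustration.
    """
    ordered = sorted(segments, key=lambda s: s["start"])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt["start"] - prev["end"] > max_gap:
            gaps.append((prev["end"], nxt["start"]))
    return gaps


# Segment timings copied from the two outputs above (text shortened).
incorrect = [
    {"start": 0.009, "end": 0.469, "text": " million dollars."},
    {"start": 20.396, "end": 23.658, "text": " You know, and a lot of rich people, ..."},
]
correct = [
    {"start": 0.009, "end": 0.55, "text": " million dollars."},
    {"start": 0.59, "end": 4.473, "text": "You know, a lot of people equate money ..."},
]

print(find_gaps(incorrect))  # [(0.469, 20.396)]
print(find_gaps(correct))    # []
```

Running this on the default-chunk-size output reports the gap from 0.469 to 20.396, while the "--chunk_size 20" output reports none.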