-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update timestamps using diff alignment algorithm #144
Conversation
Thanks @murezzda , very interesting approach! There's a module, we just open sourced See more details in this comment in this issue #30 on how I've tried to use it with the In this case that module is able to do an interpolation of the new inserted words, and transpose the time-codes for the matching once. As you can see explained in the README I think you are onto something, and there might be a way to combine elements of these two things for an optimal approach. |
Ideally we could devise later (as in: accept this PR and refactor later) a mechanism to swap implementations of various timestamp correcting things, like a plugin style or something. Such that we could have: this, stt-align-node, remote API call (Gentle, Aeneas, Speechmatics aligner) |
Thanks for the input! I think for the next step I will check out |
I've added now the What I've experienced so far with |
Thanks @murezzda ! See if this can help with the testing, on the Ted website, there is the "human accurate" transcript for this ted Talk. Here for convenience, with line spaces, and time-codes removed
I've also noticed that the colour highlight in Other than that, seems pretty accurate! |
fffe4ba
to
80c9f34
Compare
Update for this branch:
|
sounds good, re not merging the two diffing approaches. As for point 2, it could either be done on a "final save" (eg just before sending to the server) or more granularly at paragraph level, eg once user moves onto second paragraph, and see if this doesn't loose the focus. Or the focus could be restored after the update. I also think there might be tweaks to address this issue #150 that will effect this PR. Thanks @murezzda will have a closer look as soon as I get a chance. |
6f07327
to
96f6e6d
Compare
… instead of doing a 2-step process.
…ords. Timestamp update function now also uses the alignWords function directly instead of alignJSONText, removing some overhead.
80c9f34
to
91cccd1
Compare
Selection is now preserved when the timestamp is updated, which improves the usability significantly. The update of the selection state is done by mapping the old selection state on the new state. Performance wise the impact of moving the selection state seems neglible. |
Regarding the overall performance of the update, it still introduces a lag if the file is long (such as the Kate Darling TED Talk). I took some measurements and it seems the alignment of the timestamps is not the main factor, the conversion and updating of the state also takes at least as much time as the alignment: Conversion of state to Raw: 33.01611328125ms As previously mentioned, updating the selection takes about 42.7 ms in total and thus has about the same impact on the performance as all other functions used here. |
…uring timestamp update
Thanks for this @murezzda |
Hi @pietrop The problem is that the conversion to and from raw anyway have to happen, thus it would only be possible to save time in the STT alignment section. I've also thought about checking if something changed in the transcript and only align if that is the case, but it looks to me that obtaining the information where and if something has changed is more or less as expensive as just aligning the text itself. Whats your opinion, should we try to improve the performance or could we also handle this by placing the call to the realignment at some different point in the application? At the moment, the realignment happens whenever the current state of the transkript is saved. Regarding the measurements, I've used the console.time('test') and console.timeEnd('test') methods. Do you know if these are reliable for react? |
Hi @murezzda, Altho perhaps instead of the automatic save( that happens when people stop typing), we could place it after the manual save btn? And add a way so that when a parent component exports the content, a realignment is done if text has been changed since the last one - if that makes sense. Re measurements, other might be able to advise? (@Laurian @jamesdools @barneyboo ) In this PR #160 where we looked at the unnecessary re-renders I made some notes and one of the things I found was the React DevTools tools profiler - see in react docs but I am not sure if/how to use that to measure that type of performance? but might be worth looking into it. |
Hi @pietrop I've now tried to put the update as a timer into the Whats a bit of a problem with this is that it can be that unaliged text can be automatically saved, but as the alignment anyways happens on the "original" transcript, this should not be that big of a problem. Especially if the alignment is also triggered by the manual save button, this gives the user a way to trigger the aligment explicitly. Let me know what you think of this. If this would be a viable solution, I would clean up the code and give it up for review. |
@murezzda this sounds great! |
…ter 5 seconds if the transcript has been edited or if the user saves the transcript manually with the save button
@pietrop cleanup done, ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this @murezzda!! it's looking good
Testing - Ted talk
I found the "highlighted word" lagged behind a little bit, so reading along while playing wasn’t necessary a good measure for accuracy of the realignment. So was most efficient to test by double clicking on a word and checking if it plays from the corresponding sound.
Other then that, in my testing with the Ted Talk, it seemed like a pretty accurate realignment (expect for the last word of most paragraph, and at the end of the transcript).
Timings in last word in paragraphs
While doing the test described above, it seems like the timings of the the second last word include the timings of the last word. So if you double click on the last word of paragraph, you don’t get the word but the pause before the next paragraph is spoken.
There might be possible tweaks on the underlying algorithm to improve this, if is a systematic thing.(seems to happen in the majority of paragraphs but not 100% all of them)
this could be addressed in a subsequent PR
Difference between pasting and typing
if you paste text, there might not be entities for those words, and somehow the data attribute is empty,
if you type and insert some words the speaker name at paragraph level doesn’t get replaced.
To test using the ted talk, In the demo app, I have deleted all of the text, and pasted the accurate transcript from the Ted Talk.
I seem to get NaN:NaN:NaN:
for time-codes and no speaker names.
I think there might be two extra steps needed
- one is adding start timecode attribute to the draftJS blocks after realignment
- two is seeing if we can retain the speak names
Made a separate PR with some more info and possible workarounds murezzda#1
this could be addressed in a subsequent PR
Questions
- inside
updateTimestamps
would it make a difference if we were to align at paragraph/block level?
Other thoughts
- Just to recap
timestampTimer
is setup similar tosaveTimer
and it gets triggered 5 seconds after the user stops typing. AndupdateTimestampsForEditorState
can also be triggered by the save btn.
Possible separate PRs
-
could also consider (as separate PR) adding a rsync btn eg 🔄with on hover, something about "restrore timecodes for newly corrected words" - or something similar? if making the resync more explict, altho it might be confusing for users? so might want to just "hide it behind the save".
-
As a separate PR, we could add a way to make the restoring timecodes an option from the outer component when retrieving data.
So that when saving to server there’s more certainty on whether there are time-codes for each word.
Next
As mentioned, would be ideal to merge this PR #160 into master first, around reducing unnecessary re-renders but we are a bit stretch in terms of finding someone to review it on our end before merging it in, let me know if you got time to have a look and leave some comments (if needed) it be interesting to hear your thoughts on some of those choices.
@@ -46,8 +46,12 @@ | |||
"@fortawesome/free-solid-svg-icons": "^5.6.3", | |||
"@fortawesome/react-fontawesome": "^0.1.3", | |||
"babel-polyfill": "^6.26.0", | |||
"diff-match-patch": "^1.0.4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"diff-match-patch": "^1.0.4", |
diff-match-patch
doesn’t seem to be in use in package.json could be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is from the previous attempt to fix the timecodes via the diff-match-patch lib. Thanks for finding this.
import alignWords from './stt-align-node.js'; | ||
|
||
const convertContentToText = (content) => { | ||
var text = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var text = []; | |
let text = []; |
could replace var
with let
some background here
const result = alignWords( entities, currentText); | ||
|
||
const newEntities = result.map((entry) => { | ||
return createEntity(entry.start, entry.end, 0.0, entry.word, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In createEntity
the index is always -1
?
Would this be a fix?
const newEntities = result.map((entry, index) => {
return createEntity(entry.start, entry.end, 0.0, entry.word, index);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would be a fix. At the moment the index is not set in any other way.
Easiest fix for being more certain when retrieving data from the outer component that there are word level timings for each word is to make the restoring time-codes part of the export process. Motivation as mentioned above is so that when saving to server there’s more certainty on whether there are time-codes for each word. Possible solution is to add a step in getEditorContent(exportFormat) {
const format = exportFormat || 'draftjs';
++ this.updateTimestampsForEditorState();
return exportAdapter(convertToRaw(this.state.editorState.getCurrentContent()), format);
} If done this way, it could be part of this PR. Just for background context, the current way of getting the editor's content from a parent component can be seen in demo app ( const { data, ext } = this.transcriptEditorRef.current.getEditorContent(
this.state.exportFormat
); |
I've run a test with this interview From PBS Frontline Documentary The Facebook Dilemma, Soleio Cuervo, Former Facebook Product Designer steps to reproduce
resultsThe alignment was very good, all the way to the end. 🎉 CaveatsWith this algorithm the alignment is only as good as the accuracy of the STT.
To try this first hand, used
It starts good, and then It drifts out of sync, incrementally. It is out of sync by a paragraph or more by the end of the interview. For this edge case, where the STT or the recognition might be a bit off, we might not have enough insights on why it's going out of sync, is it just the diffing that doesn't work? is there something that can be added to the algo to handle that etc..? To note is that for now this might be a relatively small edge case, because with decent STT and decent recordings the problem might be mitigated. |
* moved header into a component with shouldComponentUopdate false to avoid unecessary re-renders - test * moved Header component into separate file +removed unused styling, commented out for now in case it's needed, eg in mobile view? * moved playback_rates const into separate file * optimised re-render of playback rate * optimise re-render for VideoPlayer * added some notes - draft on how to prevent uncessary re-renders in React * Added some comments * small note in docs about using console.log in render to measure performance * ammend to notes * Update 2019-05-16-prevent-unnecessary-re-renders-in-react.md * notes update * notes fix * trying out why-did-you-update * updated MediaPlayer and subcomponents * made ToolTip 'how does this work' into it's own component * updated Demo app to reduce unecessary re-renders * added react-visibility-sensor * refactor settings * removed unecessary attributes from state of components + WrapperBlock performance tweak using react-visibility-sensor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this is really really cool! 🙌 really appreciate your time on this.
I think most of my comments are about readability of the code - it's quite a lot to digest. I know this has been sitting here for a long time so I don't wish to block this for too much longer while we have Pietro's time.
I approve but would appreciate if a little bit of time went into refactoring - mostly around variable and function naming! A few are quite verbose whilst also not being very descriptive - so could cut them down - usually in the function scope it's quite clear. On the other hand, some are a little too generic. A hard balance, it's something we need to tackle elsewhere in the repo too 😄
But nitpicking aside this is very awesome 👍
// http://borischumichev.github.io/everpolate/#linear | ||
const outStartTimes = everpolate.linear(indicies, indiciesWithStart, startTimes); | ||
const outEndTimes = everpolate.linear(indicies, indiciesWithEnd, endTimes); | ||
words = words.map((word, index) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to assign this to a new variable in memory as map returns a new array.
const startTimes = []; | ||
const endTimes = []; | ||
// interpolate times for start | ||
for (let i = 0; i < words.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not trying to be a stickler here! But perhaps an ES6 approach might help clear up the code a bit.
eg:
words.forEach((word, index) => {
if ('start' in word) {
indiciesWithStart.push(index);
startTimes.push(word.start);
}
if ('end' in word) {
indiciesWithEnd.push(index);
endTimes.push(word.end);
}
});
(written in the PR box so would test it first) 😛
loadData() { | ||
if (this.props.transcriptData !== null) { | ||
const blocks = sttJsonAdapter(this.props.transcriptData, this.props.sttJsonType); | ||
this.setState({ originalState: convertToRaw(convertFromRaw(blocks)) }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing something here I think - can we rename this better? Makes it sound like it's a deep copy of the state object, but it's the transcript text right. Should we also declare it in constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to check with @murezzda what was the thinking around this, I am not quiet sure.
this feels like it was needed as a deep copy? but can't remember why?
this.setState({ originalState: convertToRaw(convertFromRaw(blocks)) });
```
and `originalState` could be declared in constructor as well?
|
||
/** | ||
* | ||
* @param {array} sttData - array of STT words |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this comment to sttWords or change the param name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more important to be clear whether it's an array of word objects (inc start / end times) or just strings
// transcriptData = [{} for _ in range(len(transcriptWords))] | ||
const transcriptData = []; | ||
// empty objects as place holder | ||
transcriptWords.forEach(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear why we need to make an empty array of objects this way.
If we splice out the equal matches later (#L155), then the order (exact indices) doesn't matter, right?
Does it make sense to flip it - so we just collect the ones that don't match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it's coz originally this algo is mirroring the python code form @chrisbaume https://github.com/bbc/stt-align - https://github.com/bbc/stt-align-node
Ideally we could tweak it, refactor it there, and then bring it in via npm.
So leaving this for now, before they start to diverge too much.
@@ -0,0 +1,82 @@ | |||
import generateEntitiesRanges from '../../../stt-adapters/generate-entities-ranges/index.js'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can omit the .js extensions like elswhere 👍
const convertContentToText = (content) => { | ||
var text = []; | ||
|
||
for (var blockIdx in content.blocks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reads weird - blockIndex or blockId? contextually 'index' is fine tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going for blockIndex
text = text.concat(blockArray); | ||
} | ||
|
||
return (text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no parentheses needed
return (text); | ||
}; | ||
|
||
const createEntity = (start, end, confidence, word, wordIdx) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wordIdx -> wordIndex, or index
|
||
const createContentFromEntityList = (currentContent, newEntities) => { | ||
// Update entites to block structure. | ||
var updatedBlockArray = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not using var, as per Pietro's comment 👍
Awesome! thanks @jamesdools I'll have a closer look and add those changes as a separate PR to #175 branch tomorrow |
Fixed from James comments from #144 (review)
addressed @jamesdools comments in PR #144 at commit #144 (review) so closing this PR |
* added support for docx - needs better integration altho it works * adding timecodes and speakers to plain text export * develop: Fix 159 performance problem (#171) * moved header into a component with shouldComponentUopdate false to avoid unecessary re-renders - test * moved Header component into separate file +removed unused styling, commented out for now in case it's needed, eg in mobile view? * moved playback_rates const into separate file * optimised re-render of playback rate * optimise re-render for VideoPlayer * added some notes - draft on how to prevent uncessary re-renders in React * Added some comments * small note in docs about using console.log in render to measure performance * ammend to notes * Update 2019-05-16-prevent-unnecessary-re-renders-in-react.md * notes update * notes fix * trying out why-did-you-update * updated MediaPlayer and subcomponents * made ToolTip 'how does this work' into it's own component * updated Demo app to reduce unecessary re-renders * added react-visibility-sensor * refactor settings * removed unecessary attributes from state of components + WrapperBlock performance tweak using react-visibility-sensor * develop: Update timestamps diff (#172) * Added timestamp update via diff tool * Added missing function * Commited intermediate state * Rewrote timestamp alignment and differ to be integrated in each other instead of doing a 2-step process. * Update Timestamps now works correctly. * Fixed errors from rebase, removed debug code * Moved UpdateTimestamp into its own folder. * added updateTimestampsSSTAlign which updates the timestamps with the sst-align code * Added documentation * Merged timer for updating the timestamps and local save. * Selection state is now kept across updates to timestamps * Fixed bug where words with punctuation always are considered as new words. Timestamp update function now also uses the alignWords function directly instead of alignJSONText, removing some overhead. * Fixed small bug which raised an error if an empty block was present during timestamp update * Changed time of timestamp-update. Now re-calculates the timestamps after 5 seconds if the transcript has been edited or if the user saves the transcript manually with the save button * Code cleanup * develop: Murezzda update timestamps diff (#173) * Added timestamp update via diff tool * Added missing function * Commited intermediate state * Rewrote timestamp alignment and differ to be integrated in each other instead of doing a 2-step process. * Update Timestamps now works correctly. * Fixed errors from rebase, removed debug code * Moved UpdateTimestamp into its own folder. * added updateTimestampsSSTAlign which updates the timestamps with the sst-align code * Added timestamp update via diff tool * Added missing function * Commited intermediate state * Rewrote timestamp alignment and differ to be integrated in each other instead of doing a 2-step process. * Update Timestamps now works correctly. * Fixed errors from rebase, removed debug code * Moved UpdateTimestamp into its own folder. * added updateTimestampsSSTAlign which updates the timestamps with the sst-align code * Added documentation * Merged timer for updating the timestamps and local save. * Selection state is now kept across updates to timestamps * Fixed bug where words with punctuation always are considered as new words. Timestamp update function now also uses the alignWords function directly instead of alignJSONText, removing some overhead. * Fixed small bug which raised an error if an empty block was present during timestamp update * Changed time of timestamp-update. Now re-calculates the timestamps after 5 seconds if the transcript has been edited or if the user saves the transcript manually with the save button * Code cleanup * some changes to show sudgestions for PR * added some of changes sudgested in PR * Update package.json * develop: Murezzda update timestamps diff dpe groups words by speaker (#174) * Added timestamp update via diff tool * Added missing function * Commited intermediate state * Rewrote timestamp alignment and differ to be integrated in each other instead of doing a 2-step process. * Update Timestamps now works correctly. * Fixed errors from rebase, removed debug code * Moved UpdateTimestamp into its own folder. * added updateTimestampsSSTAlign which updates the timestamps with the sst-align code * Added timestamp update via diff tool * Added missing function * Commited intermediate state * Rewrote timestamp alignment and differ to be integrated in each other instead of doing a 2-step process. * Update Timestamps now works correctly. * Fixed errors from rebase, removed debug code * Moved UpdateTimestamp into its own folder. * added updateTimestampsSSTAlign which updates the timestamps with the sst-align code * Added documentation * Merged timer for updating the timestamps and local save. * Selection state is now kept across updates to timestamps * Fixed bug where words with punctuation always are considered as new words. Timestamp update function now also uses the alignWords function directly instead of alignJSONText, removing some overhead. * Fixed small bug which raised an error if an empty block was present during timestamp update * Changed time of timestamp-update. Now re-calculates the timestamps after 5 seconds if the transcript has been edited or if the user saves the transcript manually with the save button * Code cleanup * some changes to show sudgestions for PR * added some of changes sudgested in PR * adjusted DPE adapter so that it preserves paragraphs break within contiguos speakers * fixed one test * commented out auto align left aligning as a step before save btn and before export function, rather then as a step that happens everytime autosave is triggered, as that might be unecessary, and add performance overhead, I also noticed the cursor position jumped after realignement, thought something was been put in place to preserve/avoid that? * updated package-lock * fixed vulneranilities * Getting ready to publish alpha * 1.0.4-alpha.0 * 1.0.4@alpha * fixed notes * changing speaker and timecodes to be unselectable * 1.0.4-alpha.1 * 1.0.4-alpha.1 unselectabel speakers and timecodes * fix speaker and timecodes at paragraph level after realignement * 1.0.4-alpha.2 * fixed docx integration * Subtitles export (#168) * added option to export srt files and layout to export other type of captions with auto segmentation of lines * added support for other subtitles formats * fixed npm audit * implemented export in UI * fixed test added sample files for adding tests for subtitle composer module at later stage * added optional analytics for export download options * updated CSS * 1.0.4-alpha.3 * fixed subtiles segmentation algo was picking the wrong words from the list * moved PR template in github folder * cleaned up code for subtitles parsing * 1.0.4-alpha.4 * fixed alignement algo after interpolation words time boundares where overlapping * fixed interpolation * fixed filename of word doc export * fixed TimeBox and playback rate not working * 1.0.4-alpha.5 * removed scroll sync (#181) fix #180 * refactor changes from James review #160 * Develop branch - should component update refactor (#182) * Refactor should component updatre for transcript editor * Refactor should component updatre for PlayerControls * Refactor should component updatre for TimeBox * Refactor should component update for ProgressBar * Refactor should component update for TimedTextEditor * Refactor should component update for Header * fix from PR review Fixed from James comments from #144 (review) * Update package.json * Update package.json
Is your Pull Request request related to another issue in this repository ?
Related to #30
Describe what the PR does
To restore timestamps after editing of the transcript, the new state is compared to the original state of the transcript. diff-match-patch is used to find all differences between the original and new transcript, and the following rules are applied:
State whether the PR is ready for review or whether it needs extra work
The following points need to be clarified