-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MM-53432] Calls transcriptions #549
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #549 +/- ##
========================================
+ Coverage 5.70% 6.04% +0.34%
========================================
Files 24 26 +2
Lines 4561 5110 +549
========================================
+ Hits 260 309 +49
- Misses 4282 4777 +495
- Partials 19 24 +5 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comments, nothing big. Amazing work!
server/bot_api.go
Outdated
return | ||
} | ||
|
||
postID := info["thread_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can have a ensureFields(info, "thread_id", "file_id", "transcription_id")
and then we can use it elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ideal solution would be for the request body to be properly typed and likely provide an IsValid()
method. That means more coupling as both sides need to be aware of the type but I think we are going into that direction anyway with the new public
stuff so may as well do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(again, just an idea, not necessary): That's why a more generic ensureFields(stringmap map[string]string, fields ...string)
type of thing would be better? (less coupling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, understood the proposal. My main worry is that that would become problematic as soon as we need to handle something other than strings. And then we are re-inventing structs using maps. Maybe some coupling is the way to go given this is essentially a public API from the implementer's point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I added a JobInfo
type and updated everything, no more map parsing on the API side.
// Checking if the transcription has ended due to the bot leaving. | ||
if prevState.Call != nil && prevState.Call.Transcription != nil && currState.Call != nil && currState.Call.Transcription != nil && | ||
currState.Call.Transcription.EndAt > prevState.Call.Transcription.EndAt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to think the way that the bot behaviour + state is triggering things seems a little brittle and confusing. Maybe it's too late to change it, but it would be nice if jobs were started and stopped by idempotent events. Granted, we need the bots to do any of the actual recording/transcribing, but it seems like maybe the communication triggers and the stopping triggers (at least) could be in response to events, rather than these conditional blocks (which seems like it would be error prone).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand more about that as I am not entirely sure what you mean by "idempotent events". Could you make an example perhaps?
In essence the code above is in response to an event, namely the bot leaving the call which we need to handle somehow. Not sure if it's a syntactic improvement you'd be looking for (e.g. onBotLeftCall
callback) or something more structural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my thinking (and while writing this I think I see why it's not practical): Something like moving all the "bot leaving triggers the recording/transcription ending" logic out of here and putting it in its own function (say: recordingEnded
), then moving the conditional logic (lines 350-351) somewhere else (that's the infeasible part I think), then using that conditional logic to trigger an api call which trigger the recordingEnded
. Then if that recordingEnded
endpoint gets triggered multiple times (for whatever reason) we can ignore it after the first one.
I think it's tough to find a better place for lines 350-351 though...
I guess I'm worried that we've tightly coupled the bot's presence with the recording/transcribing activity. I know we need the bot there for that to actually work, I just feel it's mixing two concepts that logically should be separate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I feel this deserves a sync discussion, let's chat about it next week.
if trState.JobID != "" { | ||
return fmt.Errorf("transcription job already in progress") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's funny. Not sure how that would happen :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually can happen because we are not 100% atomic there due to the fact that we need to unlock to make the API call to the job service. So concurrent requests to start transcriptions could theoretically result in multiple jobs starting. Of course the client shouldn't ever do that but it's a possibility we may want to account for in case of issues/bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good to know, thanks
return nil | ||
} | ||
|
||
func (p *Plugin) stopTranscribingJob(state *channelState, callID string) (rerr error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an opportunity to consolidate some code with the checker? Not a big deal though.
This is awesome, big thanks to Mattermost Team for this feature. Is there any point on the roadmap for this feature that we can wait for? |
Hopefully sometime next month. But the functionality (at least initially) will only be available for Enterprise licensed installations. |
Nope, nothing else for me other than the text output. I think we're good to proceed as well @streamer45. Great work! I'll approve this knowing we will be doing some tweaking to the text output and we can test things out more robustly on Community. |
@matthewbirtch I implemented some basic heuristics to compact the produced text file output in order to avoid very short sentences. I tried a couple of approaches and eventually converged on a duration based one (as opposed to length) since I felt that splitting/joining based on time gives better results overall as it accounts for pace and potential pauses in speech. It's still experimental and highly configurable, right now enforcing the following rules. We join segments if:
Give it a try when you get a chance and let me know if you see any improvements or hit any issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided string feedback inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided string feedback inline.
Definitely seeing an improvement here @streamer45 and I like the rules you've established here - this is really smart! I vote we proceed with what you have and we can always tweak these numbers if we find we're not getting expected results in broader testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making it so easy for me to review the latest strings with direct links!
Merging this to unblock other PRs. Still working on docs updates as a separate item (https://mattermost.atlassian.net/browse/MM-55587) |
Summary
PR implements the plugin side for calls transcriptions (post call transcript).
One important detail to keep in mind is that recording and transcription jobs are artificially coupled, meaning that when transcriptions are enabled (globally from the plugin settings), starting and stopping recordings will automatically start and stop transcriptions. Moreover, any failure would cause both jobs to be terminated.
One limitation I found when trying to follow the proposed design is that it's not really possible to update the attachments for a previously created post so we cannot easily add the transcription file to recording post unless we artificially delay its creation. I don't like that idea very much given the transcription could take a while to process. For now we simply have the bot make a new post with the transcription file attached. I created https://mattermost.atlassian.net/browse/MM-54874 to track this improvement as it'll likely require server changes.
Still to do (ideally prior to merging)
MaxRecordingDuration
(see comment in code).Design
https://www.figma.com/file/ZAvwHhdUTaWkby4uekmnDX/Call-transcription
Ticket Link
https://mattermost.atlassian.net/browse/MM-53432