[MM-53432] Calls transcriptions #549

streamer45 · 2023-10-10T22:41:42Z

Summary

This PR implements the plugin side of call transcriptions (post-call transcript).

One important detail to keep in mind is that recording and transcription jobs are artificially coupled: when transcriptions are enabled (globally, from the plugin settings), starting and stopping a recording automatically starts and stops the transcription. Moreover, a failure in either job causes both jobs to be terminated.

One limitation I found when trying to follow the proposed design is that it's not really possible to update the attachments of a previously created post, so we cannot easily add the transcription file to the recording post unless we artificially delay its creation. I don't like that idea very much, given that the transcription could take a while to process. For now, we simply have the bot make a new post with the transcription file attached. I created https://mattermost.atlassian.net/browse/MM-54874 to track this improvement, as it will likely require server changes.

Still to do (ideally prior to merging)

Design

https://www.figma.com/file/ZAvwHhdUTaWkby4uekmnDX/Call-transcription

Ticket Link

https://mattermost.atlassian.net/browse/MM-53432

codecov-commenter · 2023-10-10T22:51:05Z

Codecov Report

Attention: 670 lines in your changes are missing coverage. Please review.

Comparison is base (ff80fe2) 5.70% compared to head (d787eae) 6.04%.

Files	Patch %	Lines
server/bot_api.go	0.00%	219 Missing ⚠️
server/transcription_api.go	0.00%	143 Missing ⚠️
server/job_service.go	0.00%	72 Missing ⚠️
server/session.go	0.00%	53 Missing ⚠️
server/job_metadata.go	37.80%	47 Missing and 4 partials ⚠️
server/recording_api.go	0.00%	49 Missing ⚠️
server/utils.go	15.90%	37 Missing ⚠️
server/state.go	18.51%	21 Missing and 1 partial ⚠️
server/configuration.go	36.84%	11 Missing and 1 partial ⚠️
server/store.go	0.00%	12 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##            main    #549      +/-   ##
========================================
+ Coverage   5.70%   6.04%   +0.34%     
========================================
  Files         24      26       +2     
  Lines       4561    5110     +549     
========================================
+ Hits         260     309      +49     
- Misses      4282    4777     +495     
- Partials      19      24       +5

cpoile

Just minor comments, nothing big. Amazing work!

e2e/config.ts

e2e/tests/recordings.spec.ts

plugin.json

cpoile · 2023-10-13T22:08:44Z

server/bot_api.go

+		return
+	}
+
+	postID := info["thread_id"]


Maybe we can have an ensureFields(info, "thread_id", "file_id", "transcription_id") helper that we can then reuse elsewhere?

streamer45 · 2023-10-30

I think the ideal solution would be for the request body to be properly typed and likely provide an IsValid() method. That means more coupling, as both sides need to be aware of the type, but I think we are going in that direction anyway with the new public stuff, so we may as well do it.

cpoile · 2023-11-01

(Again, just an idea, not necessary.) That's why a more generic ensureFields(stringmap map[string]string, fields ...string) type of thing might be better (less coupling)?

streamer45 · 2023-11-01

Yep, understood the proposal. My main worry is that it would become problematic as soon as we need to handle something other than strings, and then we are re-inventing structs using maps. Maybe some coupling is the way to go, given this is essentially a public API from the implementer's point of view.

streamer45 · 2023-11-01

So I added a JobInfo type and updated everything; no more map parsing on the API side.
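
For readers following the thread, the two options discussed above could look roughly like this. It is only a sketch: the `jobInfo` struct and its fields are hypothetical stand-ins derived from the keys named in the review, not the actual JobInfo type added in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ensureFields is the generic, map-based check floated in the review.
// It reports the first required key that is missing or empty.
func ensureFields(m map[string]string, fields ...string) error {
	for _, f := range fields {
		if m[f] == "" {
			return fmt.Errorf("missing field %q", f)
		}
	}
	return nil
}

// jobInfo is a hypothetical typed request body (assumption: the real type
// in the PR may have different fields and names).
type jobInfo struct {
	ThreadID        string `json:"thread_id"`
	FileID          string `json:"file_id"`
	TranscriptionID string `json:"transcription_id"`
}

// IsValid returns an error describing the first invalid field, if any.
func (i jobInfo) IsValid() error {
	if i.ThreadID == "" {
		return errors.New("ThreadID should not be empty")
	}
	if i.FileID == "" {
		return errors.New("FileID should not be empty")
	}
	if i.TranscriptionID == "" {
		return errors.New("TranscriptionID should not be empty")
	}
	return nil
}

func main() {
	// Map-based validation: flexible, but limited to string values.
	info := map[string]string{"thread_id": "t1", "file_id": "f1"}
	fmt.Println(ensureFields(info, "thread_id", "file_id", "transcription_id"))

	// Typed validation: more coupling, but fields can later be of any type.
	typed := jobInfo{ThreadID: "t1", FileID: "f1", TranscriptionID: "tr1"}
	fmt.Println(typed.IsValid())
}
```

The typed variant trades some coupling for the ability to validate non-string fields later, which is the concern raised in the reply above.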

server/metadata.go

server/recording_api.go

cpoile · 2023-10-13T23:08:04Z

server/session.go

+	// Checking if the transcription has ended due to the bot leaving.
+	if prevState.Call != nil && prevState.Call.Transcription != nil && currState.Call != nil && currState.Call.Transcription != nil &&
+		currState.Call.Transcription.EndAt > prevState.Call.Transcription.EndAt {


I'm starting to think that the way the bot behaviour + state triggers things is a little brittle and confusing. Maybe it's too late to change it, but it would be nice if jobs were started and stopped by idempotent events. Granted, we need the bots to do any of the actual recording/transcribing, but it seems like the communication triggers and the stopping triggers (at least) could happen in response to events, rather than in these conditional blocks (which seem error-prone).

streamer45 · 2023-10-30

I'd like to understand more about that, as I am not entirely sure what you mean by "idempotent events". Could you give an example, perhaps?

In essence, the code above is in response to an event, namely the bot leaving the call, which we need to handle somehow. Not sure if it's a syntactic improvement you'd be looking for (e.g. an onBotLeftCall callback) or something more structural.

cpoile · 2023-11-01

Here's my thinking (and while writing this I think I see why it's not practical): something like moving all the "bot leaving triggers the recording/transcription ending" logic out of here and putting it in its own function (say, recordingEnded), then moving the conditional logic (lines 350-351) somewhere else (that's the infeasible part, I think), then using that conditional logic to trigger an API call which triggers recordingEnded. Then, if that recordingEnded endpoint gets triggered multiple times (for whatever reason), we can ignore every call after the first one.
I think it's tough to find a better place for lines 350-351 though...
I guess I'm worried that we've tightly coupled the bot's presence with the recording/transcribing activity. I know we need the bot there for that to actually work; I just feel it's mixing two concepts that logically should be separate...

streamer45 · 2023-11-01

Alright, I feel this deserves a sync discussion, let's chat about it next week.
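
To make the "idempotent event" idea from this thread concrete, here is a minimal sketch of a recordingEnded handler that can be called any number of times but only takes effect once. The `callJobs` and `jobState` types are hypothetical and much simpler than the plugin's real call state; this is not how the PR implements it.

```go
package main

import (
	"fmt"
	"sync"
)

// jobState is a hypothetical, simplified stand-in for a per-call job state.
type jobState struct {
	EndAt int64 // 0 while the job is still running
}

// callJobs tracks one job per call ID (assumption for this sketch).
type callJobs struct {
	mut  sync.Mutex
	jobs map[string]*jobState
}

// recordingEnded marks the job for callID as ended. It returns true only for
// the call that actually ends the job; repeated events and events for
// unknown calls are ignored.
func (c *callJobs) recordingEnded(callID string, endAt int64) bool {
	c.mut.Lock()
	defer c.mut.Unlock()

	js, ok := c.jobs[callID]
	if !ok || js.EndAt != 0 {
		return false
	}
	js.EndAt = endAt
	return true
}

func main() {
	c := &callJobs{jobs: map[string]*jobState{"call1": {}}}
	fmt.Println(c.recordingEnded("call1", 100)) // true: the job is ended
	fmt.Println(c.recordingEnded("call1", 200)) // false: duplicate event, ignored
}
```

With a handler like this, whichever code path detects the bot leaving can fire the event without first checking whether it was already handled.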

server/transcription_api.go

cpoile · 2023-10-13T23:18:21Z

server/transcription_api.go

+	if trState.JobID != "" {
+		return fmt.Errorf("transcription job already in progress")
+	}


That's funny. Not sure how that would happen :)

streamer45 · 2023-10-30

It actually can happen because we are not 100% atomic there, since we need to unlock to make the API call to the job service. So concurrent requests to start transcriptions could theoretically result in multiple jobs starting. Of course the client shouldn't ever do that, but it's a possibility we may want to account for in case of issues/bugs.

cpoile · 2023-11-01

Ah, good to know, thanks
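
The race described above can be illustrated with a small sketch: the lock is released around the slow job service call, so the state has to be checked again once the lock is re-acquired. The `plugin` struct, the `startJob` field, and the error variable are hypothetical simplifications, not the code in server/transcription_api.go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errInProgress = errors.New("transcription job already in progress")

// plugin is a hypothetical, stripped-down stand-in for the real plugin type.
type plugin struct {
	mut      sync.Mutex
	jobID    string
	startJob func() (string, error) // stands in for the job service API call
}

func (p *plugin) startTranscribingJob() error {
	p.mut.Lock()
	if p.jobID != "" {
		p.mut.Unlock()
		return errInProgress
	}
	// The lock is released here so the slow API call does not block others.
	p.mut.Unlock()

	jobID, err := p.startJob()
	if err != nil {
		return fmt.Errorf("failed to start job: %w", err)
	}

	p.mut.Lock()
	defer p.mut.Unlock()
	// Re-check: a concurrent request may have stored its job in the meantime.
	// A real implementation would also need to stop the job it just created.
	if p.jobID != "" {
		return errInProgress
	}
	p.jobID = jobID
	return nil
}

func main() {
	p := &plugin{}
	p.startJob = func() (string, error) { return "job-1", nil }
	fmt.Println(p.startTranscribingJob()) // <nil>
	fmt.Println(p.startTranscribingJob()) // transcription job already in progress

	// Simulate a concurrent request winning while the lock is released.
	q := &plugin{}
	q.startJob = func() (string, error) {
		q.mut.Lock()
		q.jobID = "job-other"
		q.mut.Unlock()
		return "job-mine", nil
	}
	fmt.Println(q.startTranscribingJob(), q.jobID)
}
```

The second half of main shows why the check after re-locking is not redundant: the first check passes, yet another job is registered before the call returns.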

cpoile · 2023-10-13T23:22:11Z

server/transcription_api.go

+	return nil
+}
+
+func (p *Plugin) stopTranscribingJob(state *channelState, callID string) (rerr error) {


Is there an opportunity to consolidate some code with the checker? Not a big deal though.

antran22 · 2023-10-28T06:15:34Z

This is awesome, big thanks to Mattermost Team for this feature. Is there any point on the roadmap for this feature that we can wait for?

streamer45 · 2023-10-30T22:13:42Z

Hopefully sometime next month. But the functionality (at least initially) will only be available for Enterprise licensed installations.

matthewbirtch · 2023-11-20T20:51:33Z

Is there anything else blocking that you were looking to see added/changed? I am still working on some small improvements to the text output, as discussed above, but other than that I think we are ready to move forward. Once approved, I can begin the process of getting this into Community, ideally sometime this week.

Nope, nothing else for me other than the text output. I think we're good to proceed as well @streamer45. Great work! I'll approve this knowing we will be doing some tweaking to the text output and we can test things out more robustly on Community.

streamer45 · 2023-11-21T01:33:13Z

@matthewbirtch I implemented some basic heuristics to compact the produced text file output in order to avoid very short sentences. I tried a couple of approaches and eventually converged on a duration-based one (as opposed to a length-based one), since I felt that splitting/joining based on time gives better results overall, as it accounts for pace and potential pauses in speech.

It's still experimental and highly configurable. Right now it enforces the following rules. We join segments if:

The speaker doesn't change. This is required to guarantee the order of the segments (e.g. question/answer sequences).
There are less than X seconds of pause between the end of the previous text segment and the start of the next one (X = 2s).
The overall duration of the current sentence is less than Y seconds (Y = 10s).

Give it a try when you get a chance and let me know if you see any improvements or hit any issue.
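
As an illustration of the three rules above, the joining logic could be sketched as follows. This is not the calls-transcriber implementation: the `segment` type, the function name, and the reading of the third rule (the duration accumulated so far must be below Y before another segment is appended) are assumptions, and segments are assumed to be sorted by start time.

```go
package main

import (
	"fmt"
	"time"
)

// Thresholds taken from the comment above (X = 2s, Y = 10s).
const (
	defaultMaxPause    = 2 * time.Second
	defaultMaxDuration = 10 * time.Second
)

// segment is a hypothetical transcribed chunk of speech.
type segment struct {
	Speaker    string
	Start, End time.Duration // offsets from the start of the call
	Text       string
}

// compact joins contiguous segments when the speaker is unchanged, the pause
// between them is shorter than maxPause, and the sentence accumulated so far
// is shorter than maxDuration.
func compact(segs []segment, maxPause, maxDuration time.Duration) []segment {
	var out []segment
	for _, s := range segs {
		if n := len(out); n > 0 {
			prev := &out[n-1]
			sameSpeaker := prev.Speaker == s.Speaker
			shortPause := s.Start-prev.End < maxPause
			withinLimit := prev.End-prev.Start < maxDuration
			if sameSpeaker && shortPause && withinLimit {
				prev.End = s.End
				prev.Text += " " + s.Text
				continue
			}
		}
		out = append(out, s)
	}
	return out
}

// demoSegments returns made-up sample data for the example below.
func demoSegments() []segment {
	return []segment{
		{"A", 0, 3 * time.Second, "Hello everyone."},
		{"A", 4 * time.Second, 6 * time.Second, "Let's get started."},
		{"B", 6500 * time.Millisecond, 8 * time.Second, "Sounds good."},
		{"B", 11 * time.Second, 12 * time.Second, "One question."},
	}
}

func main() {
	for _, s := range compact(demoSegments(), defaultMaxPause, defaultMaxDuration) {
		fmt.Printf("%s [%v-%v] %s\n", s.Speaker, s.Start, s.End, s.Text)
	}
}
```

In the sample data, the first two segments are joined (same speaker, 1s pause), while the last two stay separate because the speaker changes and then the pause exceeds 2s.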

cwarnermm

Provided string feedback inline.

plugin.json

webapp/i18n/en.json

cwarnermm

Provided string feedback inline.

matthewbirtch · 2023-11-21T18:08:46Z

Definitely seeing an improvement here @streamer45 and I like the rules you've established here - this is really smart! I vote we proceed with what you have and we can always tweak these numbers if we find we're not getting expected results in broader testing.

cwarnermm

Thanks for making it so easy for me to review the latest strings with direct links!

streamer45 · 2023-11-29T18:03:00Z

Merging this to unblock other PRs. Still working on docs updates as a separate item (https://mattermost.atlassian.net/browse/MM-55587)

streamer45 added the labels "1: UX Review" (requires review by UX), "2: Dev Review" (requires review by a core committer), "3: Security Review", and "1: Editor Review" Oct 10, 2023

streamer45 added this to the v0.21.0 / MM 9.3 milestone Oct 10, 2023

streamer45 requested a review from cpoile October 10, 2023 22:41

streamer45 self-assigned this Oct 10, 2023

cpoile reviewed Oct 13, 2023

Base automatically changed from MM-51852 to main October 30, 2023 19:36

streamer45 added 15 commits October 30, 2023 16:29

handleBotGetProfileForSession

b74a921

handleBotPostTranscriptions

eb15b04

Refactor job api, part I

7a1eeec

Updates

648f790

Job status updates

7ee37d5

Couple transcribing job with recording

568c6ff

Update offloader

2ac0729

Updates

241a319

Remove event

a13f920

Bump Go version

0ca0b87

Add some docs

8aef5b5

Add e2e test for call transcriptions

13ea32d

Use leaner API request context

97b2a20

Verify transcription content

e52fa11

Bump calls-common

8dd2803

streamer45 force-pushed the MM-53432 branch from a9e10dc to 8dd2803 Compare October 30, 2023 22:38

streamer45 added 2 commits October 30, 2023 16:40

Fix error message

49c5670

Rename param

95ed714

streamer45 requested a review from cwarnermm November 20, 2023 19:45

matthewbirtch approved these changes Nov 20, 2023

matthewbirtch removed the label "1: UX Review" (requires review by UX) Nov 20, 2023

esethna approved these changes Nov 21, 2023

cwarnermm reviewed Nov 21, 2023

cwarnermm reviewed Nov 21, 2023

streamer45 mentioned this pull request Nov 21, 2023

Implement heuristics to compact contiguous text segments mattermost/calls-transcriber#5

Merged

streamer45 added 4 commits November 21, 2023 15:32

Update transcribing job max duration

df604bc

Update strings

5b82222

Merge remote-tracking branch 'origin/main' into MM-53432

d8c2227

Update deps

4b6acf9

streamer45 requested a review from cwarnermm November 21, 2023 21:55

cwarnermm approved these changes Nov 21, 2023

cwarnermm added the label "3: Reviews Complete" (all reviewers have approved the pull request) and removed the labels "2: Dev Review" (requires review by a core committer) and "1: Editor Review" Nov 21, 2023

streamer45 added the label "Do Not Merge/Awaiting PR" (awaiting another pull request before merging, e.g. server changes) Nov 21, 2023

Merge remote-tracking branch 'origin/main' into MM-53432

8d8cc78

streamer45 removed the label "Do Not Merge/Awaiting PR" (awaiting another pull request before merging, e.g. server changes) Nov 24, 2023

streamer45 added 2 commits November 24, 2023 13:05

Fix e2e imports

f3b7ac8

Use underscore to replace spaces in filenames

d787eae

streamer45 mentioned this pull request Nov 29, 2023

Don't display AI button when plugin doesn't support it #576

Merged

streamer45 merged commit 53ab625 into main Nov 29, 2023

streamer45 deleted the MM-53432 branch November 29, 2023 18:03

amyblais mentioned this pull request Dec 15, 2023

[MM-55587] Add documentation for call transcriptions settings mattermost/docs#6814

Merged

streamer45 mentioned this pull request Jan 3, 2024

[MM-55264] Prevent call post from being modified #608

Merged
