Audio narration #195

angelala3252 · 2023-05-29T15:17:12Z

This pull request responds to issue #164. The idea is to use sounddevice to optionally record audio while recording ActionEvents (controlled by the value of enable_audio), stop recording once a KeyboardInterrupt is caught, use soundfile to write the array of audio frames to a WAV file, and then use [whisper](https://github.com/openai/whisper) to transcribe that WAV file to text. The next step is to save the text and WAV to the database in a new table linked to its recording. Further exploration is needed to decide what exactly we will use this text for.
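The capture and WAV-conversion steps described above could look roughly like the sketch below. It is a hedged illustration, not the PR's code: it assumes 16 kHz mono int16 audio (Whisper resamples to 16 kHz internally), captures with sounddevice.InputStream and a callback, and swaps in the standard-library wave module for soundfile so the conversion step runs without extra dependencies. The names record_until_interrupt and frames_to_wav_bytes are hypothetical.

```python
import io
import queue
import wave
from typing import Iterable, List

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, the rate Whisper works at internally
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample for int16


def record_until_interrupt() -> List[bytes]:
    """Hypothetical helper: capture microphone audio until Ctrl+C (KeyboardInterrupt)."""
    import sounddevice as sd  # imported lazily: needs PortAudio and an input device

    chunks: "queue.Queue[bytes]" = queue.Queue()

    def callback(indata, frames, time_info, status):
        # indata is a NumPy buffer that sounddevice reuses, so copy its bytes out.
        chunks.put(indata.tobytes())

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=CHANNELS,
                        dtype="int16", callback=callback):
        try:
            while True:
                sd.sleep(1000)  # milliseconds
        except KeyboardInterrupt:
            pass  # stop capturing; leaving the with-block closes the stream

    collected = []
    while not chunks.empty():
        collected.append(chunks.get())
    return collected


def frames_to_wav_bytes(chunks: Iterable[bytes], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Join raw int16 mono chunks into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(chunks))
    return buf.getvalue()
```

In the PR itself, soundfile fills the role of frames_to_wav_bytes; the returned bytes can be written to a .wav path and passed to Whisper.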

Instructions to run:

git remote add audio https://github.com/OpenAdaptAI/AudioNarration.git
git fetch audio
git checkout feat/audio_narration
pip install -r requirements.txt
alembic upgrade head
python -m openadapt.record "test" True # say some words!
python -m openadapt.visualize


angelala3252 · 2023-06-01T21:19:51Z

Updates:

crud.get_audio_info(recording) returns the audio info associated with a recording; its .transcribed_text attribute contains the text from the narration.
visualize.py now also displays the audio info near the top of the page.
For now, the audio and text aren't used for anything, but once we decide what to do with them I can continue working on that.

Note:

When speaking, try to speak clearly and loudly and minimize background noise for the best transcription.

angelala3252 · 2023-06-01T23:01:42Z

Update:

added FLAC compression, with logging that shows how much the audio was compressed
included word-by-word timestamps in the audio info as JSON data under words_with_timestamps
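A rough sketch of the two additions above, under stated assumptions: FLAC encoding via soundfile.write (which accepts a file-like object when format is given), a logged size reduction, and per-word timings taken from a Whisper result produced with word_timestamps=True (supported by recent openai-whisper releases), where each segment carries a words list with word, start and end keys. All function names are hypothetical; this is not the PR's implementation.

```python
import io
import json
import logging

logger = logging.getLogger(__name__)


def transcribe_with_word_timestamps(wav_path: str, model_name: str = "base") -> dict:
    """Run Whisper on a WAV file, asking for per-word timings (needs openai-whisper)."""
    import whisper  # imported lazily: large dependency that downloads model weights
    model = whisper.load_model(model_name)
    return model.transcribe(wav_path, word_timestamps=True)


def compress_to_flac(samples, sample_rate: int) -> bytes:
    """Encode an array of audio samples as FLAC bytes in memory (needs soundfile)."""
    import soundfile as sf  # imported lazily so the pure helpers below run without it
    buf = io.BytesIO()
    sf.write(buf, samples, sample_rate, format="FLAC")
    return buf.getvalue()


def log_compression(raw_size: int, compressed_size: int) -> float:
    """Log and return how much smaller the compressed data is, in percent."""
    reduction = 100.0 * (1 - compressed_size / raw_size)
    logger.info("audio compressed from %d to %d bytes (%.1f%% smaller)",
                raw_size, compressed_size, reduction)
    return reduction


def words_with_timestamps(result: dict) -> str:
    """Flatten per-word timings from a Whisper result into a JSON string."""
    words = [
        {"word": w["word"], "start": w["start"], "end": w["end"]}
        for segment in result.get("segments", [])
        for w in segment.get("words", [])
    ]
    return json.dumps(words)
```

For example, raw_size could be samples.nbytes for the NumPy array recorded by sounddevice, and compressed_size the length of the FLAC bytes.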

I haven't added any tests yet, as I'm unsure how to test this without actually saying words.
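One possible way to test this without speaking, sketched under assumptions: generate a synthetic tone as 16 kHz int16 PCM and inject a fake transcriber in place of Whisper, so the test checks the plumbing rather than speech recognition. narrate and synthetic_tone are hypothetical names, not functions from the PR.

```python
import math
import struct
from typing import Callable, Dict


def synthetic_tone(freq_hz: float = 440.0, seconds: float = 1.0,
                   sample_rate: int = 16_000, amplitude: float = 0.3) -> bytes:
    """Return a sine wave as little-endian int16 mono PCM bytes."""
    n = int(seconds * sample_rate)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


def narrate(audio: bytes, transcribe: Callable[[bytes], Dict]) -> Dict:
    """Hypothetical pipeline step: transcribe audio and package the result."""
    result = transcribe(audio)
    return {"transcribed_text": result["text"].strip(), "num_bytes": len(audio)}


def test_narrate_with_fake_transcriber():
    audio = synthetic_tone(seconds=0.5)  # 8000 samples * 2 bytes
    fake_whisper = lambda _audio: {"text": " hello world ", "segments": []}
    info = narrate(audio, fake_whisper)
    assert info == {"transcribed_text": "hello world", "num_bytes": 16_000}
```

Because the transcriber is passed in, the real Whisper model is only needed in a separate, slower test.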

angelala3252 · 2023-06-02T14:28:40Z

Examples of the logged output and the visualize output:

angelala3252 · 2023-06-02T18:24:48Z

I tried recording for a bit longer; here are the results:

Started at 2:14 and talked the whole time.

Ended at 2:19; transcription took 2 minutes.

So a 5-minute recording would be about 5 MB after compression.
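The screenshots showing the exact audio settings are not available here, so as a rough point of reference, here is a hedged back-of-the-envelope calculation of uncompressed PCM size for a few assumed formats. These are illustrative assumptions, not measurements from the PR.

```python
def raw_pcm_bytes(seconds: float, sample_rate: int,
                  channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


FIVE_MINUTES = 5 * 60
# Assumed formats for illustration only; the PR's actual settings are not shown here.
for label, rate, width in [("16 kHz mono int16", 16_000, 2),
                           ("16 kHz mono float32", 16_000, 4),
                           ("44.1 kHz mono int16", 44_100, 2)]:
    size_mb = raw_pcm_bytes(FIVE_MINUTES, rate, 1, width) / 1e6
    print(f"{label}: {size_mb:.1f} MB uncompressed")
```

Under the 16 kHz mono int16 assumption, for instance, five minutes is 9.6 MB of raw audio, so about 5 MB after compression would mean the FLAC output is a little over half the raw size; at other formats the ratio differs.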

abrichr · 2023-06-14T22:53:57Z

@angelala3252 please resolve merge conflicts!

angelala3252 · 2023-06-14T23:26:59Z

@abrichr I've resolved the merge conflicts. What is tiktoken being used for? Whisper requires tiktoken==0.3.3, but we have tiktoken==0.4.0 as a requirement. How do you want to deal with this?

abrichr · 2023-06-15T23:43:30Z

@angelala3252 I think we can safely remove it.

It is used here: https://github.com/MLDSAI/OpenAdapt/blob/3e1ebb94eda47f251e2c0b9f10e4146e97c7458e/openadapt/strategies/mixins/openai.py#L33

And again later on in that file in a function which we are not using. Can you please remove the function and import?

angelala3252 · 2023-06-16T02:00:51Z

@abrichr done!

abrichr · 2023-06-16T02:53:31Z

Thank you @angelala3252 ! Can you please add recording performance plots before and after these changes? 🙏

abrichr · 2023-06-16T02:54:24Z

openadapt/record.py

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        terminate_event.set()

+    if enable_audio:


Can you please move this into a function or two?

angelala3252 · Jun 19, 2023

I moved it into the record_audio function in e9f2d36.
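A minimal sketch of that kind of refactor, assuming a threading.Event named terminate_event (as in the snippet above) and an injected read_chunk callable standing in for the microphone, so the loop can be exercised without hardware. The signatures are hypothetical and do not reflect the actual record_audio in e9f2d36.

```python
import threading
from typing import Callable, List, Tuple


def record_audio(terminate_event: threading.Event,
                 read_chunk: Callable[[], bytes],
                 chunks: List[bytes],
                 poll_interval: float = 0.01) -> None:
    """Hypothetical worker: keep appending audio chunks until terminate_event is set."""
    while not terminate_event.is_set():
        chunks.append(read_chunk())
        # Event.wait doubles as an interruptible sleep: it returns early once set.
        terminate_event.wait(poll_interval)


def start_audio_thread(terminate_event: threading.Event,
                       read_chunk: Callable[[], bytes]) -> Tuple[threading.Thread, List[bytes]]:
    """Start record_audio in a background thread; return the thread and its chunk list."""
    chunks: List[bytes] = []
    thread = threading.Thread(target=record_audio,
                              args=(terminate_event, read_chunk, chunks),
                              daemon=True)
    thread.start()
    return thread, chunks
```

In the main loop, the existing except KeyboardInterrupt branch would call terminate_event.set() and then join the thread before saving the collected audio.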

After this change, I had to switch to SQLAlchemy's thread-local scoped_session (9293b0b) because I kept getting this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 5756 and this is thread id 24420.
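The error above comes from SQLite's same-thread check. As a stdlib analogue of what SQLAlchemy's scoped_session provides (one session per thread, created on first use), the sketch below uses sqlite3 with threading.local: it reproduces the error with a shared connection and shows that per-thread connections avoid it. The database path and table are hypothetical, and this is not the code from 9293b0b; with SQLAlchemy, the equivalent registry is typically built as scoped_session(sessionmaker(bind=engine)).

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical on-disk database for the demo (a file, so every thread sees the same data).
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.db")
_local = threading.local()


def get_connection() -> sqlite3.Connection:
    """Return the calling thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(DB_PATH)
        _local.conn = conn
    return conn


def shared_connection_error() -> str:
    """Use one connection from a second thread and return the resulting error text."""
    shared = sqlite3.connect(DB_PATH)  # check_same_thread defaults to True
    errors = []

    def worker():
        try:
            shared.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            errors.append(str(exc))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    shared.close()
    return errors[0] if errors else ""


def insert_from_threads(n_threads: int = 4) -> int:
    """Insert one row per thread using thread-local connections; return the row count."""
    main = get_connection()
    main.execute("CREATE TABLE IF NOT EXISTS events (value INTEGER)")
    main.commit()

    def worker(i: int):
        conn = get_connection()  # a fresh connection owned by this worker thread
        conn.execute("INSERT INTO events VALUES (?)", (i,))
        conn.commit()
        conn.close()  # release it before the thread exits (cf. scoped_session.remove())
        del _local.conn

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return main.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```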

Here is the performance before and after:
Before:

After:

Is this ok with you?

abrichr · 2023-06-16T02:54:40Z

openadapt/record.py

+        audio_file = crud.insert_audio_file(compressed_audio_bytes)
+
+        # Create AudioInfo entry
+        audio_info = crud.insert_audio_info(result_info['text'],


Can you please explain conceptually why there are two tables?

angelala3252 · Jun 19, 2023

I realize now it doesn't really make sense to store them separately 😅. I merged the two tables in 9469043.
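To illustrate the single-table design conceptually, here is a hedged sqlite3 sketch: one audio_info row per recording holding the compressed audio, the transcript, and the word timestamps as JSON. Column names are assumptions (only transcribed_text and words_with_timestamps appear in this thread), and insert_audio_info and get_audio_info here are hypothetical stand-ins, not OpenAdapt's crud functions or SQLAlchemy models.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema: recording metadata plus one merged audio table.
SCHEMA = """
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    task_description TEXT
);
CREATE TABLE audio_info (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    flac_data BLOB,               -- compressed audio bytes
    sample_rate INTEGER,
    transcribed_text TEXT,
    words_with_timestamps TEXT    -- JSON list of {word, start, end}
);
"""


def insert_audio_info(conn: sqlite3.Connection, recording_id: int, flac_data: bytes,
                      sample_rate: int, text: str, words: list) -> int:
    """Store audio, transcript and word timings in a single row; return its id."""
    cur = conn.execute(
        "INSERT INTO audio_info (recording_id, flac_data, sample_rate,"
        " transcribed_text, words_with_timestamps) VALUES (?, ?, ?, ?, ?)",
        (recording_id, flac_data, sample_rate, text, json.dumps(words)),
    )
    conn.commit()
    return cur.lastrowid


def get_audio_info(conn: sqlite3.Connection, recording_id: int) -> Optional[dict]:
    """Fetch the transcript and decoded word timings for a recording, if any."""
    row = conn.execute(
        "SELECT transcribed_text, words_with_timestamps FROM audio_info"
        " WHERE recording_id = ?",
        (recording_id,),
    ).fetchone()
    if row is None:
        return None
    return {"transcribed_text": row[0], "words_with_timestamps": json.loads(row[1])}
```

Keeping everything in one row means a single lookup by recording returns both the audio and its transcript, which is the simplification the merge in 9469043 aimed for.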


angelala3252 · 2023-06-23T15:19:25Z

@abrichr resolved merge conflicts

abrichr · 2023-06-25T21:07:17Z

Thanks @angelala3252 !

I tried checking this out and running it. I had to add the following dependencies:

poetry add sounddevice
poetry add soundfile
poetry add whisper

Now I get the following error when running python -m openadapt.record "testing audio narration" --enable_audio:

2023-06-25 17:05:05.603 | INFO     | __main__:record:597 - task_description='testing audio narration'
2023-06-25 17:05:05.638 | INFO     | __main__:create_recording:465 - recording=Recording(id=6, timestamp=1687727105.603264, monitor_width=1512, monitor_height=982, double_click_interval_seconds=0.5, double_click_distance_pixels=5, platform='darwin', task_description='testing audio narration')
2023-06-25 17:05:05.655 | INFO     | __main__:read_window_events:370 - starting
2023-06-25 17:05:05.660 | INFO     | __main__:read_screen_events:344 - starting
2023-06-25 17:05:05.666 | INFO     | __main__:process_events:76 - starting
2023-06-25 17:05:05.714 | INFO     | __main__:record_audio:529 - Audio recording started.
2023-06-25 17:05:06.111 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — poetry shell ▸ tmux — 250×58', 'left': 0, 'top': 38, 'width': 1510, 'height': 933, 'window_id': 152031}
2023-06-25 17:05:06.113 | ERROR    | openadapt.window._macos:get_active_window:69 - Error getting focused window
zsh: segmentation fault  python -m openadapt.record "testing audio narration" --enable_audio
(openadapt-py3.10) abrichr@MacBook-Pro-3 OpenAdapt % 2023-06-25 17:05:08.084 | INFO     | __mp_main__:performance_stats_writer:419 - performance stats writer starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='window' starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='screen' starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='action' starting

Maybe this is a Mac-specific issue. @0dm @dianzrong can you please try to reproduce?

dianzrong · 2023-06-30T17:12:05Z

@abrichr I got the same error when recording. Interestingly enough, after the original python -m openadapt.record "test" True failed, a new python -m openadapt.record "test" True would automatically run again.

2023-06-30 12:57:23.195 | INFO     | __main__:record:597 - task_description='test'
2023-06-30 12:57:23.256 | INFO     | __main__:create_recording:465 - recording=Recording(id=3, timestamp=1688144243.1956239, monitor_width=1440, monitor_height=900, double_click_interval_seconds=0.5, double_click_distance_pixels=5, platform='darwin', task_description='test')
2023-06-30 12:57:23.270 | INFO     | __main__:read_window_events:370 - starting
2023-06-30 12:57:23.276 | INFO     | __main__:read_screen_events:344 - starting
2023-06-30 12:57:23.281 | INFO     | __main__:process_events:76 - starting
2023-06-30 12:57:23.366 | INFO     | __main__:record_audio:529 - Audio recording started.
2023-06-30 12:57:23.504 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — Python -m openadapt.record test True — 148×46', 'left': 347, 'top': 25, 'width': 1046, 'height': 679, 'window_id': 745}
2023-06-30 12:57:23.639 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — Python ◂ Python -m openadapt.record test True — 148×46', 'left': 347, 'top': 25, 'width': 1046, 'height': 679, 'window_id': 745}
2023-06-30 12:57:27.412 | ERROR    | openadapt.window._macos:get_active_window:69 - Error getting focused window
zsh: segmentation fault  python -m openadapt.record "test" True
(.venv) d@Dians-MacBook-Air OpenAdapt % 2023-06-30 12:57:28.209 | INFO     | __mp_main__:performance_stats_writer:419 - performance stats writer starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='window' starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='action' starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='screen' starting

angelala3252 · 2023-06-30T17:46:17Z

I think this is the same problem that prevents record from running during a call on macOS; it's related to #183.

abrichr · 2023-07-02T14:32:02Z

@angelala3252 I believe you are right, let's consider this blocked until #183 is resolved.

In the meantime can you please clarify why this is in a separate fork under OpenAdaptAI instead of your fork? I would like to remove that fork if possible.

angelala3252 · 2023-07-03T20:48:42Z

Closed in favour of PR #346, which moves this work to my fork.

angelala3252 added 3 commits May 26, 2023 17:40

added sounddevice to optionally record narration

351d87b

added sounddevice to optionally record narration and initial whisper conversion

f19a84a

updated requirements for audio narration

e143767

angelala3252 marked this pull request as draft May 29, 2023 15:19

angelala3252 added 8 commits May 31, 2023 17:40

small changes

6f07b93

fixed issue with created audio file being really slow

d3ef09a

updated to save audio data and transcribed text in database

9e86193

pull from main

87a814f

new alembic migration

ce84a1b

edited audio tables

5c584b2

convert audio array to required format for whisper

802c8a2

visualize audio info

aca8cdc

angelala3252 marked this pull request as ready for review June 1, 2023 21:15

angelala3252 added 2 commits June 1, 2023 18:28

FLAC compression before storing

42b1007

store word by word timestamps

9f4c280

style changes

20d29e1

Merge branch 'main' into feat/audio_narration

109ffe0

angelala3252 added 3 commits June 15, 2023 21:53

changed tiktoken version

8d27b4f

removed unused tiktoken code

d631b2d

Merge branch 'main' into feat/audio_narration

ab0805e

abrichr reviewed Jun 16, 2023

requirements.txt (outdated, resolved)

abrichr reviewed Jun 16, 2023

requirements.txt (outdated, resolved)

angelala3252 added 7 commits June 18, 2023 17:34

alphabetic order, removed redundant dependencies

e30538b

merged AudioInfo and AudioFile

9469043

Merge remote-tracking branch 'audio/feat/audio_narration' into feat/audio_narration

47bf845

Conflicts: requirements.txt

move audio recording into record_audio function

e9f2d36

use thread-local scoped_session

9293b0b

Merge branch 'main' into feat/audio_narration

a66acbc

remove redundant requirement

888d335

angelala3252 mentioned this pull request Jul 3, 2023

feat: add audio narration (updated) #346

Closed

7 tasks

angelala3252 closed this Jul 3, 2023


angelala3252 commented May 29, 2023 (edited)

angelala3252 commented Jun 1, 2023 (edited)

angelala3252 commented Jun 1, 2023 (edited)

angelala3252 commented Jun 2, 2023 (edited)

angelala3252 commented Jun 2, 2023

abrichr commented Jun 14, 2023

angelala3252 commented Jun 14, 2023

abrichr commented Jun 15, 2023

angelala3252 commented Jun 16, 2023

abrichr commented Jun 16, 2023

abrichr Jun 16, 2023

angelala3252 Jun 19, 2023

abrichr Jun 16, 2023

angelala3252 Jun 19, 2023

angelala3252 commented Jun 23, 2023

abrichr commented Jun 25, 2023

dianzrong commented Jun 30, 2023

angelala3252 commented Jun 30, 2023

abrichr commented Jul 2, 2023

angelala3252 commented Jul 3, 2023
