Audio narration #195

Closed
wants to merge 25 commits into from

angelala3252
Collaborator

@angelala3252 angelala3252 commented May 29, 2023

This pull request responds to issue #164. The idea is to use sounddevice to optionally record audio while recording ActionEvents (depending on the value of enable_audio), stop recording once a KeyboardInterrupt is caught, use soundfile to convert the array of audio frames to a WAV file, and then use [whisper](https://github.com/openai/whisper) to transcribe this WAV file to text. The next step is to save the text and WAV to the database in a new table linked to the recording. Further exploration is needed to determine what exactly we will use this text for.

Instructions to run:

git remote add audio https://github.com/OpenAdaptAI/AudioNarration.git
git fetch audio
git checkout feat/audio_narration
pip install -r requirements.txt
alembic upgrade head
python -m openadapt.record "test" True # say some words!
python -m openadapt.visualize
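
A minimal sketch of the frames-to-WAV step from the description above, written so it runs without a microphone or extra packages. It is not the code in this PR: a synthesized tone stands in for sounddevice capture, the standard-library wave module stands in for soundfile, and SAMPLE_RATE, synth_frames, frames_to_wav, and transcribe are hypothetical names. The transcribe helper uses Whisper's load_model and transcribe calls and is left uncalled here because it needs the whisper package and a model download.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # assumed value; the PR does not state its sample rate


def synth_frames(seconds: float, freq: float = 440.0) -> bytes:
    """Stand-in for microphone capture: a 16-bit mono sine tone."""
    n = int(seconds * SAMPLE_RATE)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def frames_to_wav(frames: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


def transcribe(wav_path: str) -> str:
    """Transcribe a WAV file with Whisper (optional dependency, not run here)."""
    import whisper  # imported lazily so the rest of the sketch has no dependencies

    model = whisper.load_model("base")
    return model.transcribe(wav_path)["text"]


wav_bytes = frames_to_wav(synth_frames(0.5))
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getnframes(), wav.getframerate())
```

Swapping synth_frames for real capture and wave for soundfile would bring this closer to what the description outlines; the structure of the three steps stays the same.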

@angelala3252 angelala3252 marked this pull request as draft May 29, 2023 15:19
@angelala3252 angelala3252 marked this pull request as ready for review June 1, 2023 21:15
@angelala3252
Collaborator Author

angelala3252 commented Jun 1, 2023

Updates:

  • crud.get_audio_info(recording) returns the audio info associated with a recording, and the .transcribed_text attribute contains the text from the narration
  • visualize.py now also visualizes the audio info near the top of the page.
  • for now, the audio and text don't do anything, but once we decide what to do with them I can continue work on that.

Note:

  • when speaking, try to speak clearly and loudly, and minimize background noise for optimal transcription.

@angelala3252
Collaborator Author

angelala3252 commented Jun 1, 2023

Update:

  • added compression and logging to show how much compression was done
  • included word by word timestamps in audio info as JSON data under words_with_timestamps
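
A rough sketch of what "compress and log the savings" plus "store word timestamps as JSON" could look like. The PR does not say which codec it uses, so zlib is only a stand-in here, and compress_and_report and the shape of the words list are assumptions for illustration.

```python
import json
import zlib


def compress_and_report(raw: bytes) -> tuple[bytes, str]:
    """Compress raw audio bytes and build a log message describing the savings."""
    compressed = zlib.compress(raw, level=9)  # stand-in codec, not the PR's
    saved = 100 * (1 - len(compressed) / len(raw))
    message = (
        f"compressed {len(raw)} -> {len(compressed)} bytes "
        f"({saved:.1f}% smaller)"
    )
    return compressed, message


# One second of 16-bit mono silence at an assumed 16 kHz sample rate.
pcm = bytes(32_000)
compressed, message = compress_and_report(pcm)
print(message)

# Hypothetical shape for the data stored under words_with_timestamps.
words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
]
words_json = json.dumps(words)
print(words_json)
```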

I haven't added any tests yet as I'm unsure how to test without actually saying words.

@angelala3252
Collaborator Author

angelala3252 commented Jun 2, 2023

Examples of what is logged and visualize output:

[Screenshots not preserved: image, image1, Screenshot 2023-06-02 102956]

@angelala3252
Collaborator Author

I tried recording for a bit longer and here are the results:

  • Started at 2:14 and I talked the whole time. [image]
  • Ended at 2:19 and took 2 mins to transcribe. [image]
  • So a 5 min recording would be about 5 MB after compression. [image]
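
For context on the 5 MB figure, here is a back-of-the-envelope calculation of the uncompressed size of a 5 minute recording. The sample rates, channel count, and sample width below are assumptions (the PR does not state them), and pcm_size_bytes is a hypothetical helper.

```python
def pcm_size_bytes(
    seconds: float,
    sample_rate: int = 16_000,
    channels: int = 1,
    sample_width: int = 2,
) -> int:
    """Size of uncompressed PCM audio: duration x rate x channels x bytes per sample."""
    return int(seconds * sample_rate * channels * sample_width)


five_minutes = 5 * 60
for rate in (16_000, 44_100):  # assumed candidate sample rates
    size = pcm_size_bytes(five_minutes, sample_rate=rate)
    print(f"{rate} Hz mono 16-bit: {size / 1e6:.1f} MB uncompressed")
```

Under these assumptions the raw audio would be roughly 9.6 MB at 16 kHz or 26.5 MB at 44.1 kHz, which gives a sense of scale for the reported 5 MB after compression.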

@abrichr
Member

abrichr commented Jun 14, 2023

@angelala3252 please resolve merge conflicts!

@angelala3252
Collaborator Author

@abrichr I've resolved the merge conflicts. What is tiktoken being used for? Whisper requires tiktoken==0.3.3, but we have tiktoken==0.4.0 as a requirement. How do you want to deal with this?

@abrichr
Member

abrichr commented Jun 15, 2023

@angelala3252 I think we can safely remove it.

It is used here: https://github.com/MLDSAI/OpenAdapt/blob/3e1ebb94eda47f251e2c0b9f10e4146e97c7458e/openadapt/strategies/mixins/openai.py#L33

And again later on in that file in a function which we are not using. Can you please remove the function and import?

@angelala3252
Collaborator Author

@abrichr done!

@abrichr
Member

abrichr commented Jun 16, 2023

Thank you @angelala3252 ! Can you please add recording performance plots before and after these changes? 🙏

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    terminate_event.set()

if enable_audio:
Member

Can you please move this into a function or two?

Collaborator Author

I moved it into the record_audio function in e9f2d36
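
For readers without access to the commit, one way to structure "audio capture in its own function, stopped by the same KeyboardInterrupt" is sketched below. This is not the record_audio function from this PR: capture_until_stopped, run_until_interrupted, and the fake helpers are hypothetical names, and the chunk reader is injected so the sketch runs without a microphone.

```python
import threading
import time
from typing import Callable, List


def capture_until_stopped(
    stop: threading.Event,
    read_chunk: Callable[[], bytes],
    chunks: List[bytes],
) -> None:
    """Collect audio chunks until the stop event is set."""
    while not stop.is_set():
        chunks.append(read_chunk())


def run_until_interrupted(
    read_chunk: Callable[[], bytes],
    wait: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Run capture in a worker thread; Ctrl+C in the main thread stops it."""
    stop = threading.Event()
    chunks: List[bytes] = []
    worker = threading.Thread(
        target=capture_until_stopped, args=(stop, read_chunk, chunks)
    )
    worker.start()
    try:
        while True:
            wait(1)
    except KeyboardInterrupt:
        stop.set()
    worker.join()
    return chunks


# Demo with fakes: the "interrupt" fires as soon as one chunk has been read.
first_chunk = threading.Event()


def fake_read_chunk() -> bytes:
    time.sleep(0.001)
    first_chunk.set()
    return b"\x00\x00"


def fake_wait(_seconds: float) -> None:
    first_chunk.wait(timeout=5)
    raise KeyboardInterrupt


captured = run_until_interrupted(fake_read_chunk, wait=fake_wait)
print(len(captured) >= 1)
```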

After this change I had to switch to using SQLAlchemy's thread-local scoped_session in 9293b0b, as I kept getting this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 5756 and this is thread id 24420.
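
The error and the thread-local fix can be illustrated with the standard library alone. This sketch is not the change in 9293b0b: it uses sqlite3 and threading.local directly to show the same idea that a thread-local scoped_session relies on (each thread gets its own database handle), and get_connection is a hypothetical helper.

```python
import os
import sqlite3
import tempfile
import threading

# A connection created in the main thread...
shared = sqlite3.connect(":memory:")
errors = []


def use_shared_connection() -> None:
    """...cannot be used from another thread by default."""
    try:
        shared.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))


worker = threading.Thread(target=use_shared_connection)
worker.start()
worker.join()
print(errors[0])

# Fix: keep one connection per thread in thread-local storage.
_local = threading.local()


def get_connection(path: str) -> sqlite3.Connection:
    """Return this thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn


results = []


def use_own_connection(path: str) -> None:
    conn = get_connection(path)
    results.append(conn.execute("SELECT 1").fetchone())
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    threads = [
        threading.Thread(target=use_own_connection, args=(db_path,))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
print(results)
```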

Here is the performance before and after:
Before: [plot: before changes]
After: [plot: after changes]

Is this ok with you?

audio_file = crud.insert_audio_file(compressed_audio_bytes)

# Create AudioInfo entry
audio_info = crud.insert_audio_info(result_info['text'],
Member

Can you please explain conceptually why there are two tables?

Collaborator Author

I realize now it doesn't really make sense to store them separately 😅 I merged the two tables in 9469043
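
To make the merged layout concrete, here is a sketch of a single audio table that holds the compressed audio, the transcription, and the word timestamps, linked to its recording. It uses sqlite3 directly rather than the project's models, and the schema, column names other than transcribed_text and words_with_timestamps, and both function signatures are assumptions, not the contents of 9469043.

```python
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    task_description TEXT
);
CREATE TABLE audio_info (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    compressed_audio BLOB NOT NULL,
    transcribed_text TEXT NOT NULL,
    words_with_timestamps TEXT NOT NULL  -- JSON-encoded list
);
"""


def insert_audio_info(conn, recording_id, audio, text, words) -> int:
    """Store audio bytes, transcription, and word timestamps in one row."""
    cur = conn.execute(
        "INSERT INTO audio_info "
        "(recording_id, compressed_audio, transcribed_text, words_with_timestamps) "
        "VALUES (?, ?, ?, ?)",
        (recording_id, audio, text, json.dumps(words)),
    )
    return cur.lastrowid


def get_audio_info(conn, recording_id) -> Optional[dict]:
    """Fetch the audio row linked to a recording, or None if there is none."""
    row = conn.execute(
        "SELECT compressed_audio, transcribed_text, words_with_timestamps "
        "FROM audio_info WHERE recording_id = ?",
        (recording_id,),
    ).fetchone()
    if row is None:
        return None
    audio, text, words_json = row
    return {
        "compressed_audio": audio,
        "transcribed_text": text,
        "words_with_timestamps": json.loads(words_json),
    }


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
recording_id = conn.execute(
    "INSERT INTO recording (task_description) VALUES (?)", ("test",)
).lastrowid
insert_audio_info(
    conn,
    recording_id,
    b"\x01\x02",
    "hello world",
    [{"word": "hello", "start": 0.0, "end": 0.4}],
)
info = get_audio_info(conn, recording_id)
print(info["transcribed_text"])
```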

@angelala3252
Collaborator Author

@abrichr resolved merge conflicts

@abrichr
Member

abrichr commented Jun 25, 2023

Thanks @angelala3252 !

I tried checking this out and running it. I had to add the following dependencies:

poetry add sounddevice
poetry add soundfile
poetry add whisper

Now I get the following error when running python -m openadapt.record "testing audio narration" --enable_audio:

2023-06-25 17:05:05.603 | INFO     | __main__:record:597 - task_description='testing audio narration'
2023-06-25 17:05:05.638 | INFO     | __main__:create_recording:465 - recording=Recording(id=6, timestamp=1687727105.603264, monitor_width=1512, monitor_height=982, double_click_interval_seconds=0.5, double_click_distance_pixels=5, platform='darwin', task_description='testing audio narration')
2023-06-25 17:05:05.655 | INFO     | __main__:read_window_events:370 - starting
2023-06-25 17:05:05.660 | INFO     | __main__:read_screen_events:344 - starting
2023-06-25 17:05:05.666 | INFO     | __main__:process_events:76 - starting
2023-06-25 17:05:05.714 | INFO     | __main__:record_audio:529 - Audio recording started.
2023-06-25 17:05:06.111 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — poetry shell ▸ tmux — 250×58', 'left': 0, 'top': 38, 'width': 1510, 'height': 933, 'window_id': 152031}
2023-06-25 17:05:06.113 | ERROR    | openadapt.window._macos:get_active_window:69 - Error getting focused window
zsh: segmentation fault  python -m openadapt.record "testing audio narration" --enable_audio
(openadapt-py3.10) abrichr@MacBook-Pro-3 OpenAdapt % 2023-06-25 17:05:08.084 | INFO     | __mp_main__:performance_stats_writer:419 - performance stats writer starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='window' starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='screen' starting
2023-06-25 17:05:08.085 | INFO     | __mp_main__:write_events:215 - event_type='action' starting

Maybe this is a Mac-specific issue. @0dm @dianzrong can you please try to reproduce?

@dianzrong
Collaborator

@abrichr I got the same error when recording. Interestingly, after the original python -m openadapt.record "test" True run failed, a new python -m openadapt.record "test" True run would start automatically:

2023-06-30 12:57:23.195 | INFO     | __main__:record:597 - task_description='test'
2023-06-30 12:57:23.256 | INFO     | __main__:create_recording:465 - recording=Recording(id=3, timestamp=1688144243.1956239, monitor_width=1440, monitor_height=900, double_click_interval_seconds=0.5, double_click_distance_pixels=5, platform='darwin', task_description='test')
2023-06-30 12:57:23.270 | INFO     | __main__:read_window_events:370 - starting
2023-06-30 12:57:23.276 | INFO     | __main__:read_screen_events:344 - starting
2023-06-30 12:57:23.281 | INFO     | __main__:process_events:76 - starting
2023-06-30 12:57:23.366 | INFO     | __main__:record_audio:529 - Audio recording started.
2023-06-30 12:57:23.504 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — Python -m openadapt.record test True — 148×46', 'left': 347, 'top': 25, 'width': 1046, 'height': 679, 'window_id': 745}
2023-06-30 12:57:23.639 | INFO     | __main__:read_window_events:389 - _window_data={'title': 'Terminal OpenAdapt — Python ◂ Python -m openadapt.record test True — 148×46', 'left': 347, 'top': 25, 'width': 1046, 'height': 679, 'window_id': 745}
2023-06-30 12:57:27.412 | ERROR    | openadapt.window._macos:get_active_window:69 - Error getting focused window
zsh: segmentation fault  python -m openadapt.record "test" True
(.venv) d@Dians-MacBook-Air OpenAdapt % 2023-06-30 12:57:28.209 | INFO     | __mp_main__:performance_stats_writer:419 - performance stats writer starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='window' starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='action' starting
2023-06-30 12:57:28.209 | INFO     | __mp_main__:write_events:215 - event_type='screen' starting

@angelala3252
Collaborator Author

I think this is the same problem that prevents running record while in a call on macOS; it is related to #183.

@abrichr
Member

abrichr commented Jul 2, 2023

@angelala3252 I believe you are right, let's consider this blocked until #183 is resolved.

In the meantime can you please clarify why this is in a separate fork under OpenAdaptAI instead of your fork? I would like to remove that fork if possible.

@angelala3252
Collaborator Author

Closed in favour of PR #346, which moves this work to my fork.
