Audio narration #195
Updates:
Note:
Update: I haven't added any tests yet as I'm unsure how to test without actually saying words.
@angelala3252 please resolve merge conflicts!
@abrichr I've resolved the merge conflicts. What is tiktoken being used for? Whisper requires tiktoken==0.3.3, but we have tiktoken==0.4.0 as a requirement. How do you want to deal with this?
@angelala3252 I think we can safely remove it. It is used here: https://github.com/MLDSAI/OpenAdapt/blob/3e1ebb94eda47f251e2c0b9f10e4146e97c7458e/openadapt/strategies/mixins/openai.py#L33 And again later on in that file in a function which we are not using. Can you please remove the function and import?
@abrichr done!
Thank you @angelala3252 ! Can you please add recording performance plots before and after these changes? 🙏
openadapt/record.py
Outdated
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        terminate_event.set()

    if enable_audio:
Can you please move this into a function or two?
I moved it into the record_audio function in e9f2d36.
After this change I had to switch to using SQLAlchemy's thread-local scoped_session in 9293b0b, as I kept getting this error:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 5756 and this is thread id 24420.
Here is the performance before and after:
Before:
After:
Is this ok with you?
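For readers following along: the error quoted above comes from SQLite's default same-thread check, which rejects a connection used from any thread other than the one that created it. SQLAlchemy's scoped_session avoids this by handing each thread its own session. The sketch below is not the code from this PR; it uses only the standard-library sqlite3 module to show the underlying constraint and the thread-local pattern. The names DB_PATH, get_connection, and the audio table are hypothetical and exist only for illustration.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical database file, created only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.db")

# Thread-local storage: each thread sees its own `conn` attribute.
_local = threading.local()


def get_connection():
    """Return a connection owned by the calling thread, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(DB_PATH)
        _local.conn = conn
    return conn


def shared_connection_fails():
    """Use one connection from a second thread and collect the resulting errors."""
    conn = sqlite3.connect(DB_PATH)  # created in the calling thread
    errors = []

    def worker():
        try:
            conn.execute("SELECT 1")  # used in a different thread
        except sqlite3.ProgrammingError as exc:
            errors.append(str(exc))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    conn.close()
    return errors


def per_thread_connection_works():
    """Let the worker thread open and use its own connection instead."""
    results = []

    def worker():
        conn = get_connection()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS audio (id INTEGER PRIMARY KEY, text TEXT)"
        )
        conn.execute("INSERT INTO audio (text) VALUES (?)", ("hello",))
        conn.commit()
        results.append(conn.execute("SELECT COUNT(*) FROM audio").fetchone()[0])
        conn.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results
```

Calling shared_connection_fails() should surface the same "can only be used in that same thread" message, while per_thread_connection_works() completes because the connection never crosses a thread boundary. A scoped session applies the same idea at the ORM level, so a function running in its own thread (such as record_audio here) would get its own session rather than sharing the one created by the main thread.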
openadapt/record.py
Outdated
    audio_file = crud.insert_audio_file(compressed_audio_bytes)

    # Create AudioInfo entry
    audio_info = crud.insert_audio_info(result_info['text'],
Can you please explain conceptually why there are two tables?
I realize now it doesn't really make sense to store them separately 😅 I merged the two tables in 9469043
…udio_narration # Conflicts: # requirements.txt
@abrichr resolved merge conflicts
Thanks @angelala3252 ! I tried checking this out and running it. I had to add the following dependencies:
Now I get the following error when running
Maybe this is a Mac-specific issue. @0dm @dianzrong can you please try to reproduce?
@abrichr I got the same error from recording and interestingly enough, after the original
I think the issue is the same problem that makes it so that you can't run record when in a call on MacOS... related to #183
@angelala3252 I believe you are right, let's consider this blocked until #183 is resolved. In the meantime can you please clarify why this is in a separate fork under OpenAdaptAI instead of your fork? I would like to remove that fork if possible.
closed in favour of PR #346 to move to my fork |
This pull request responds to issue #164. The idea is to use sounddevice to optionally record audio while recording ActionEvents (depending on the value of enable_audio), stop recording once a KeyboardInterrupt has been caught, use soundfile to convert the array of audio frames to a WAV file, then use [whisper](https://github.com/openai/whisper) to convert this WAV file to text. The next step is to save the text and WAV to the database in a new table with a link to its recording. Further exploration needs to be done to know what exactly we will use this text for.

Instructions to run: