Implementation of batched whisper and updates on Audio pipeline #53

Merged 91 commits on Mar 16, 2024
Commits
c4ab130
added required packages
Jiltseb Jan 26, 2024
280a7c8
added endpoint for transcribe_batch
Jiltseb Jan 26, 2024
a838ca4
added vad deployment details
Jiltseb Jan 26, 2024
8339600
added nodes for batched inference and storage
Jiltseb Jan 26, 2024
d68d84a
vad deployment for segmenting the audio
Jiltseb Jan 26, 2024
34681a7
modified whisper deployment for batched_inference
Jiltseb Jan 26, 2024
ab89fcf
modify output datamodel to accommodate batched inference
Jiltseb Jan 26, 2024
3c89095
output datamodel for vad_deployment
Jiltseb Jan 26, 2024
b3e6fc5
parameters of the vad model and inference
Jiltseb Jan 26, 2024
b642373
added default parameters for batched asr
Jiltseb Jan 26, 2024
1d6087d
added example in the demo
Jiltseb Jan 26, 2024
ce1160f
Added end point for batched transcription
Jiltseb Feb 5, 2024
ac2055c
Added nodes for audio extraction, vad processing, batched transcripti…
Jiltseb Feb 5, 2024
f5d40a4
Modified deployment for faster audio reading
Jiltseb Feb 5, 2024
60c31ac
Modified deployment for faster audio reading and handling empty vad s…
Jiltseb Feb 5, 2024
aa6a33a
Added extract_audio function utility for video to audio conversion
Jiltseb Feb 5, 2024
de6d00a
Added examples and benchmarking scripts on documentary data
Jiltseb Feb 5, 2024
f860166
test script for vad_deployment
Jiltseb Feb 5, 2024
5c697d7
changes to vad deployment
Jiltseb Feb 9, 2024
25a97a7
batched_inference endpoint
Jiltseb Feb 9, 2024
90d41de
pipeline changes for Audio dataclass
Jiltseb Feb 9, 2024
b4fe114
added audio_dir
Jiltseb Feb 9, 2024
1d9a7bf
input type changes, fixes
Jiltseb Feb 9, 2024
5e67eff
input type changes, fixes
Jiltseb Feb 9, 2024
a25c8ce
added AudioReadingException
Jiltseb Feb 9, 2024
c0a641d
added core Audio dataclass
Jiltseb Feb 9, 2024
399a8f2
tests for vad deployment
Jiltseb Feb 9, 2024
2961ab0
updated tests for whisper deployment
Jiltseb Feb 9, 2024
ef503ec
adding test audio file
Jiltseb Feb 9, 2024
e0a8807
added expected vad_output for testing
Jiltseb Feb 9, 2024
0c2d2ac
expected output for vad
Jiltseb Feb 9, 2024
d1cf018
added expected output for batched transcription
Jiltseb Feb 9, 2024
71f8f6f
fixed typo
Jiltseb Feb 9, 2024
8ec8ec3
added extract_audio function for Audio conversion
Jiltseb Feb 9, 2024
53e6c21
notebooks/demo.ipynb
Jiltseb Feb 9, 2024
08753ba
benchmarking experiments and examples
Jiltseb Feb 9, 2024
980e966
merging batched_whisper changes with aana_sdk main
Jiltseb Feb 9, 2024
34de8b5
adding default test environment values
Jiltseb Feb 14, 2024
d1f9f4f
changes to vad and whisper deployments
Jiltseb Feb 14, 2024
4977224
changes to config files
Jiltseb Feb 14, 2024
b3a66d2
added test scripts for vad, whisper deployments
Jiltseb Feb 14, 2024
905414e
test scripts for the audio data class
Jiltseb Feb 14, 2024
c412ce4
modified asr output test to cater for no audio channel
Jiltseb Feb 14, 2024
5c65230
change in extract_audio function
Jiltseb Feb 14, 2024
4a25ef2
added comment for integration tests
Jiltseb Feb 14, 2024
4f0e984
added examples in test demo notebook
Jiltseb Feb 14, 2024
351f7b0
automatic poetry update
Jiltseb Feb 14, 2024
8d3df0a
added test files and expected outputs for deployment and integration …
Jiltseb Feb 14, 2024
11e3c93
added examples in test demo notebook
Jiltseb Feb 14, 2024
7f5e6ff
changes to vad deployment
Jiltseb Feb 22, 2024
43a88b3
changes to whisper deployment
Jiltseb Feb 22, 2024
24e2f0c
resetting default values for bug fix
Jiltseb Feb 22, 2024
d8ddaae
added download function
Jiltseb Feb 27, 2024
0561e4a
changed extract_audio with pyAV
Jiltseb Feb 27, 2024
c0cb6f0
added audio utils for utility functions
Jiltseb Feb 27, 2024
334b9f3
added pyAV
Jiltseb Feb 27, 2024
af1a446
modified params and outputs
Jiltseb Feb 27, 2024
7ad9934
changes in deployment files
Jiltseb Feb 27, 2024
4fc5f5b
changes in configurations
Jiltseb Feb 27, 2024
d9bbe79
added core audio dataclass with pyAV
Jiltseb Feb 27, 2024
d5de85d
generic changes
Jiltseb Feb 27, 2024
a7833cf
changes to tests
Jiltseb Feb 27, 2024
eeddb9e
changes in tests
Jiltseb Feb 27, 2024
4df4367
fixed typo
Jiltseb Mar 1, 2024
3ce3d32
updated audio bytes to array conversion
Jiltseb Mar 1, 2024
a411cdc
updated examples
Jiltseb Mar 1, 2024
715e54e
updated tests and files
Jiltseb Mar 1, 2024
40acdb3
changed expected results for chat with video
Jiltseb Mar 4, 2024
4725a32
adding for workflow tests
Jiltseb Mar 4, 2024
c8fa100
file changes when testing chat_with_video integration test failure
Jiltseb Mar 5, 2024
b11b3b7
updated files for testing
Jiltseb Mar 5, 2024
1bd72ea
new expected files for whisper integration test with audio/video
Jiltseb Mar 6, 2024
1631cae
further file updates for tests
Jiltseb Mar 6, 2024
ecdf091
Updated cache file
movchan74 Mar 6, 2024
9ac4c80
changes after reviews
Jiltseb Mar 11, 2024
61dc28f
changes for audio PR after reviews
Jiltseb Mar 12, 2024
71c5cf3
changes for vad deployment, general util files and adding batched ver…
Jiltseb Mar 13, 2024
eff4085
updating cache files
Jiltseb Mar 13, 2024
04c64df
updated cache files
Jiltseb Mar 13, 2024
8f3916e
vad_deployment test and rerun tests
Jiltseb Mar 13, 2024
f493f22
ruff checks passed
Jiltseb Mar 13, 2024
1342d2f
update Path once
Jiltseb Mar 13, 2024
996eb27
renamed model, added model_dir
Jiltseb Mar 13, 2024
8a6b5d9
changed reading from bytes
Jiltseb Mar 13, 2024
28b4c40
linked issue, removed duplicate
Jiltseb Mar 13, 2024
c3ea371
updated docstrings
Jiltseb Mar 14, 2024
b29c7b7
Merge branch 'main' into js/batched_whisper
movchan74 Mar 15, 2024
b737a14
Updated test files and cache
movchan74 Mar 15, 2024
9e6c8e7
Cosmetic fixes
movchan74 Mar 15, 2024
d648973
Merge branch 'main' into js/batched_whisper
movchan74 Mar 15, 2024
0bb4336
Update content-hash in poetry.lock
movchan74 Mar 15, 2024
1 change: 0 additions & 1 deletion .vscode/settings.json
@@ -5,7 +5,6 @@
"editor.formatOnSave": true,
},
"python.testing.pytestArgs": [
// "--import-mode=importlib",
"aana"
],
"python.testing.unittestEnabled": false,
14 changes: 14 additions & 0 deletions aana/configs/deployments.py
@@ -3,6 +3,7 @@
StableDiffusion2Config,
StableDiffusion2Deployment,
)
from aana.deployments.vad_deployment import VadConfig, VadDeployment
from aana.deployments.vllm_deployment import VLLMConfig, VLLMDeployment
from aana.deployments.whisper_deployment import (
WhisperComputeType,
@@ -59,4 +60,17 @@
dtype=Dtype.FLOAT16,
).model_dump(),
),
"vad_deployment": VadDeployment.options(
num_replicas=1,
max_concurrent_queries=1000,
ray_actor_options={"num_gpus": 0.05},
user_config=VadConfig(
model=(
"https://whisperx.s3.eu-west-2.amazonaws.com/model_weights/segmentation/"
"0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea/pytorch_model.bin"
),
onset=0.5,
sample_rate=16000,
).model_dump(),
),
}
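
The `user_config` entry above builds a config object and serializes it to a plain dict with `.model_dump()`. A minimal sketch of that pattern follows. It uses a stdlib dataclass as a stand-in so it runs without dependencies; the real `VadConfig` is presumably a pydantic model. `VadConfigSketch`, the field comments, and the placeholder weights path are assumptions; only the three field names and the `onset` and `sample_rate` values come from the diff.

```python
from dataclasses import asdict, dataclass


@dataclass
class VadConfigSketch:
    """Hypothetical stand-in for VadConfig; only the field names come from the diff."""

    model: str        # URL or path of the segmentation model weights
    onset: float      # presumably the speech-onset threshold (0.5 in the diff)
    sample_rate: int  # audio sample rate in Hz (16000 in the diff)

    def model_dump(self) -> dict:
        # Same method name as pydantic v2, implemented here with dataclasses.asdict.
        return asdict(self)


# Placeholder path, not the real weights URL from the diff.
user_config = VadConfigSketch(
    model="pytorch_model.bin",
    onset=0.5,
    sample_rate=16000,
).model_dump()
```

Passing a plain dict rather than the object itself keeps the deployment options serializable, which is likely why the diff calls `.model_dump()` on every config.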
36 changes: 32 additions & 4 deletions aana/configs/endpoints.py
@@ -113,6 +113,32 @@
),
],
),
Endpoint(
name="whisper_transcribe_in_chunks",
path="/video/transcribe_in_chunks",
summary="Transcribe a video using Whisper Medium by segmenting it into chunks",
outputs=[
EndpointOutput(
name="transcription",
output="video_transcriptions_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
name="segments",
output="video_transcriptions_segments_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
name="info",
output="video_transcriptions_info_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
name="transcription_id", output="transcription_id_batched"
),
],
streaming=True,
),
Endpoint(
name="delete_media_id",
path="/video/delete",
@@ -130,17 +156,17 @@
outputs=[
EndpointOutput(
name="transcription",
- output="video_transcriptions_whisper_medium",
+ output="video_transcriptions_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
name="segments",
- output="video_transcriptions_segments_whisper_medium",
+ output="video_transcriptions_segments_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
name="info",
- output="video_transcriptions_info_whisper_medium",
+ output="video_transcriptions_info_batched_whisper_medium",
streaming=True,
),
EndpointOutput(
@@ -152,7 +178,9 @@
name="timestamps", output="video_timestamps", streaming=True
),
EndpointOutput(name="caption_ids", output="caption_ids"),
- EndpointOutput(name="transcription_id", output="transcription_id"),
+ EndpointOutput(
+ name="transcription_id", output="transcription_id_batched"
+ ),
],
streaming=True,
),
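
Each `EndpointOutput` above pairs a response key (`name`) with the name of a pipeline output (`output`). A rough sketch of that mapping, using hypothetical stand-in dataclasses (`EndpointSketch`, `EndpointOutputSketch`, and `target_outputs` are not part of the Aana SDK; the field names and values are copied from the diff):

```python
from dataclasses import dataclass, field


@dataclass
class EndpointOutputSketch:
    """Hypothetical stand-in for EndpointOutput."""

    name: str                # key used in the endpoint response
    output: str              # name of the pipeline output that supplies the value
    streaming: bool = False


@dataclass
class EndpointSketch:
    """Hypothetical stand-in for Endpoint."""

    name: str
    path: str
    summary: str
    outputs: list[EndpointOutputSketch] = field(default_factory=list)
    streaming: bool = False

    def target_outputs(self) -> dict[str, str]:
        """Map each response key to the pipeline output that fills it."""
        return {o.name: o.output for o in self.outputs}


endpoint = EndpointSketch(
    name="whisper_transcribe_in_chunks",
    path="/video/transcribe_in_chunks",
    summary="Transcribe a video using Whisper Medium by segmenting it into chunks",
    outputs=[
        EndpointOutputSketch(
            "transcription", "video_transcriptions_batched_whisper_medium", streaming=True
        ),
        EndpointOutputSketch(
            "segments", "video_transcriptions_segments_batched_whisper_medium", streaming=True
        ),
        EndpointOutputSketch(
            "info", "video_transcriptions_info_batched_whisper_medium", streaming=True
        ),
        EndpointOutputSketch("transcription_id", "transcription_id_batched"),
    ],
    streaming=True,
)
```

Under this reading, switching an existing endpoint to the batched path only requires pointing its `output` values at the `*_batched_*` names, which is what the second hunk does.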
179 changes: 173 additions & 6 deletions aana/configs/pipeline.py
@@ -19,6 +19,8 @@
from aana.models.pydantic.prompt import Prompt
from aana.models.pydantic.question import Question
from aana.models.pydantic.sampling_params import SamplingParams
from aana.models.pydantic.vad_output import VadSegments
from aana.models.pydantic.vad_params import VadParams
from aana.models.pydantic.video_input import VideoInput, VideoInputList
from aana.models.pydantic.video_metadata import VideoMetadata
from aana.models.pydantic.video_params import VideoParams
@@ -261,6 +263,28 @@
},
],
},
{
"name": "extract_audios",
"type": "ray_task",
"function": "aana.utils.video.extract_audio",
"batched": True,
"flatten_by": "video_batch.videos.[*]",
"dict_output": False,
"inputs": [
{
"name": "video_objects",
"key": "video",
"path": "video_batch.videos.[*].video",
},
],
"outputs": [
{
"name": "audio_objects",
"key": "output",
"path": "video_batch.videos.[*].audio",
},
],
},
{
"name": "video_params",
"type": "input",
@@ -353,9 +377,9 @@
"method": "transcribe_batch",
"inputs": [
{
- "name": "video_objects",
- "key": "media_batch",
- "path": "video_batch.videos.[*].video",
+ "name": "audio_objects",
+ "key": "audio_batch",
+ "path": "video_batch.videos.[*].audio",
},
{
"name": "whisper_params",
@@ -418,6 +442,26 @@
},
],
},
{
"name": "extract_audio",
"type": "ray_task",
"function": "aana.utils.video.extract_audio",
"dict_output": False,
"inputs": [
{
"name": "video_object",
"key": "video",
"path": "video.video",
},
],
"outputs": [
{
"name": "audio_object",
"key": "output",
"path": "video.audio",
},
],
},
{
"name": "generate_frames_for_video",
"type": "ray_task",
@@ -487,9 +531,9 @@
"method": "transcribe_stream",
"inputs": [
{
- "name": "video_object",
- "key": "media",
- "path": "video.video",
+ "name": "audio_object",
+ "key": "audio",
+ "path": "video.audio",
},
{
"name": "whisper_params",
@@ -519,6 +563,91 @@
},
],
},
{
"name": "vad_params",
"type": "input",
"inputs": [],
"outputs": [
{
"name": "vad_params",
"key": "vad_params",
"path": "video.vad_params",
"data_model": VadParams,
}
],
},
{
"name": "vad_transcribe_in_chunks_audio",
"type": "ray_deployment",
"deployment_name": "vad_deployment",
"method": "asr_preprocess_vad",
"inputs": [
{
"name": "audio_object",
"key": "audio",
"path": "video.audio",
},
{
"name": "vad_params",
"key": "params",
"path": "video.vad_params",
},
],
"outputs": [
{
"name": "video_transcriptions_vad_segments",
"key": "segments",
"path": "video.vad_segments",
"data_model": VadSegments,
},
],
},
{
"name": "whisper_medium_transcribe_in_chunks_video",
"type": "ray_deployment",
"deployment_name": "whisper_deployment_medium",
"data_type": "generator",
"generator_path": "video",
"method": "transcribe_in_chunks",
"inputs": [
{
"name": "audio_object",
"key": "audio",
"path": "video.audio",
},
{
"name": "video_transcriptions_vad_segments",
"key": "segments",
"path": "video.vad_segments",
},
{
"name": "whisper_params",
"key": "params",
"path": "video_batch.whisper_params",
"data_model": WhisperParams,
},
],
"outputs": [
{
"name": "video_transcriptions_segments_batched_whisper_medium",
"key": "segments",
"path": "video.segments_batched",
"data_model": AsrSegments,
},
{
"name": "video_transcriptions_info_batched_whisper_medium",
"key": "transcription_info",
"path": "video.transcription_info_batched",
"data_model": AsrTranscriptionInfo,
},
{
"name": "video_transcriptions_batched_whisper_medium",
"key": "transcription",
"path": "video.transcription_batched",
"data_model": AsrTranscription,
},
],
},
{
"name": "media_id",
"type": "input",
@@ -865,6 +994,44 @@
}
],
},
{
"name": "save_video_transcription_batched",
"type": "function",
"function": "aana.utils.db.save_video_transcription",
"kwargs": {
"model_name": "whisper_medium",
},
"dict_output": True,
"inputs": [
{
"name": "video_media_id",
"key": "media_id",
"path": "video.media_id",
},
{
"name": "video_transcriptions_info_batched_whisper_medium",
"key": "transcription_info",
"path": "video.transcription_info_batched",
},
{
"name": "video_transcriptions_segments_batched_whisper_medium",
"key": "segments",
"path": "video.segments_batched",
},
{
"name": "video_transcriptions_batched_whisper_medium",
"key": "transcription",
"path": "video.transcription_batched",
},
],
"outputs": [
{
"name": "transcription_id_batched",
"key": "transcription_id",
"path": "video.transcription_id_batched",
}
],
},
{
"name": "save_transcripts_batch_medium",
"type": "function",
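
The new nodes are linked by name: a node's input refers to another node's output with the same `name`. The sketch below derives an execution order from those names with a topological sort. It illustrates the data dependencies only and is not how the Aana pipeline resolves nodes; `NODES` is a simplified view of the diff (names only), and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Simplified view of four nodes from the diff: input and output names only.
# Deliberately listed out of order to show that the sort recovers the order.
NODES = [
    {
        "name": "save_video_transcription_batched",
        "inputs": [
            "video_media_id",
            "video_transcriptions_info_batched_whisper_medium",
            "video_transcriptions_segments_batched_whisper_medium",
            "video_transcriptions_batched_whisper_medium",
        ],
        "outputs": ["transcription_id_batched"],
    },
    {
        "name": "whisper_medium_transcribe_in_chunks_video",
        "inputs": ["audio_object", "video_transcriptions_vad_segments", "whisper_params"],
        "outputs": [
            "video_transcriptions_segments_batched_whisper_medium",
            "video_transcriptions_info_batched_whisper_medium",
            "video_transcriptions_batched_whisper_medium",
        ],
    },
    {
        "name": "vad_transcribe_in_chunks_audio",
        "inputs": ["audio_object", "vad_params"],
        "outputs": ["video_transcriptions_vad_segments"],
    },
    {
        "name": "extract_audio",
        "inputs": ["video_object"],
        "outputs": ["audio_object"],
    },
]


def execution_order(nodes: list[dict]) -> list[str]:
    """Order nodes so that every node runs after the nodes producing its inputs."""
    # Which node produces each named output.
    producer = {out: node["name"] for node in nodes for out in node["outputs"]}
    # Node name -> set of node names it depends on; inputs with no producer
    # in this subset (e.g. user-supplied params) are ignored.
    graph = {
        node["name"]: {producer[i] for i in node["inputs"] if i in producer}
        for node in nodes
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(NODES)
```

For these four nodes the order is audio extraction, then VAD, then chunked transcription, then saving the transcription.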
2 changes: 2 additions & 0 deletions aana/configs/settings.py
@@ -19,6 +19,8 @@ class Settings(BaseSettings):
tmp_data_dir: Path = Path("/tmp/aana_data") # noqa: S108
image_dir: Path = tmp_data_dir / "images"
video_dir: Path = tmp_data_dir / "videos"
audio_dir: Path = tmp_data_dir / "audios"
model_dir: Path = tmp_data_dir / "models"
num_workers: int = 2

db_config: DBConfig = {
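
The two new settings derive their paths from `tmp_data_dir` with the `/` operator of `pathlib.Path`. A small sketch of that derivation, assuming the directories are created on demand (`data_dirs` and `ensure_dirs` are hypothetical helpers, not part of the diff):

```python
from pathlib import Path


def data_dirs(base: Path) -> dict[str, Path]:
    """Derive the per-media directories from one base directory, as Settings does."""
    return {
        "image_dir": base / "images",
        "video_dir": base / "videos",
        "audio_dir": base / "audios",  # added in this PR
        "model_dir": base / "models",  # added in this PR
    }


def ensure_dirs(dirs: dict[str, Path]) -> None:
    """Create each directory if it does not exist yet (hypothetical helper)."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)


dirs = data_dirs(Path("/tmp/aana_data"))
```

Calling `ensure_dirs(dirs)` before the first write would avoid missing-directory errors when audio files or model weights are saved.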