New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

feat(packages): add speaches support for SileroVAD and Whisper STT as OpenAI API #872

Open

OriNachum wants to merge 2 commits into dusty-nv:dev from OriNachum:package/speaches

Collaborator

OriNachum commented Feb 22, 2025 •

edited

Loading

Note: I couldn't get the test to run.
It goes right into server startup.

When I run the test (From another docker) it works great.

Edit: Also, onnx for TTS not supported for now (onnx based)

OriNachum added 2 commits

February 22, 2025 20:15


          feat(packages): add speaches support for VAD and STT server


          chore(packages): added docs to speaches

OriNachum requested review from dusty-nv and ms1design

February 22, 2025 18:32

ms1design requested changes

View reviewed changes

Collaborator

ms1design left a comment •

edited

Loading

Hey there @OriNachum, thanks for your PR! 🥇

I’ve just wrapped up my review, and everything looks well crafted. Found just few places where we could improve and resolve some of the issues that you are facing.

Appreciate your hard work and contribution! 🫶

packages/speech/speaches/Dockerfile

+              #---
+              # name: faster-whisper:speaches
+              # group: audio
+              # depends: [ctranslate2:master-builder,faster-whisper, python:3.12, numpy]

Collaborator

ms1design Feb 24, 2025

Is there any reason to not use ctranslate2 here instead of builder? This impacts the image size and build time. We can easily reduce them.

packages/speech/speaches/Dockerfile

+              # docs: docs.md
+              # notes: run with cuda12.6
+              #---

Collaborator

ms1design Feb 24, 2025

Let's fix the tabs ;)

packages/speech/speaches/Dockerfile

+                      && rm -rf /var/lib/apt/lists/*
+                  # Remove any existing .venv directories
+                  RUN find / -name ".venv" -type d -exec rm -rf {} +  2>/dev/null || true

Collaborator

ms1design Feb 24, 2025 •

edited

Loading

Running find / as root inside a Docker RUN command can be slow. If you want to remove a .venv dir at a specific path, you can use:

RUN [ -d "/app/.venv" ] && rm -rf /app/.venv || true

packages/speech/speaches/Dockerfile

Comment on lines +32 to +43

+                  # Build faster-whisper wheel
+                  RUN git clone https://github.com/guillaumekln/faster-whisper && \
+                      cd faster-whisper && \
+                      sed -i \
+                          -e 's|^onnxruntime.*||g' \
+                          -e 's|^ctranslate2.*||g' \
+                          -e 's|^huggingface_hub.*||g' \
+                          -e 's|^tokenizers.*|tokenizers|g' \
+                          requirements.txt && \
+                      pip3 install --no-cache-dir --verbose -r requirements.txt && \
+                      python3 setup.py bdist_wheel && \
+                      cp dist/faster_whisper*.whl ${WHEEL_DIR}/

Collaborator

ms1design Feb 24, 2025

You already included the faster-whisper in the depends, so it's wheel is already installed. Safe to remove I assume.

packages/speech/speaches/Dockerfile

+                  FROM ${BASE_IMAGE}
+                  ARG SPEACHES_ROOT="/opt/speaches"
+                  ARG WHEEL_DIR="/opt/wheels"

Collaborator

ms1design Feb 24, 2025

@dusty-nv would we like to set JC default wheels dir and set this ENV in python package instead?

packages/speech/speaches/Dockerfile

+                  RUN chmod +x ${SPEACHES_ROOT}/start-speaches-server.sh
+                  # Expose port
+                  EXPOSE 8000

Collaborator

ms1design Feb 24, 2025

NTH: Make the port configurable by ARG

packages/speech/speaches/Dockerfile

Comment on lines +71 to +72

		COPY start-speaches-server.sh ${SPEACHES_ROOT}/start-speaches-server.sh
		RUN chmod +x ${SPEACHES_ROOT}/start-speaches-server.sh

Collaborator

ms1design Feb 24, 2025

I doubt that anybody would build jetson-containers on Windows, so we are safe to assume that you can set (on Linux) start-speaches-server.sh permissions before copying the file into docker image as the Docker will typically preserve the permissions if you're working in a Unix-like environment.

packages/speech/speaches/Dockerfile

Comment on lines +56 to +62

+                  # Print locations of globally installed packages (for debugging)
+                  RUN python3 -c "import sys, numpy, ctranslate2, faster_whisper; \
+                      print(f'Python path: {sys.path}'); \
+                      print(f'NumPy: {numpy.__file__}'); \
+                      print(f'ctranslate2: {ctranslate2.__file__}'); \
+                      print(f'faster_whisper: {faster_whisper.__file__}')" || \
+                      echo "Warning: Not all packages could be imported."

Collaborator

ms1design Feb 24, 2025 •

edited

Loading

This is not needed and can be moved to tests.

packages/speech/speaches/Dockerfile

+              # requires: '>=34.1.0'
+              # test: [test.py]
+              # docs: docs.md
+              # notes: run with cuda12.6

Collaborator

ms1design Feb 24, 2025

If you dare to create config.py for this package we could create a minimum requirement for this package and pin it to cu126

packages/speech/speaches/Dockerfile

		@@ -0,0 +1,77 @@
		#---
		# name: faster-whisper:speaches

Collaborator

ms1design Feb 24, 2025

@dusty-nv I know that the initial package name comes from package root directory. Should we allow to use colon : within the package name in Dockerfile?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet