Any way to access the raw audio data (PCM or otherwise)? #643

Open
mattfeury opened this issue Jan 10, 2025 · 5 comments

Comments

@mattfeury

hello, this is not an issue per se but i saw similar questions and wanted to ask it: is there any way to gain access to the underlying audio data from a call?

i have tried capturing it with an AudioRecord object, but it does not seem to be delivered through any audio source. I've tried VOICE_CALL, REMOTE_SUBMIX, VOICE_UPLINK, VOICE_DOWNLINK, etc., and none of them returns the audio data. i'd like to pump it over the network.

thank you

@afalls-twilio
Contributor

@mattfeury I'm not sure why you want to do this, the voice SDK does just that, it sends audio data over the network. That being said, the only way to get access to the raw PCM data is to write your own audio device by extending the AudioDevice class.

@mattfeury
Author

> @mattfeury I'm not sure why you want to do this, the voice SDK does just that, it sends audio data over the network. That being said, the only way to get access to the raw PCM data is to write your own audio device by extending the AudioDevice class.

thank you for the response. i'm referring to the playback audio (i.e. the "renderer" data). currently, Twilio plays this back via an AudioTrack, which means it comes out of a local speaker (built-in, headset, etc.). I want to forward the data to an external destination instead. I think a custom AudioDevice will let me do this.

One follow-up question: is AudioDevice.audioDeviceReadRenderData blocking? I notice that the example pumps the data into an AudioTrack, and that the write to the AudioTrack is done in a blocking manner. Since I'm replacing that AudioTrack with an async network call, I want to be sure I'm doing it correctly. It seems that AudioDevice.audioDeviceReadRenderData is not blocking, and since every call after it is async, the speakerRendererRunnable runs extremely frequently, e.g. 5000 loops in just 10 ms, despite my CALLBACK_BUFFER_SIZE_MS being 10 ms. This makes me think I need to put an explicit Thread.sleep in the runnable. Is this correct?
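A minimal sketch of the "async network call" hand-off described above, in plain Java with no Twilio or Android classes. `PcmForwarder`, `submit`, and the `OutputStream` sink are hypothetical names introduced for illustration; the idea is only that the render loop copies each chunk into a bounded queue without blocking, and a separate sender thread drains the queue to the network.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: hands PCM chunks from a render loop to a sender thread. */
final class PcmForwarder implements Runnable {
    private final BlockingQueue<byte[]> queue;
    private final OutputStream sink; // assumption: e.g. a socket's output stream
    private volatile boolean running = true;

    PcmForwarder(OutputStream sink, int maxQueuedChunks) {
        this.sink = sink;
        this.queue = new ArrayBlockingQueue<>(maxQueuedChunks);
    }

    /** Called from the render loop. Never blocks; returns false if the chunk was dropped. */
    boolean submit(byte[] chunk, int length) {
        // Copy, because the caller is expected to reuse its buffer for the next chunk.
        return queue.offer(Arrays.copyOf(chunk, length));
    }

    /** Asks the sender loop to exit once the queue has been drained. */
    void stop() {
        running = false;
    }

    /** Sender loop: run this on its own thread. */
    @Override
    public void run() {
        try {
            while (running || !queue.isEmpty()) {
                byte[] chunk = queue.poll(20, TimeUnit.MILLISECONDS);
                if (chunk != null) {
                    sink.write(chunk);
                }
            }
            sink.flush();
        } catch (IOException e) {
            running = false; // the sink failed; give up on forwarding
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping chunks when the queue is full is a design choice made here to keep latency bounded; a recorder that must not lose audio would need a different policy.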

@afalls-twilio
Contributor

afalls-twilio commented Jan 18, 2025

@mattfeury yes, internally WebRTC operates on 10 ms blocks, and this is by design to reduce audio latency. As for whether it is blocking, I believe it is, but I will have to spend some time digging into WebRTC and get back to you. If you are seeing it return more frequently than every 10 ms, it might not be. From the SDK's perspective, AudioDevice.audioDeviceReadRenderData just calls an internal WebRTC API called RequestPlayoutData.
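Whether the read call blocks is left open in this thread. If it turns out not to block, one way to pace a render loop at 10 ms without the drift of a bare Thread.sleep is to wait against absolute deadlines. The sketch below is plain Java; `RenderPacer`, `chunkBytes`, and `runPaced` are hypothetical names, and the `Runnable` stands in for whatever reads and forwards one block of render data. It also assumes 16-bit PCM when computing the chunk size.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical helper for pacing a pull-based render loop. */
final class RenderPacer {
    /** Bytes in one chunk of 16-bit PCM (assumed format): frames * channels * 2 bytes. */
    static int chunkBytes(int sampleRateHz, int channels, int chunkMs) {
        return sampleRateHz * chunkMs / 1000 * channels * 2;
    }

    /**
     * Runs the task once per period, scheduling against absolute deadlines so that
     * a slow iteration does not push every later one back. Returns elapsed nanoseconds.
     */
    static long runPaced(Runnable task, int iterations, long periodNanos) {
        long start = System.nanoTime();
        long next = start;
        for (int i = 0; i < iterations; i++) {
            task.run();
            next += periodNanos;
            long remaining;
            // parkNanos may wake early, so re-check the deadline in a loop.
            while ((remaining = next - System.nanoTime()) > 0) {
                if (Thread.currentThread().isInterrupted()) {
                    return System.nanoTime() - start; // stop pacing when interrupted
                }
                LockSupport.parkNanos(remaining);
            }
        }
        return System.nanoTime() - start;
    }
}
```

For example, `RenderPacer.chunkBytes(48000, 1, 10)` gives 960 bytes per 10 ms of 48 kHz mono audio, and `RenderPacer.runPaced(task, 100, 10_000_000L)` would run `task` for roughly one second.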

@87620089

How can I combine the microphone audio from AudioRecord and the audio from audioDeviceReadRenderData into one PCM file? The microphone audio is sent through audioDeviceWriteCaptureData.

@afalls-twilio
Contributor

afalls-twilio commented Jan 20, 2025

@87620089 you would have to mix the pcm samples down yourself, maybe something like this?
https://stackoverflow.com/questions/12089662/mixing-16-bit-linear-pcm-streams-and-avoiding-clipping-overflow

Also, you would have to make sure they are in the same format (sample rate, channel count, and bit depth, and therefore the same bitrate) before you mix them.
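As a rough illustration of mixing down by hand, the sketch below adds two 16-bit PCM buffers sample by sample and clamps the sum to the 16-bit range (a saturating add, which is the simplest option and not necessarily the one the linked answer recommends). `PcmMixer` and its methods are hypothetical names; the code assumes both inputs already share the same sample rate and channel layout, and that raw bytes are little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for mixing two 16-bit PCM streams. */
final class PcmMixer {
    /** Mixes two buffers sample by sample; the shorter one is treated as padded with silence. */
    static short[] mix(short[] a, short[] b) {
        int n = Math.max(a.length, b.length);
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            int sa = i < a.length ? a[i] : 0;
            int sb = i < b.length ? b[i] : 0;
            int sum = sa + sb; // widen to int so the addition itself cannot overflow
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }

    /** Decodes raw PCM bytes into samples, assuming little-endian 16-bit data. */
    static short[] fromLittleEndian(byte[] pcm) {
        short[] out = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out);
        return out;
    }
}
```

Hard clamping distorts loud passages; scaling both inputs down before adding them trades that distortion for lower overall volume.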
