AudioStreamPlayer with an AudioStreamMicrophone stutters for any pitch_scale not equal to 1.0 #99930

Open
goatchurchprime opened this issue Dec 2, 2024 · 5 comments

Comments

@goatchurchprime
Contributor

Tested versions

Reproducible on 4.3.stable and 4.4.dev5

System information

Godot v4.3.stable (77dcf97) - NixOS #1-NixOS SMP PREEMPT_DYNAMIC Thu Sep 12 09:13:13 UTC 2024 - X11 - GLES3 (Compatibility) - Mesa Intel(R) Graphics (ADL GT2) - 12th Gen Intel(R) Core(TM) i5-1240P (16 Threads)

Issue description

I encountered this issue while working on the twovoip plugin, because I thought I could use this inline resampling feature to read the microphone's samples at 48000Hz while the microphone was recording at 44100Hz. (At the moment the twovoip plugin implements its own resampling on the chunks, because the Opus compression library doesn't accept 44100Hz audio.)

Here is a prerecorded example: record.zip. I get the same result on Windows.

This is not an accidental feature: the AudioStreamPlaybackMicrophone class derives from AudioStreamPlaybackResampled, whereas if it derived directly from AudioStreamPlayback it wouldn't have this capability.

I haven't gone far enough into the code to find the bug, which probably involves reading invalid samples out of the ring buffer. I am pretty sure there is no special case for pitch_scale=1.0 that skips the resampling algorithm.

If so, the stuttering that appears on a Windows machine after the AudioStreamMicrophone has been running for more than 10 minutes under normal conditions (pitch_scale=1.0) might be caused by the same bug, through fractional slippage accumulating in the resampler over time.
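To make the fractional-slippage idea concrete, here is a toy producer/consumer model. It is a sketch under assumptions, not Godot's AudioStreamPlaybackResampled code: it assumes the microphone writes a fixed block of frames per mix step, and that the resampler consumes `ratio` source frames for each output frame (roughly pitch_scale times the source/mix rate ratio). The names `simulate_live_reader` and `SimResult` are hypothetical.

```cpp
#include <cassert>

// Toy model only (hypothetical, not Godot source).
// Each mix block, a live source delivers `frames_per_block` source frames at its
// own fixed rate; the reader then produces `frames_per_block` output frames,
// consuming `ratio` source frames per output frame.
struct SimResult {
    long underruns;   // output frames requested when no source frame was available
    double backlog;   // source frames still unread at the end (grows if ratio < 1)
};

SimResult simulate_live_reader(double ratio, int frames_per_block, int blocks) {
    assert(ratio > 0.0 && frames_per_block > 0 && blocks >= 0);
    double available = 0.0;  // buffered, not-yet-consumed source frames
    long underruns = 0;
    for (int b = 0; b < blocks; ++b) {
        available += frames_per_block;  // the device writes one block
        for (int i = 0; i < frames_per_block; ++i) {
            if (available >= ratio) {
                available -= ratio;     // enough data: advance the read position
            } else {
                ++underruns;            // starved: silence or stale samples
            }
        }
    }
    return {underruns, available};
}
```

In this model a ratio of 1.0 keeps pace exactly, 1.1 leaves roughly 9% of output frames with nothing valid to read, and 0.9 builds an ever-growing backlog that a fixed-size ring buffer would eventually overflow. If the real microphone playback behaves like this, some stutter at pitch_scale=1.1 would be expected even without an indexing bug, so it is worth checking against the actual code.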


On a design note, my twovoip plugin puts an AudioEffectOpusChunked class on the Audio Bus fed by the AudioStream carrying the microphone, and runs its own chunking buffer from which you can extract the Opus packets as they are filled. This is versatile because it means I could encode any stream or music from another bus into Opus packets, or apply a voice effect on the microphone before it gets processed.

However, it introduces an extra buffer of delay, as well as potential bugs like this one. So if I want a quicker response without these features (which don't matter for Opus compression, since it is tuned for normal voice audio), should I write a plugin class that derives directly from AudioStreamPlaybackMicrophone instead, so it can copy the samples directly out of the AudioDriver into its own chunking buffers without an intermediate buffer?
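For reference, here is a minimal sketch of the kind of chunking buffer described above, assuming mono float samples and 20ms Opus frames at 48000Hz (960 samples). `ChunkBuffer` is a hypothetical name; this is not the twovoip AudioEffectOpusChunked implementation. It also shows where the extra delay comes from: a sample waits in `pending_` until its chunk is complete, i.e. for up to one chunk duration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical fixed-size chunking buffer (not twovoip code).
// Mono float samples arrive in arbitrary-sized blocks from the audio callback;
// complete chunks of `chunk_frames` samples are queued for the encoder.
class ChunkBuffer {
public:
    explicit ChunkBuffer(std::size_t chunk_frames) : chunk_frames_(chunk_frames) {
        assert(chunk_frames_ > 0);
    }

    void push(const float *samples, std::size_t count) {
        pending_.insert(pending_.end(), samples, samples + count);
        // Move every complete chunk out of the pending area.
        std::size_t offset = 0;
        while (pending_.size() - offset >= chunk_frames_) {
            ready_.emplace_back(pending_.begin() + offset,
                                pending_.begin() + offset + chunk_frames_);
            offset += chunk_frames_;
        }
        pending_.erase(pending_.begin(), pending_.begin() + offset);
    }

    // Returns false when no complete chunk is available yet.
    bool pop(std::vector<float> &out) {
        if (ready_.empty()) return false;
        out = std::move(ready_.front());
        ready_.pop_front();
        return true;
    }

    std::size_t pending_frames() const { return pending_.size(); }

private:
    std::size_t chunk_frames_;
    std::vector<float> pending_;            // partial chunk waiting for more samples
    std::deque<std::vector<float>> ready_;  // complete chunks ready to encode
};
```

Pushing four 512-sample blocks yields two complete 960-sample chunks, with 128 samples left waiting for the next push.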

Steps to reproduce

Use the Audio Mic Record Demo from godot-demo-projects, and set the pitch_scale of the AudioStreamPlayer playing the microphone to 1.1:
https://github.com/godotengine/godot-demo-projects/tree/master/audio/mic_record
Then record your voice and play it back.

Minimal reproduction project (MRP)

See above

@fire
Member

fire commented Dec 4, 2024

Is this using AudioEffectCapture?

@goatchurchprime
Contributor Author

Same problem happens with AudioEffectCapture. But I'm referring to the demo project that uses AudioEffectRecord because it's better to reproduce issues on an official demo project.

@fire
Member

fire commented Dec 4, 2024

See also #99572 for sample rate modification

@goatchurchprime
Contributor Author

Without getting too far off topic: this is a straightforward bug at the moment, and I suspect it could be related to the degradation of the mic input that happens occasionally even when pitch_scale=1.0.

The quickest fix would be to remove the resampling capability on the microphone stream so that nothing can go wrong with it. It's certainly buggy enough that I don't believe anyone is using it. The twovoip plugin has its own internal 44.1kHz -> 48kHz resampler, for example.

The root problem with the microphone is we're treating it like it is just another audio stream when it is a very special case. For a start, any time you direct it into an AudioBus you have to set that bus to mute it to avoid amplifier feedback, which kind of means it's not doing much good going into the audio system in the first place. And because it's coming from a device with its own clock-cycle instead of a file, the system goes out of phase over time.
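The phase drift can be estimated with simple arithmetic. The sketch below assumes a constant clock mismatch expressed in parts per million; the 100 ppm figure used in the note is an assumption, not a measurement, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Frames by which producer (mic clock) and consumer (assumed nominal rate)
// drift apart after `seconds`, given a clock mismatch of `ppm` parts per million.
double drift_frames(double nominal_rate_hz, double ppm, double seconds) {
    return nominal_rate_hz * ppm * 1e-6 * seconds;
}

// Seconds until the accumulated drift equals `slack_frames` (e.g. the spare room
// in a ring buffer), at which point the buffer must either overflow or run dry.
double seconds_until_slip(double nominal_rate_hz, double ppm, double slack_frames) {
    assert(ppm != 0.0);
    return slack_frames / (nominal_rate_hz * std::fabs(ppm) * 1e-6);
}
```

At an assumed 100 ppm on a 44100Hz device, the two sides drift apart by 2646 frames (about 60ms) in 10 minutes, so a ring buffer with 2048 frames of slack would run dry or overflow after roughly 464 s (about 7.7 minutes) unless something resynchronizes them.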

Secondly, its main purpose is to record speech so that it can be transmitted across a network to another player. We know how this works: the audio gets chunked into 20ms chunks and compressed by the Opus library. Resampling could more easily be done on these chunks than in a sophisticated continuous resampling filter.
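As an illustration of resampling on chunks rather than continuously, here is a minimal sketch assuming mono float input in 20ms chunks: 882 frames at 44100Hz become exactly 960 frames at 48000Hz, since 882/960 = 147/160. It uses plain linear interpolation as a stand-in and carries one sample over between chunks so the output stays continuous. `ChunkResampler` is a hypothetical name; this is not twovoip's actual resampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk 44.1 kHz -> 48 kHz resampler using linear interpolation.
// Output frame j sits at position j * 147 / 160 in the extended input
// [prev_, in[0], ..., in[881]], so every chunk maps to a whole number of output
// frames and no fractional read position accumulates between chunks.
class ChunkResampler {
public:
    static constexpr std::size_t kInFrames = 882;   // 20 ms @ 44100 Hz
    static constexpr std::size_t kOutFrames = 960;  // 20 ms @ 48000 Hz

    std::vector<float> process(const std::vector<float> &in) {
        assert(in.size() == kInFrames);
        std::vector<float> out(kOutFrames);
        for (std::size_t j = 0; j < kOutFrames; ++j) {
            std::size_t num = j * 147;       // position * 160, kept exact in integers
            std::size_t i = num / 160;       // index into the extended input
            float frac = static_cast<float>(num % 160) / 160.0f;
            float a = (i == 0) ? prev_ : in[i - 1];  // extended[i]
            float b = in[i];                          // extended[i + 1]
            out[j] = a + (b - a) * frac;
        }
        prev_ = in[kInFrames - 1];  // carried into the next chunk
        return out;
    }

private:
    float prev_ = 0.0f;  // last input sample of the previous chunk
};
```

Linear interpolation is the simplest option and slightly dulls high frequencies; a polyphase or windowed-sinc filter would be the usual upgrade. Per-chunk resampling like this does nothing about device clock drift, which still has to be absorbed elsewhere.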

And finally, there's the new Audio To Expressions system in the Meta libraries that replaces the OVRLipSync library. This matters because it means a second, totally independent system is listening to the microphone outside of the Godot engine and inferring the inputs as if you had face tracking. This tells you that the microphone is all about spoken words, so it is reasonable to tune all of its functionality towards serving this purpose.

@fire
Member

fire commented Dec 4, 2024

Do you know of an independent recreation of the Audio To Expressions system? I'm unsure how we could integrate it other than through the "Unified Expression" system and AudioCapture.
