Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MP3 frame synchronization is not adequate in some pathological cases #96

Closed
torokati44 opened this issue Jan 20, 2022 · 8 comments
Closed
Milestone

Comments

@torokati44
Copy link

torokati44 commented Jan 20, 2022

symphonia-play detects this corrupted mp3 file as stereo 22050 Hz, so it plays it back at ~4x speed.
It's actually a mono 11025 Hz one, and plays correctly in VLC for example.
This data was extracted from https://z0r.de/L/z0r-de_6315.swf.

There seems to be an unusually high amount of 0xFF bytes before the first actual frame, which most likely throws off sync_frame(), even though check_header() does a decent amount of sanity checking.
Perhaps it should also be extended with parsing the header at any potential frame sync point, then checking whether right after the potential frame there's either the end of the data stream, or another sync sequence (frame header). This would necessitate adding a seekback buffer of (in some cases) as long as ~2.9 KiB, which should not be an issue with local file sources, but for network streams, it might add a noticeable amount of latency. It might also prevent syncing to perfectly valid frames that happen to not be directly followed by another frame... So maybe it should be a decoder option?
There might be other ways to fix the synchronization of this file, but this is what's described here: https://hydrogenaud.io/index.php?topic=85125.msg747716#msg747716
If the frame (supposedly) has CRC, that could also be used instead to rule out red herring frames.

The "correct" way might be to change the sample rate and channel setup at each decoded frame as necessary, but I'm not sure if the playback machinery (of either symphonia-play, or of any embedding application) is set up for that.

For reference: ruffle-rs/ruffle#4273, ruffle-rs/ruffle#5231
(EDIT: The reason I'm reporting is that at the moment, https://z0r.de/6315 only has sound in Chrome, but not in Firefox, as Ruffle currently uses decodeAudioData to decode MP3 sounds, and the mandated-by-spec format sniffing in FF can't figure out that it is in fact supposed to be MP3 content, most likely due to the amount of garbage before the first frame. If only WebAudio/web-audio-api#7 ended up being resolved...)

@torokati44
Copy link
Author

I also have about a dozen more pieces of similarly slightly garbled MP3 content (also from z0r loops), most of which seem like they are missing just the very first two bytes, wasting the entire first frame, but leaving it there as garbage that the frame sync mechanism has to skip through correctly. I'll gladly attach some more of them upon request. This one looked the most problematic though.

@pdeljanov
Copy link
Owner

Hi @torokati44,

Thanks for a very detailed bug report. I think you're on the right track.

 INFO  symphonia_core::probe > found a possible format marker within [ff, f2, 10, 84, 39, ce, 70, 30, 33, 9c, 58, 0, 0, 0, 82, 11] @ 0+35 bytes.
 INFO  symphonia_core::probe > found the format marker [ff, f2] @ 0+35 bytes.

It's not the excessive 0xFFs that are the problem, but a 0xFFF2 byte sequence that is a valid value for a MPEG header. The Mp3Reader believes that to be the first packet and passes it along to the decoder.

[symphonia-bundle-mp3/src/demuxer.rs:138] &header = FrameHeader {
    version: Mpeg2,
    layer: Layer3,
    bitrate: 8000,
    sample_rate: 22050,
    sample_rate_idx: 3,
    channel_mode: DualMono,
    emphasis: None,
    is_copyrighted: false,
    is_original: true,
    has_padding: false,
    has_crc: true,
    frame_size: 22,
}
 WARN  symphonia_play                > decode error: mp3: granule big_values > 288

The decoder promptly errors out, but it sets the sample rate based on that first packet. The Mp3Reader re-synchronizes and the remainder of the packets are correct.

[symphonia-bundle-mp3/src/demuxer.rs:138] &header = FrameHeader {
    version: Mpeg2p5,
    layer: Layer3,
    bitrate: 24000,
    sample_rate: 11025,
    sample_rate_idx: 6,
    channel_mode: Mono,
    emphasis: None,
    is_copyrighted: false,
    is_original: true,
    has_padding: true,
    has_crc: false,
    frame_size: 153,
}

However, by that point the decoder is already fixed to the original sample rate.

In this case the erroneous header set the CRC-enabled bit to 1, and I suppose if we did check the CRC it would catch this specific case, but it's not perfect. Most of the mainstream MP3 decoders don't check CRCs by default since they can be wrong and reject otherwise working streams.

Your solution makes a lot of sense to me. In the normal case it'll only add an extra 2 bytes to read (because the packet is good and we'll use the packet body). In a broken case, yes, we will have read an excessive amount of bytes, but it's not as bad as it sounds. Very few MPEG frames are that big, and the minimum MediaSourceStream reads from the underlying source is 1kB. In practice, I don't think we'll be adding much latency. MediaSourceStream can also rewind up-to 32kB so I think this should be straightforward.

I'll give it a try and report back.

If you can provide a few more cases in the meantime that'd be great. Thanks!

@torokati44
Copy link
Author

torokati44 commented Jan 22, 2022

Of the dozen or so of other loops that Firefox has issues with[1], actually only this one turned out to be misprobed (as AAC) by Symphonia as well:
z0r-7095.mp3_.zip

To be fair, our application will most likely be able to tell the decoder that it is supposed to be MP3.
I found no option to convey this to symphonia-play from the command line though, so I couldn't test whether the decoding would succeed in that case.

[1]: And had the sound embedded in a single chunk - not split into several stream packets in the SWF - so it was easy to export. Though I guess FFmpeg would demux the rest for me as well...

@pdeljanov
Copy link
Owner

Is the AAC misdetection a problem for Ruffle, or just the first issue with the sample rate?

I have a working solution for the sample rate issue that I will push after testing more.

The AAC issue would require me to implement scoring since there's a sync word for an ADTS header showing up in the second file before the MP3 sync word. Currently, Probe only tries to instantiate a FormatReader with the first thing it syncs to. With scoring, there would be an extra step where Probe asks the FormatReader how confident it is in being able to read the stream.

@pdeljanov
Copy link
Owner

I created a new branch mp3-strict-sync with the WIP code if you'd like to try.

@torokati44
Copy link
Author

torokati44 commented Jan 25, 2022

Thanks a lot! I haven't been able to test it yet, but just by looking at the code in the WIP branch: Shouldn't the seekback position (if the sync turns out to be unsuccessful) be right after the first byte of the potential sync word under consideration?
I think as it is now it skips the entire (invalid) frame it tried to decode, even if it happened to contain the next (valid) sync sequence that it is supposed to find?

EDIT: I took a second look, and now I see that this is done only if a frame is read successfully, and verified by checking the following header as well.

@torokati44
Copy link
Author

Is the AAC misdetection a problem for Ruffle, or just the first issue with the sample rate?

Ruffle uses Mp3Reader directly, so there is no format detection involved.
However, loop 7095 is also played back way too fast - I assume, for a similar reason as 6315.

Your WIP branch did fix the sample rate detection for both of these loops though!

@pdeljanov
Copy link
Owner

pdeljanov commented Jan 26, 2022

Sounds good to me!

I was thinking earlier that maybe this check should go even further beyond just checking for a sync word. Perhaps it should also check to make sure the MPEG version, layer, and sample rate match between the two headers. These should be constant within a stream, though, I guess there's no real reason why they couldn't vary.

EDIT: I pushed the original version since I'm not confident rejecting the streams mentioned above is a good idea.

This fix will be available in v0.5 which I'll publish in about a week or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants