[Question] does RxPlayer support fragmented Web VTT? #1638

Open
JohnPaulHarold opened this issue Jan 25, 2025 · 4 comments

JohnPaulHarold commented Jan 25, 2025

Hello.

Note: I've filed this as a question in case I'm doing something wrong, but it might also be a bug. I can reword it once that distinction is established.


I am trying this stream, https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel-wvtt.ism/.mpd

It offers several text track choices and, crucially, provides them both in fragmented mode and as "sidecar" files. When I test this stream in the RxPlayer demo (https://developers.canal-plus.com/rx-player/), the player instance shows all 10 text track options (2 x 5 languages). This can also be seen in the browser console by calling rxPlayer.getAvailableTextTracks(). The first 5 text tracks are the fragmented WebVTT options, while the rest are their sidecar equivalents.

RxPlayer seems to play the sidecar variants fine, but does not render the first five.

My question is, does RxPlayer support this fragmented style of captions?


Some digging around:

Via an error event handler, I get this error:

MediaError: BUFFER_APPEND_ERROR: Error: Can't parse WebVTT: Invalid File.

{
    "name": "MediaError",
    "type": "MEDIA_ERROR",
    "_originalMessage": "Error: Can't parse WebVTT: Invalid File.",
    "code": "BUFFER_APPEND_ERROR",
    "fatal": false,
    "tracksInfo": [
        {
            "type": "text",
            "track": {
                "language": "en",
                "normalized": "eng",
                "closedCaption": false,
                "id": "0"
            }
        }
    ]
}

which seems to correspond to this area of the code

if (/^WEBVTT( |\t|\n|\r|$)/.exec(linified[0]) === null) {
  throw new Error("Can't parse WebVTT: Invalid File.");
}

If I put a breakpoint around that area in the bundled code, the text arg is "\u0000\u0000\u0000\bvtte". At a guess, the parser throws as the string does not begin with WEBVTT?
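
To illustrate that guess, here is a minimal sketch of the same header check run against both a plain WebVTT document and the string seen in the debugger. The helper name `startsWithWebVttHeader` and the line-splitting step are assumptions made for this example; only the regular expression comes from the snippet quoted above.

```typescript
// Hypothetical helper wrapping the header check quoted above.
// How RxPlayer actually splits the text into lines is not shown in this
// thread, so the split below is an assumption.
function startsWithWebVttHeader(text: string): boolean {
  const firstLine = text.split(/\r\n|\n|\r/)[0] ?? "";
  return /^WEBVTT( |\t|\n|\r|$)/.test(firstLine);
}

// A plain-text WebVTT document, as served by the "sidecar" tracks.
const plainVtt = "WEBVTT\n\n00:00.000 --> 00:02.000\nHello";

// The string observed at the breakpoint for the fragmented track.
const fragmentedPayload = "\u0000\u0000\u0000\bvtte";

console.log(startsWithWebVttHeader(plainVtt)); // true
console.log(startsWithWebVttHeader(fragmentedPayload)); // false
```

The second call returns `false` because the payload starts with NUL bytes rather than the `WEBVTT` signature, which matches the error being reported.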

@JohnPaulHarold JohnPaulHarold changed the title [Queston] does RxPlayer support fragmented Web VTT? [Question] does RxPlayer support fragmented Web VTT? Jan 26, 2025
@peaBerberian
Collaborator

Hi,

We do support fragmented webvtt.
I tried playing your content and reproduced the same error.

When loading the corresponding text track segments, I see that the mdat box of the first segment (the box holding the media data to parse/decode in an mp4-like file) contains just the hex bytes 00 00 00 08 76 74 74 65, which translated into ASCII/UTF-8 with C-like escape sequences gives \0\0\0\bvtte.
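
One possible reading of those eight bytes, sketched below, is that they form an ISOBMFF-style box header: a 32-bit big-endian size followed by a four-character type. The function `readBoxHeader` and the `BoxHeader` type are hypothetical names for this example and not part of RxPlayer.

```typescript
interface BoxHeader {
  size: number; // total box size in bytes, header included
  type: string; // four-character box type
}

// Reads a 32-bit big-endian size and a 4-character type at `offset`.
// This ignores the 64-bit "largesize" variant for brevity.
function readBoxHeader(bytes: Uint8Array, offset = 0): BoxHeader {
  if (bytes.length - offset < 8) {
    throw new Error("Not enough bytes for a box header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const size = view.getUint32(offset); // DataView reads big-endian by default
  let type = "";
  for (let i = 4; i < 8; i++) {
    type += String.fromCharCode(view.getUint8(offset + i));
  }
  return { size, type };
}

// The mdat payload of the first segment, as quoted above.
const firstSegmentPayload = new Uint8Array([
  0x00, 0x00, 0x00, 0x08, 0x76, 0x74, 0x74, 0x65,
]);

const header = readBoxHeader(firstSegmentPayload);
console.log(header.size, header.type); // 8 vtte
```

Under that reading, the payload is a single 8-byte box of type `vtte` with no content, and the fourth byte is simply the low byte of the box size, so it would be expected to vary from one box to another.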

This doesn't seem to be a valid WebVTT file as far as we understand its specification, especially its file structure: it is supposed to start with a WEBVTT string (optionally preceded by a BOM). Even if we were lenient here and accepted vtt, \0\0\0\b (whose fourth byte, oddly, differs in some other segments) doesn't seem to be a known BOM.

Looking at a subtitle segment actually containing subtitles, the subtitle's content I see when translating into ASCII is: "\x00\x00\x009vttc\x00\x00\x00\tiden1\x00\x00\x00\x1CpaylYou're a jerk, Thom.\x00\x00\x00\fvsidK�G�\x00\x00\x00\bvtte\x00\x00\x00Dvttc\x00\x00\x00\tiden2\x00\x00\x003paylLook Celia, we have to follow our passions;\x00\x00\x00gvttc\x00\x00\x00\tiden3\x00\x00\x00Jpayl...you have your robotics, and I\njust want to be awesome in space.\x00\x00\x00\fvsidx{�" where webvtt should be human-readable, like https://www.w3.org/TR/webvtt1/#introduction-caption.
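
The dump above looks like nested length-prefixed boxes (`vttc` wrapping `iden`, `payl` and `vsid`, with `vtte` in between). The sketch below walks such a payload and pulls out the identifier and text of each `vttc` box. All function and type names are hypothetical, the meaning given to each box type is inferred from the dump rather than from a specification, and the four `vsid` payload bytes are placeholders because the real ones are not recoverable from the text.

```typescript
interface Box {
  type: string;
  payload: Uint8Array;
}

// Splits a buffer into consecutive [size][type][payload] boxes.
function listBoxes(data: Uint8Array): Box[] {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const boxes: Box[] = [];
  let offset = 0;
  while (offset + 8 <= data.length) {
    const size = view.getUint32(offset);
    if (size < 8 || offset + size > data.length) {
      break; // malformed or truncated box: stop instead of guessing
    }
    let type = "";
    for (let i = 4; i < 8; i++) {
      type += String.fromCharCode(view.getUint8(offset + i));
    }
    boxes.push({ type, payload: data.subarray(offset + 8, offset + size) });
    offset += size;
  }
  return boxes;
}

interface Cue {
  id: string | undefined;
  text: string;
}

// Assumption: "vttc" is one cue, "iden" its identifier, "payl" its text.
function extractCues(mdatPayload: Uint8Array): Cue[] {
  const decoder = new TextDecoder("utf-8");
  const cues: Cue[] = [];
  for (const box of listBoxes(mdatPayload)) {
    if (box.type !== "vttc") {
      continue; // e.g. "vtte" carries no text
    }
    let id: string | undefined;
    let text = "";
    for (const child of listBoxes(box.payload)) {
      if (child.type === "iden") {
        id = decoder.decode(child.payload);
      } else if (child.type === "payl") {
        text = decoder.decode(child.payload);
      }
    }
    cues.push({ id, text });
  }
  return cues;
}

// Converts a string of byte-valued characters into bytes.
function bytesFromBinaryString(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i) & 0xff;
  }
  return out;
}

// Start of the dump above; the 4 bytes after "vsid" are placeholders.
const sample = bytesFromBinaryString(
  "\x00\x00\x009vttc\x00\x00\x00\tiden1\x00\x00\x00\x1CpaylYou're a jerk, Thom." +
    "\x00\x00\x00\fvsid\x00\x00\x00\x01\x00\x00\x00\bvtte",
);

console.log(extractCues(sample));
```

On this sample the sketch yields a single cue with identifier `1` and text `You're a jerk, Thom.`. Note that no timestamps appear anywhere in the payload, so cue timing has to come from elsewhere.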

Maybe it's another version of the format we're not familiar with? Maybe the packaging software you're using to make segmented subtitles has an issue?

@peaBerberian
Collaborator

Looking further into it (on other players), it seems to be in an mp4-specialized format that I didn't know of.
I cannot find a spec on this for now but other implementations are simple enough that we can probably implement it by just seeing how others do it.

I'm looking into whether it's easy to do on our side.

peaBerberian added a commit that referenced this issue Jan 28, 2025
As #1638 reported, we do not support some (or all?) ways of
communicating webvtt subtitles in mp4 files.

I initially assumed that webvtt embedded in an mp4 file worked like for
TTML (a format we're more used to): the plain subtitle format directly
inserted in an `mdat` box.

Turns out, some contents (like https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel-wvtt.ism/.mpd),
actually rely on the metadata from other ISOBMFF boxes (mainly the
tfdt and trun boxes) to provide timing information with the `mdat`
reserved for text, identifiers and style information in a new binary
format following the ISOBMFF way of doing things (new webvtt-specific
boxes).

Weirdly enough, that format does not look at all like the WEBVTT we're
used to, besides the fact that it uses the same identifier and "settings"
concepts.

---

As now our subtitle parser has to have the context of the whole mp4 file
(and not of the mp4 segment), and as that parser also has to rely on
state (a `timescale` value) provided by an initialization segment, I had
to update the architecture of how subtitles are communicated: they can
now be communicated as string or `BufferSource` types (the latter
leading the text encoding detection), and a supplementary `timescale`
argument (defaulting to `1`) is now always provided to parsers.

The vast majority of parsers do not make use of that `timescale`
value, which is kind of ugly. We may want to find a better
solution.
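
The timing model described in the commit message above can be sketched as follows: a base decode time (as carried by a `tfdt` box) plus successive sample durations (as carried by a `trun` box), all divided by the `timescale` from the initialization segment. This is only an illustration under simplifying assumptions; `computeSampleTimings` and `SampleTiming` are hypothetical names, the input numbers are made up, and details such as default sample durations or composition time offsets are ignored.

```typescript
interface SampleTiming {
  start: number; // seconds
  end: number; // seconds
}

// baseMediaDecodeTime and sampleDurations are expressed in timescale units.
// timescale defaults to 1, mirroring the default mentioned in the commit.
function computeSampleTimings(
  baseMediaDecodeTime: number,
  sampleDurations: number[],
  timescale = 1,
): SampleTiming[] {
  if (timescale <= 0) {
    throw new Error("timescale must be a positive number");
  }
  const timings: SampleTiming[] = [];
  let elapsed = baseMediaDecodeTime;
  for (const duration of sampleDurations) {
    timings.push({
      start: elapsed / timescale,
      end: (elapsed + duration) / timescale,
    });
    elapsed += duration;
  }
  return timings;
}

// Made-up values: a segment starting at 10 s with three samples,
// using a timescale of 1000 units per second.
const timings = computeSampleTimings(10000, [2000, 500, 1500], 1000);
console.log(timings);
// [ { start: 10, end: 12 }, { start: 12, end: 12.5 }, { start: 12.5, end: 14 } ]
```

Each resulting time range would then be paired with the cue boxes found in the corresponding sample of the `mdat`, which is why the parser needs both the whole segment and the `timescale` from the initialization segment.
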
@peaBerberian
Collaborator

peaBerberian commented Jan 28, 2025

OK I've made an attempt at parsing it.

I deployed the current work as a demo page here: https://developers.canal-plus.com/rx-player/vttc/ so you can check on your side if it works for your case.

The work is still WIP as there are some architectural questions left but that demo should be close to the final result.

@JohnPaulHarold
Author

@peaBerberian thank you for this, and my apologies for the delay in replying. I've tested the same stream on your demo page and the wvtt options now appear to be rendering correctly. I'll also see if I can check out the branch from the accompanying PR and test with some other streams.
