Segmented VTT - absolute or relative timestamps? #480
Comments
I would not expect the first cue in each segment to start at 0. What packager are you using? |
#481 has more discussion. It seems to be unclear whether VTT timestamps are supposed to be absolute or relative. |
We treat them as absolute because that's how timestamps work in all other kinds of media segments. |
Are you able to share any details on the encoder vendor or packager software you're using? |
I'm working on getting you details and a test stream I can share. |
@joeyparrish I emailed you a test stream. |
Is there any update on this? The stream I emailed @joeyparrish will expire on Sept. 7, so if there is an opportunity to discuss this issue before then, it would be useful. |
Sorry for the delay in my response. I ran into a small, unrelated issue with your manifest that caused us to ignore the WebVTT text:
<AdaptationSet mimeType="text/vtt" ...>
  <Representation codecs="vtt" ... />
</AdaptationSet>
Our parser is registered for
As for the timestamps, we treat everything generically. Since timestamps in video segments are relative to the period, so should text timestamps be. The only exception to this so far has been WebVTT embedded in MP4. The atom containing the cue does not actually contain a timestamp. The format specifies that the cue time should be the segment time. We treat this differently than the others because this is part of the spec.
Now, looking at your text content, I see this:
I researched X-TIMESTAMP-MAP, and it seems to be an HLS extension and not part of the WebVTT spec. This page from BrightCove states:
I think it makes more sense to add support for X-TIMESTAMP-MAP than to add a configuration option to decide if timestamps are relative or not. Thoughts? |
Actually, no, that won't work after all. As I'm looking more closely at the numbers, the maps in the WebVTT files don't match up to the DASH presentation timeline at all. For example, the segment at time 144323839, timescale 1000, contains a map that says it should be offset to 2560783072. Comparing numbers across segments, I see that 1k in the DASH timescale seems to equal 90k in the MPEG2 timescale used by whatever generated the VTT. I'm increasingly convinced now that these timestamps are something the encoder needs to fix. They are taking something meant for HLS and just serving it as DASH, which doesn't make sense. Have you reached out to them for support? |
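The timescale comparison in the comment above can be sketched in a few lines. This is a minimal illustration, not Shaka Player code: it assumes the header takes the HLS form `X-TIMESTAMP-MAP=LOCAL:<cue time>,MPEGTS:<90 kHz ticks>` (fields in either order), and every function name here is hypothetical.

```javascript
// Hypothetical helpers for illustration; not part of Shaka Player's API.
const MPEGTS_TIMESCALE = 90000;  // MPEG-2 TS timestamps tick at 90 kHz.

// Parses "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds.
function parseVttTime(text) {
  const parts = text.split(':').map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Returns {localSeconds, mpegTsSeconds}, or null if no usable header exists.
function parseTimestampMap(vttText) {
  const line = vttText.split(/\r?\n/).find(
      (l) => l.startsWith('X-TIMESTAMP-MAP='));
  if (!line) return null;
  const local = /LOCAL:([\d:.]+)/.exec(line);
  const mpegTs = /MPEGTS:(\d+)/.exec(line);
  if (!local || !mpegTs) return null;
  return {
    localSeconds: parseVttTime(local[1]),
    mpegTsSeconds: Number(mpegTs[1]) / MPEGTS_TIMESCALE,
  };
}

// Converts a DASH segment time to seconds using the manifest timescale.
function dashTimeToSeconds(time, timescale) {
  return time / timescale;
}
```

Converting both the map value and the DASH segment time to seconds makes it easy to check whether the two timelines line up for a given segment.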
Related to issue #480 Change-Id: I0ef6d479e496ba45e6c4f984e8f7dc5e218c5175
We've talked to the stream provider and received the following response:
|
I note that DASH-IF IOP v3.3, section 6.4.5 forbids the use of plaintext TTML/WebVTT text in multiple segments, as quoted below.
Therefore the use of such content is dubious at best. You should be using ISOBMFF encapsulation for text streams. |
Thanks, Sander. It appears that we only have one demo asset that violates this part of IOP v3.3, and it's one we created for testing: http://storage.googleapis.com/shaka-demo-assets/tos-pto-webvtt/dash.mpd
shakaAssets.testAssets.filter(function(a) {
  return a.features.includes(shakaAssets.Feature.SEGMENTED_TEXT) &&
      !a.features.includes(shakaAssets.Feature.EMBEDDED_TEXT);
});
Since that's our own home-made test asset, we can change it to align with whatever we determine is the best practice for this non-IOP-compliant situation. Does anyone have examples of public test content that features segmented text not embedded in ISO-BMFF? I'd like to do a survey of what's out there before we make any changes. If you can provide a test stream, please do. If you can't, please just state whether your segments' cue times are relative to the period or to the segment. Also, please state what encoder/packager you use. |
@baconz, I see you on the ExoPlayer thread. Can you weigh in on this? |
We built our VTT packager to conform with Shaka's period-relative timestamps. We can change them since it seems like everybody is switching to segment-relative timestamps. |
Okay. There were literally zero responses to my attempted survey on the mailing list. We will change to segment-relative timestamps in v2.1.0. PRs are welcome, or we'll get to it ourselves, eventually. |
@joeyparrish Any chance of adding a legacy flag to soften the transition for us? I can try to put up the PR, but probably won't get to it this week. |
Sure, that could work. If we introduce a setting for this, we could even put it into v2.0.x (default to current behavior, warn about impending deprecation when used). Then in v2.1.0 we could just remove the setting. |
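The transition described above could look roughly like the following. This is only a sketch under stated assumptions: `placeCues` and its parameters are hypothetical names, the cue objects are plain `{startTime, endTime, payload}` records in seconds, and none of this is taken from Shaka Player's actual implementation.

```javascript
// Hypothetical sketch; function and parameter names are illustrative only.
let warnedAboutLegacyTimestamps = false;

// cues: array of {startTime, endTime, payload}, times in seconds as parsed.
// segmentStart, periodStart: seconds on the presentation timeline.
// useSegmentRelative: false keeps the legacy period-relative behavior.
function placeCues(cues, segmentStart, periodStart, useSegmentRelative) {
  if (!useSegmentRelative && !warnedAboutLegacyTimestamps) {
    // Warn once so existing users see the impending change.
    console.warn('Period-relative WebVTT timestamps are deprecated.');
    warnedAboutLegacyTimestamps = true;
  }
  const offset = useSegmentRelative ? segmentStart : periodStart;
  return cues.map((cue) => ({
    ...cue,
    startTime: cue.startTime + offset,
    endTime: cue.endTime + offset,
  }));
}
```

With the flag off, a cue written as 00:00:00.000 lands at the start of the period; with it on, the same cue lands at the start of its segment, which is the behavior the original report asked for.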
This looks good. I will try to get a PR in for your review. |
* Add config option for using segment relative timestamps for VTT. Fix for #480
* Make useRelativeCueTimestamps a non-nullable param
* Update tests for the new useRelativeCueTimestamps param
* Move period relative timestamp deprecation warning to vtt parser
* Log warning only if using absolute timestamps in text cue
* Fix vtt text parser test
Related to this issue, if we have the following in the mpd:
what would be the expected result? Video and audio times are calculated correctly but text cues are all generated starting at -1796832000000000. I'm looking at the DASH spec but I'm still unsure how the startTimes are supposed to be calculated. |
Specification-wise, media samples that exist in ISOBMFF containers exist on a separate timeline from the period, with presentationTimeOffset being the alignment factor. In other words, the period 00:00:00 is mapped to 179683200 seconds in the media sample timeline. So to display a piece of text at the start of the period, it would need to have the timestamp of 179683200 seconds, which is 49912:00:00. However, media samples that exist in plain text (sidecar) files are assumed to have a timeline aligned with the period (see DASH-IF IOP 6.4.5)! |
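The arithmetic in the comment above can be checked with a short sketch. It assumes the offset is expressed directly in seconds (a timescale of 1), which is a simplification for illustration; the function names are hypothetical.

```javascript
// Maps a media-timeline timestamp to period time:
//   periodTime = mediaTime - presentationTimeOffset / timescale
function mediaToPeriodSeconds(mediaSeconds, presentationTimeOffset, timescale) {
  return mediaSeconds - presentationTimeOffset / timescale;
}

// Formats whole seconds as H:MM:SS without wrapping hours at 24.
function formatHms(totalSeconds) {
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = Math.floor(totalSeconds % 60);
  const pad = (n) => String(n).padStart(2, '0');
  return `${hours}:${pad(minutes)}:${pad(seconds)}`;
}

// Offset from the comment above, in seconds; a timescale of 1 is assumed.
const pto = 179683200;
console.log(formatHms(pto));                     // 49912:00:00
console.log(mediaToPeriodSeconds(pto, pto, 1));  // 0
```

A sample stamped 49912:00:00 on the media timeline therefore maps to 0 in the period, while a sidecar cue stamped 00:00:00.000 would map to a large negative time if the same offset were applied to it.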
@sandersaares Thank you for pointing me in the right direction, I had not read that. In that section (DASH-IF IOP 6.4.5) it says:
According to this it looks like Shaka should be ignoring the @joeyparrish does this look correct to you? I can open a separate issue for this and likely provide a PR. |
The text parsers were all stateless. This caused problems with MP4 VTT as the timescale is needed later on for other boxes. This changes parsers to carry state. How time is referenced with the text parsers is not clear and has caused confusion. In v2.0.1, we introduced the useRelativeCueTimestamps option to control the behavior of our WebVTT parser. We decided in #480 (comment) that we would remove this option in v2.1.0. All WebVTT timestamps in v2.1.0 will be relative to the segment time. This change creates a new time context interface that will be used to help limit the confusion around how time is communicated. Closes #726 Change-Id: I67409608c35d2d5abb8b8b25529859cb37f8f0a8
WebVTT segments are not syncing properly with live streams because they are not accounting for the segmentStartTime. The first cue in each segment starts at 00:00:00.000, so the segmentStartTime needs to be used to offset each segment properly.