Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable media playback in JQuery mode #441

Merged
merged 1 commit into from
Dec 6, 2018

Conversation

Jaifroid
Copy link
Member

The evolution of this code is tracked in #172 . I believe the issues I identified there have been solved.

The code is generalized to work with <video> and <audio> blocks, together with any <source> tags contained within them. Hence I call it "media playback" rather than just "video". I have tested with video and audio provided in different ZIMs, but note that I did not find any ZIM using pure audio blocks -- the scrapers seem to put any audio in video blocks (and it plays fine like that). Details of some pages to test with in #172.

Playback works in browsers without needing JavaScript-in-the-ZIM to be enabled, although sometimes getting to the content is a little challenging without JS (but basically our standard search usually works). I have tested the code with FFOS (works in simulator, actual device will depend on available codecs), Edge, Chromium, Firefox ESR, IE11 (no webm support), and UWP (works in Windows 10, no webm codec in Windows 10 Moible). For the browsers which don't support webm, it seems to degrade gracefully, indicating that there is no support.

One issue with TEDx ZIMs is that they contain a proprietary UI that uses extensive JS, including having links of the type <a href="javascript:doSomeProprietaryFunction();">Go to top</a>. Or else <body onload="raw javascript">. These cause an exception when clicked, or in the worst case, cause an app crash on loading the page, and they won't work even in SW mode in most extensions. But this is a different issue. I just thought it should be signalled with regard to testing this PR.

@Jaifroid Jaifroid self-assigned this Nov 23, 2018
@Jaifroid Jaifroid requested a review from mossroy November 23, 2018 11:18
www/js/app.js Outdated
// It first searches for <img, <script, or <link, then scans forward to find, on a word boundary, either src=["']
// OR href=["'] (ignoring any extra whitespace), and it then tests everything up to the next ["'] against a pattern that
// matches ZIM URLs with namespaces [-I] ("-" = metadata or "I" = image). Finally it removes the relative or absolute path.
// DEV: If you want to support more namespaces, add them to the END of the character set [-I] (not to the beginning)
var regexpTagsWithZimUrl = /(<(?:img|script|link)\s+[^>]*?\b)(?:src|href)(\s*=\s*["']\s*)(?:\.\.\/|\/)+([-I]\/[^"']*)/ig;
var regexpTagsWithZimUrl = /(<(?:img|script|link|video|audio|source)\s+[^>]*?\b)(?:src|href)(\s*=\s*["']\s*)(?:\.\.\/|\/)+([-I]\/[^"']*)/ig;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick explanation here: all video and audio I've encountered have been in the I namespace. The spec doesn't mention video or other media, and no specific namespace seems to exist for it. There are two image namespaces, with the second one being J, but the detailed specs are missing for image handling, so it's not clear when if ever J is used. I have never encountered it, and it seems to be reserved for image text (?).

@mossroy
Copy link
Contributor

mossroy commented Nov 23, 2018

Thanks a lot, it looks great!
I only did a quick test on a Firefox OS device, with dirtybiology ZIM file, and it worked perfectly!
At first sight, the code looks clean too.

I'll try to have a closer look, but I really lack some time those days.

@Jaifroid
Copy link
Member Author

Hmm, I've just finished downloading Dirtybiology, the latest ZIM dated 2018-06, and I'm getting in both Edge and Firefox, "Video format or MIME type is not supported". The type of videos in this ZIM is "video/mp4" rather than webm. Now mp4 is definitely supported in Windows, though I guess the specific encoding could be a problem with my system, though I doubt it. I'll investigate...

@mossroy
Copy link
Contributor

mossroy commented Nov 24, 2018

I was using an old ZIM of dirtybiology (2015-11) : I'll try later with 2018-06

@Jaifroid
Copy link
Member Author

OK, the videos on the sub-pages of that ZIM work, it's the video on the front page that doesn't. Looking at it in detail, it doesn't appear to have a source. It looks like it might be loaded dynamically in JavaScript.

But mp4 videos on other pages in the ZIM seem to work OK on my system.

To summarize, I think this is a "false alarm" based on the fact that the front page of the ZIM probably populates the video block with dynamic info. Maybe a random video, or "video du jour" type of thing...

@Jaifroid
Copy link
Member Author

I completely forgot about subtitles / closed captions!

I've now added them to the PR. I've tested on Edge with the latest dirtybiology ZIM and with ted_en_global_issues_2018-07.zim. Subtitles show fine, and the media block doesn't need to reload the media on insertion of subtitle files blobs.

I wondered if I needed to decode the files to utf8, but because they are being presented to the browser as URLs of uint8 arrays, the browser seems to handle that. It seems to have no problem with Arabic, Greek, etc.

@Jaifroid
Copy link
Member Author

I'm experimenting with not using the DataView method of converting the uint8array to blob, even though it's theoretically faster. The reason is that a) it blocks Internet Explorer needlessly (I'm not sure why, because DataView is supposed to be fully supported from IE10); and b) I'm not actually convinced it offers a more efficient access to the underlying data in our case. In fact, I can't see any of the theoretical advantages from my quick tests -- the blob is loaded just as fast with either direct use of the uint8array or a DataView of its buffer.

IE11 is able to play the videos in the recent TEDx ZIM ted_en_global_issues_2018-07.zim, because they are in video/mp4 format, so it's worth continuing support, I think, if it doesn't offer any disadvantages to the other browsers.

@Jaifroid
Copy link
Member Author

There is a problem when there are too many subtitle tracks for a video (e.g. in TEDx ZIMs). The screen grab below is from latest global issues TEDx ZIM in Firefox, after pressing the captions button in the player.

Only the subtitles from Danish to Russian can be selected, and the others are impossible to click. In Edge, only the top five subtitles are shown, and the others are also impossible to reach. Furthermore, subtitle files are highly compressed and it takes a very long time to decompress all of these (I've monitored them -- they carry on decompressing for at least a minute after loading the page). I've been thinking that all this decompression is a bit useless, when at most the viewer will want one of the subtitle languages.

@mossroy @kelson42 , what would you advise in this situation? I could add a little box beneath the video,, or in a bootstrap modal at the bottom fo the screen, allowing the user to select which subtitle file to decompress, and insert just that one as the top track only when the user asks for it. It would make the requested language accessible and would save a lot of unnecessary decompression. But is this intervening in a problem that should be fixed at ZIM level? Should I make an issue somewhere for this?

image

@Jaifroid Jaifroid force-pushed the Support-video-in-the-ZIM-in-jQuery-mode branch from 6a5a47a to 32f2836 Compare November 30, 2018 11:30
@Jaifroid
Copy link
Member Author

I've rebased this PR on master for ease of testing media-based ZIMs which couldn't be searched without #440 #443.

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 2, 2018

There is what I believe is a specific problem with accessing subtitles in tedxgeneva-2014_fr_all_2015-03.zim. The subtitles are definitely in the ZIM (you can see them if you press the space bar in our search box), but do not seem to be properly referenced, which means that they can't be found, either in this branch in jQuery mode, or in master in SW mode. The problem can be seen in the page's source code below. It looks like the subtitles are referenced as if the page referencing them were inside the subdirectory of the video.

Should we try to work around this, or put it down to a bad encode of this ZIM?

Subsequent TEDx ZIMs I've tested don't have this problem, but I've only tested the global issues ZIM.

<video class="video-js vjs-default-skin" controls preload="auto" width="480" height="270" 
    data-setup='{"autoplay": true, "preload": "true"}'>
    <source src="../../I/Free-software-free-society-Richard-Stallman-at-TEDxGeneva-2014/video.webm" type="video/webm" />
    <track kind="subtitles" src="en.vtt" srclang="en" label="English" />
    <track kind="subtitles" src="es-ES.vtt" srclang="es-ES" label="" />
    <track kind="subtitles" src="hu.vtt" srclang="hu" label="Hungarian" />
</video>

@kelson42
Copy link
Collaborator

kelson42 commented Dec 2, 2018

@Jaifroid @mossroy I would not recommend to workaround the problem. If there is a bug please report it. If it is done in a way that it ask for too much of decompression, please report it too (but to me it sounds really strange).

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 3, 2018

Thanks @kelson42 . The issue with decompression of lots of vtt files is just the usual one: decompressing text-based assets (html, js, css, svg, vtt) is "costly" and slow for our app, whereas extracting assets that are stored in the ZIM with no additional compression (jpeg, png, webm, mp4, etc.) is fast. So, when we load a video page from TEDx that has 30+ subtitle assets, it takes a lot of CPU to decompress them all. You can really feel a mobile device heating up, for example, and even on a high-spec laptop it can cause the fan to kick in. Since the user only needs one of the subtitle files, I was trying to think of ways to prevent unnecessary decompression.

I think the display bug shown in screenshot above is a Firefox issue.

@kelson42
Copy link
Collaborator

kelson42 commented Dec 3, 2018

@Jaifroid Yes, I rephrase my question: why do we have to unpack 30+ vtt files? I don't see any reason why this should be necessary! Only one vtt file decompression should be necessary to get on language. We have a bug somewhere IMO.

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 3, 2018

@kelson42 The issue is that in jQuery mode, we decompress all assets with ZIM URLs on a page after we inject the page into the iframe's DOM. If I can find something like an onTrackSelect event that fires when the user selects a specific subtitle language in the video player, I could exempt subtitles and instead insert the single required file at the moment the user selects it. Unfortunately, despite scouring the spec and trying onchange events on the text tracks, I haven't yet found an event that fires.

I'll keep looking... Otherwise, I'm thinking a custom <select> dropdown immediately beneath the video would be the best solution (jQuery mode only).

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 4, 2018

Some more info on events associated with text tracks here:

https://stackoverflow.com/questions/50778257/texttracklist-onchange-event-not-working-in-ie-and-edge

The suggested workaround doesn't work (I tested) because we don't have a valid subtitle URL (until we extract the file from the ZIM).

EDIT: The track's onerror event fires in Edge when changing the subtitles. However, in Firefox, all 28+ of these events fire on first load of the video (one for each subtitle track), so it looks like we can't use that.

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 4, 2018

So I've pushed a custom text track selection dropdown to this PR for testing. In Edge, IE11 and Chromium, the subtitles will show immediately on selection, even if the video is already playing. On Firefox, currently, it's necessary to press the CC button in the player after selecting the subtitiles. This can probably be automated.

This is what it looks like visually:

image

@mossroy
Copy link
Contributor

mossroy commented Dec 4, 2018

Did you try to listen to the addTrack event of textTracks ?
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video#Detecting_track_addition_and_removal

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 4, 2018

@mossroy I assumed that would only fire when a track is added programmatically to the video block. The tracks are already in the video DOM on page load/render, so they're not being added. However, I'll test to see, in case my assumption was wrong.

@mossroy
Copy link
Contributor

mossroy commented Dec 4, 2018

I would make the same assumption, but am not sure

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 4, 2018

Unfortunately, the addtrack event of the video element's textTracks object, like the onerror event, fires on DOM load in Firefox for each of the subtitle tracks. In both Edge and FF it doesn't fire on attempting to select a different subtitle, at least one with an invalid src (I tested both with a ZIM src and with no src / data-kiwixurl).

image

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 4, 2018

OK, we might be able to work around with a combination of an onchange event on the textTracks object in Firefox and Chromium, and an onerror event for Edge. I had previously tested FF for a change event of the video element itself, but trying again now with the change event of the textTracks object, it fires in both FF and Chromium on attempting to load a subtitle file, even though the src is invalid. In Edge, it doesn't fire (because the src is invalid), but the onerror event fires instead. In IE11, no event fires, though the textTracks object does exist.

Clearly, Edge doesn't attempt to load the asset until the user selects it, and then reports an error instead of a change. Firefox and Chromium both report the error on DOM load, which suggests they might be trying to load the assets up-front. Difference in loading strategy seems to account for the difference in events here.

The difficulty in working out a combination of onerror and onchange is that we'd need some browser detection, to prevent FF and Chromium from reacting to the onerror event, since it fires for each track on DOM load. And IE would be left with no support for subtitles.

@mossroy
Copy link
Contributor

mossroy commented Dec 5, 2018

If I sum up the situation (please correct me if I'm wrong) :

  • You managed to make videos work in jQuery mode (on all browsers, even if some video formats are not supported by IE)
  • You managed to make subtitles work in jQuery mode too (on all browsers), by loading all of them on page load. But it can be slow when there are many, because subtitles are compressed, and our current ZIM backend is not very fast
  • To optimize performance, you are testing 2 different ways :
    • modify the page to add a drop-down for the user to select the subtitle. This way, you can read only the selected subtitle
    • use a combination of onchange and onerror events to read only the selected subtitle

Maybe this could be split into 2 different PRs : a first one which enables videos and subtitles (without optimization), and another one to test some optimizations.
The first one might be merged, and give some time to work on optimizations.

Regarding optimizations, my first reflex would be to work on #116 instead (compile libzim with emscripten). I know it will take more time to implement that, but it should improve performance and features of our ZIM files reading. I feel that our time would be better invested to work on that.

Anyway, if you prefer to work on this optimization (which I would respect), I would prefer the second way (trying to use the events sent by the browser), because it looks more generic, and can not break the layout of the page.
I don't think leaving IE11 behind, on that topic, is a problem. It does not support webm (which is AFAIK the most common video codec used in ZIM files), and is not a priority IMHO.

But I think we must not make some code that depends on the browser brand. Because we can not know how these browsers will evolve in the future. And it's not unusual that this kind of detection fails to do what we want : for example how should Firefox for iOS be detected? (It's a browser made by Mozilla, but that does not use the Gecko engine). See https://css-tricks.com/browser-detection-is-bad/
Anyway, I think we can make an exception for Internet Explorer : we know for sure that this browser will not evolve, so we might detect it in order to use a fallback mode for this browser (like injecting all the subtitles).

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 5, 2018

That's a very clear and correct summary of the situation @mossroy . I would quite like to be able to release a version of Kiwix JS (and Kiwix JS Windows) that can claim to support TEDx ZIMs, so it might be best to revert this PR to the version that extracts all the subtitles, and then (after any further tests) merge it. That will then provide base support on all browsers.

I would also like to continue working on a solution (under a new issue / PR) for preventing extraction of all subtitles, because I think it's very close, and if I stop now, I'll probably forget what has already been tested. Besides, working on #116 just throws more power at the problem, and doesn't prevent the problematic behaviour of unnecessarily extracting 30 subtitle files when the user only needs one (which is actually a "bug"). From what I can tell, in SW mode the video.js plugin is used to manage video playback rather than the native widget, although that has its own problems (I can't seem to select subtitles at all in SW mode in Chromium in the latest TEDx global issues ZIM).

I completely agree that browser detection is bad. I'll think of how we could use capability detection instead. That actually could be quite easy with IE (as it seems to lack certain methods of the textTracks object).

Thank you, this exchange has been very helpful and I can see a way forward.

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 5, 2018

@mossroy This PR is now ready for your review (no rush, when you have time). I've reverted to the version that extracts all subtitle files after video load, and have cleaned up the code a little. It's tested and working in Edge, IE11, FF ESR and Chromium with ted_en_global_issues_2018-07.zim. In FFOS I tested video playback, which is fine, but not subtitle display (because I haven't loaded one of the newer properly subtitled ZIMs onto the fake SD card). I can't see any reason why it wouldn't work on FFOS, so long as the HTML5 widget in FFOS has subtitle / CC support.

@mossroy
Copy link
Contributor

mossroy commented Dec 5, 2018

I did not manage to choose subtitles on my Firefox OS device. But, as you said, it's probably because it might not be supported by the platform.
In fact, subtitles are supported by the underlying gecko engine of Firefox OS (at least in version 2.5) : I can see them on my device on https://www.iandevlin.com/html5test/webvtt/html5-video-webvtt-sample.html. But there does not seem to be any GUI to choose between subtitles (or disable them). In dirtybiology_fr_all_2018-06.zim, it does not display any subtitle (maybe the platform only displays the default subtitle).

But we don't care if subtitles do not work in Firefox OS, it's not worth working on it IMHO. It's already a miracle that videos work on these devices, even with only 512MB of RAM.

On Firefox 63, there is an error message in the console, probably because it does not find the video source when the page loads (and before the media has been injected) :

L’élément n’a pas d’attribut « src ». Le chargement de la ressource média a échoué. article.html
Le chargement de toutes les ressources possibles a échoué. Le chargement du média a été arrêté.

But that's very minor.

Except that, videos seem to work very fine on the platforms I tested (including subtitles) : Firefox 63, Firefox 63 as an extension, Chromium 70.
On dirtybiology_fr_all_2018-06.zim, the home page displays a black video zone with no content, but I suppose this comes from the lack of javascript support?

Copy link
Contributor

@mossroy mossroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good too : nice work!

@Jaifroid
Copy link
Member Author

Jaifroid commented Dec 5, 2018

But there does not seem to be any GUI to choose between subtitles (or disable them). In dirtybiology_fr_all_2018-06.zim, it does not display any subtitle (maybe the platform only displays the default subtitle).

It would be very easy to have an option on the Configuration page: Set default subtitle language: xxxx. Then we could mark one of the languages as default and it might play on clients that don't have an interface for choosing subtitles.

L’élément n’a pas d’attribut « src ». Le chargement de la ressource média a échoué. article.html
Le chargement de toutes les ressources possibles a échoué. Le chargement du média a été arrêté.

I've seen the equivalent message in English. It's because we've changed src to kiwix-dataurl. We do this to prevent 404s, but clearly something checks here that the src is missing. We'd probably get a failure message if we left the src in place also. The only way to avoid it would be to pre-extract the video before rendering the DOM, but I feel that would cause too long a delay.

On dirtybiology_fr_all_2018-06.zim, the home page displays a black video zone with no content, but I suppose this comes from the lack of javascript support?

Yes, these are dynamic videos that are supposed to be filled with some (possibly random, or possibly "vidéo du jour") content. We could, in principle, put a random video in there, but the code would not be generic and would have to be checked on each ZIM update. I guess we'd be better off trying to get #152 [JavaScript in the ZIM] working acceptably.

@mossroy
Copy link
Contributor

mossroy commented Dec 5, 2018

Thanks for these answers. So I would have left this PR as it is : don't do more work to comply with Firefox OS, or to avoid these error logs, or anything specific to some ZIM files.

@Jaifroid Jaifroid force-pushed the Support-video-in-the-ZIM-in-jQuery-mode branch from 1555f15 to 91b290d Compare December 6, 2018 09:04
@Jaifroid Jaifroid merged commit 46c6865 into master Dec 6, 2018
@Jaifroid Jaifroid deleted the Support-video-in-the-ZIM-in-jQuery-mode branch December 6, 2018 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants