Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to explicitly spell out what media formats are supported #453

Open
ara4n opened this issue Mar 25, 2019 · 23 comments · May be fixed by matrix-org/matrix-spec-proposals#4011
Open

Need to explicitly spell out what media formats are supported #453

ara4n opened this issue Mar 25, 2019 · 23 comments · May be fixed by matrix-org/matrix-spec-proposals#4011
Assignees
Labels
A-Client-Server Issues affecting the CS API enhancement A suggestion for a relatively simple improvement to the protocol

Comments

@ara4n
Copy link
Member

ara4n commented Mar 25, 2019

e.g. should client support displaying SVGs? HEIC? 3GP? etc.

Miniproposal: we should piggyback on HTML5, with the exception of SVG which is particularly vulnerable to exploding clients thanks to billion lol attacks.

However, if clients think they can display more exotic images safely, they can - but otherwise, we should fall back to thumbnails, ideally those provided by the sending device (in case servers cannot thumbnail or decline to risk thumbnailing the origin image).

@turt2live turt2live added enhancement A suggestion for a relatively simple improvement to the protocol A-Client-Server Issues affecting the CS API labels Mar 25, 2019
@turt2live
Copy link
Member

also stickers

@richvdh
Copy link
Member

richvdh commented Mar 31, 2021

relatedly, we should spell out what formats the /thumbnail endpoint can return. PNG? GIF? Animated GIF? SVG? .mp4?

@kevincox
Copy link

The thumbnail endpoint should probably use the Accept header to get the formats that the client supports. However it probably makes sense to provide minimum requirment. Maybe just JPEG and GIF?

@erikjohnston
Copy link
Member

relatedly, we should spell out what formats the /thumbnail endpoint can return. PNG? GIF? Animated GIF? SVG? .mp4?

I have a few concerns with returning animated gifs/videos:

  1. If we make this configurable on the HS then we're risking splitting the ecosystem a bit. Having such a visible and noticeable difference when using the same client on different servers might cause people to think that things aren't as seamless as we'd want. Especially if we allow animated GIFs as avatars (which I think discord allows?)
  2. Clients often download lots of thumbnails simultaneously, e.g to fill in the room member list with avatars. Allowing animated gifs/videos means that suddenly these calls to /thumbnails might return large amounts of data, wedging the client connection pool and causing the app to suddenly feel sluggish.

This makes me feel quite hesitant about ideas such as allowing servers to optionally support animated gifs, or allowing clients to ignore animations by only rendering the first frame (as they still would pay the cost of downloading the entire file).

I guess ideally we might allow clients to specify if they want an animated thumbnail, plus having the server ensure that animated thumbnails are still fairly small in size (e.g. only cutting out frames if its too big?).

@kevincox
Copy link

If the client only advertised support for JPEG in the Accept header it can effectively choose to not get animated results. This doesn't help with server load as the assumption is that the server would be supporting at least JPEG and GIF.

Call my a millennial but I think that most people want to see animated gifs animating by default. It seem like the two main options are: 1. Require the client to download the original for animation or 2. Require "video" support in /thumbnails (at least GIF, optionally APNG, WebP or other based on the Accept header).

@ara4n
Copy link
Member Author

ara4n commented Mar 31, 2021

I think we need to give the option of supporting animated GIFs for folks who want them, and to faithfully bridge from Discord and others. But I agree that clients should be able to request non-animated GIFs somehow (for users who don't want to bog down their apps with animations, or only load the animated GIF on hoverover, etc). I don't see this as splitting the ecosystem, as the onus will be on client developers to provide a way for viewing the animated GIF if available (eg. on hoverover), just as we do for animated GIFs in the timeline. We can spell out this expectation in the spec while we're formalising the supported formats.

In terms of whether we advertise that we want animated GIFs via Accept header mime-types or querystring parameters on the media repo... I don't have particularly strong thoughts. We don't use Accept headers much (at all?) currently in the API, but perhaps that's a mistake.

@kevincox
Copy link

I think the main downside for using Accept is that it doesn't actually say anything about still/video. So if you want to prevent video you need to carefully select formats (like JPEG) that don't support video. So you would be locking yourself out of still GIFs, PNG and WebP out of fear that you might get a video back. That being said this is probably an acceptable first step and we could add explicit parameters later if we need this feature.

@ShadowJonathan
Copy link
Contributor

I'd say to play with/use the Accept header, it's standard and many browsers can implicitly state which formats they support, i'm not entirely sure how well-supported they are in ajax-like APIs, but that's research-able.

@erikjohnston
Copy link
Member

I don't see this as splitting the ecosystem, as the onus will be on client developers to provide a way for viewing the animated GIF if available (eg. on hoverover), just as we do for animated GIFs in the timeline. We can spell out this expectation in the spec while we're formalising the supported formats.

To clarify: I think its fine if its client controlled, but probably less fine if its server controlled.

@kevincox
Copy link

@ShadowJonathan accept isn't a forbidden request header so there should be no issue setting it from the browser.

@richvdh
Copy link
Member

richvdh commented Mar 31, 2021

I don't really understand why you are obsessing about Accept when it doesn't really solve the problem of animated or not. (Incidentally, it turns out Synapse supports an undocumented type parameter which allows the client to request a particular media type: it currently supports image/jpeg or image/png.

There is a bit of a problem with either way of allowing clients to specify a format, which is that the default behaviour of Synapse is to generate thumbnails at upload time. If we want to give clients the option to choose between formats, that implies that Synapse would have to generate many more thumbnails.

So, my inclination is to resist giving clients the option of format for thumbnail, and just say that thumbnails must be one of png, jpeg or gif (and that clients must support all of them).

@kevincox
Copy link

kevincox commented Mar 31, 2021

I don't really understand why you are obsessing about Accept

Because it is a standardized way for clients to negotiate.

doesn't really solve the problem of animated or not

It doesn't. But as stated it does serve as a short-term fix as clients can ask for JPEG to get a still image.

If we want to give clients the option to choose between formats, that implies that Synapse would have to generate many more thumbnails.

No, this is why I suggested to pick a short list of supported formats. For example if we picked JPEG + GIF the homeserver can just pre-generate these (depending on how we make the requirements it may only need to generate one for still images). The client is happy to support whatever it wants, it isn't guaranteed to get those.

However this means that it is easy for the server to add more formats if it wants. For example synapse may decide that it finds that webp is a good bandwidth saver so it may start supporting that, now any clients that support it may be served that instead. For example some proxies (like Cloudflare) support on-the-fly transcoding, so it would be possible to add that without any protocol or code changes.

just say that thumbnails must be one of png, jpeg or gif (and that clients must support all of them).

This is basically a less flexible version of what I proposed, and I don't see any real benifits. I will reword what I said.

  • Clients must support at least a standardized list of formats (like png, jpeg and gif).
  • Clients request the thumbnail with an Accept header saying what they support.
  • The server can respond with any of the requested formats, or one of the required formats. (although the required formats should probably be in the Accept header anyways).

The server still only needs to support and pre-generate 1 thumbnail (minimum) but it provides the option for servers to use better formats if there is mutual support.

Optionally, we may also add a requirement on the server such as it must support at least jpeg. As in if jpeg is in Accept it may not return a format not in Accept. This would allow hacking around lack of explicit animation control by requesting jpeg, which must be supported.

Optionally, we may also want to ensure there is at least one format that supports video so that the it is a client decision if animations should be played, so maybe we require animated gif support.

@HarHarLinks
Copy link
Contributor

HarHarLinks commented Dec 14, 2021

Matrix should spec sets of mime types that are expected to be supported by clients and servers for dedicated media events like m.image, m.video, m.voice, m.audio, m.sticker, etc. It is necessary because you need to be able to expect that your conversation partners can understand you (i.e. read certain media formats). Obviously you should still be able to send any file as a file, but there are media types you're expected to be able to view inline the conversation.

The server should take care that media has an appropriate format for the media type (image, video, ...) before committing it to the media repo if possible and necessary. Duplicates of the same media in different formats (beyond thumbnails) are to be avoided as to not store a 10MB mp4 plus 10MB webm plus 10MB av1 etc. At the same time it does not seem unreasonable to allow multiple file formats for each media type, such as png and jpg for images. On the other hand this creates complicated situations where the thumbnailer will generate a jpg thumb (which is usually reasonable and smaller) from a png with alpha, resulting in an inaccurate representation (jpg has no alpha).

Optimally the formats chosen are ubiquitously supported, but other considerations apply, e.g. a reasonably modern animated image format (webp? apng? jpgxl? perhaps a video format like mp4/avc?) may be included over the fairly obsolete gif if that format warrants the effort of making every client include the appropriate codec if there are platforms that do not natively support it.

In these cases the server could offer the modern format by default, but if the client does not Accept that modern format in its request header, the server could transcode the stored "canonical" modern format to the gif "fallback format" on demand.

iOS does not seem to support webm video, so mp4 should be chosen instead. Actually, it might be proper to even spec what actual codecs can be used, in this case mp4 = avc + aac, as avc + mp3 might not play back as well. If I try to upload webm to my home server, either the server will transcode it to mp4 or tell my client to transcode it to mp4 first. This implies there needs to be an explicit option to send media "as a file" instead, such that it isn't transcoded.

A general issue about server side transcoding are DOS attacks by requesting much CPU to transcode over and over again, thus should be limited to clients while delivering only canonical format over federation. Further, clients may be rate limited?

@uhoreg
Copy link
Member

uhoreg commented Dec 14, 2021

Transcoding won't work with encrypted files

@HarHarLinks
Copy link
Contributor

HarHarLinks commented Dec 22, 2021

Transcoding won't work with encrypted files

Completely right, so client-side transcoding it is.


A very real example I want to point out: new-ish iOS versions save pictures in HEIC (HEIF) format. This isn't a consistently supported codec across platforms, and iOS clients should

  • not be allowed to upload heic for use as m.image
  • in a user interaction, when the user is trying to upload as image ("share image" button, not "share file"), it must get transcoded
  • the user may consciously choose to upload as file from the start or - upon UI feedback that it would need to get transcoded - choose to send as file instead

Another point i missed earlier:
It's unpreventable server-side that a non-compliant client could still upload the wrong mime-type (e.g. e2ee). It would be non-compliant and as such unexpected behaviour would be "ok" spec-wise. If that's in the scope of MSC/spec, then it should advise clients to try to display it anyway, and failing that show an appropriate error "the author of this message sent a file we can't display". E.g. element web currently just shows an infinite spinner when you send a custom image/heif m.image.

This also raises the question: should clients try to show m.file as the respective media when it's a known "displayable" mime type?


Something else to consider is that the mime type field can be arbitrarily set by clients and could possibly be different from the actual file. Could this be an attack vector? I think element-web pretty much just puts the media in an <img> tag if the event is an m.image (not quite sure, it used to not with svg or something) and the mimetype is allowed https://github.com/matrix-org/matrix-react-sdk/blob/ce226ab5346b405b4f21a4938f87eb2ef6c748f9/src/utils/DecryptFile.js#L57.

@HarHarLinks
Copy link
Contributor

Something related has been done and specified in element-hq/element-android#3444 (comment)

@ShadowJonathan
Copy link
Contributor

Linking matrix-org/synapse#1278 as related to this issue.

@Mikaela
Copy link

Mikaela commented Aug 17, 2022

If we make this configurable on the HS then we're risking splitting the ecosystem a bit. Having such a visible and noticeable difference when using the same client on different servers might cause people to think that things aren't as seamless as we'd want. Especially if we allow animated GIFs as avatars (which I think discord allows?)

This has already happened a long time ago, some client+server+media-repo combinations result in users seeing animated avatars without an option to stop them leading into accessibility issue while others like Nheko intentionally don't play them at all.

@phirz
Copy link

phirz commented Aug 27, 2023

(Animated) vector graphics -- particularly Lottie as used by Telegram (stickers) and suggested in #9 -- are probably out of discussion for the same reason as SVG?

@ara4n
Copy link
Member Author

ara4n commented Jan 26, 2024

element-hq/element-x-ios#2374 is a good example of this biting people in the wild - iOS matrix clients will silently fail to play back VP9 codec video muxed in MP4s (despite iOS having a VP9 codec for WebRTC, thanks to Apple trying to push H.265 instead).

So, rather than saying "uh, send whatever format a typical browser should be able to play", i really do wonder if we should formally say what formats are supported in detail - obligating the sender to transcode, or send as a m.file (or worst case have the receiver transcode, although that opens up a huge range of attack surface on the receiver if you can send them e2ee media which pwns libavcodec or whatever they use to transcode).

@MTRNord
Copy link
Contributor

MTRNord commented Jan 26, 2024

element-hq/element-x-ios#2374 is a good example of this biting people in the wild - iOS matrix clients will silently fail to play back VP9 codec video muxed in MP4s (despite iOS having a VP9 codec for WebRTC, thanks to Apple trying to push H.265 instead).

So, rather than saying "uh, send whatever format a typical browser should be able to play", i really do wonder if we should formally say what formats are supported in detail - obligating the sender to transcode, or send as a m.file (or worst case have the receiver transcode, although that opens up a huge range of attack surface on the receiver if you can send them e2ee media which pwns libavcodec or whatever they use to transcode).

I wonder if a nicer alternative would be to allow sending multiple Formats. Similar to how html5 handled it these days. That way the client can pick one that it might best be able to display.

@kevincox
Copy link

kevincox commented Jan 26, 2024

I would try to avoid multiple formats because in e2ee rooms that would mean the sender needs to upload multiple formats. When on mobile on a battery and a potentially expensive connection it is best not to have to transcode and upload multiple formats.

The best option is probably making a fairly short list of supported formats and occasionally a new one can be added. But ideally it would be added in advance and clients would avoid actually sending in the new format until it is likely that most receivers have updated their client with support.

@HarHarLinks
Copy link
Contributor

how feasible would it be for matrix as an open standard to define formats based on open standards as required for these media types? there might be locked down platforms that stick to their proprietary and closed and not widely adopted formats (looking at apple), and the fair thing to do seems to not disadvantage platforms that can't support these proprietary formats, but instead force clients on platforms not natively supporting these formats to work around it themselves (such as ios apps shipping their own codecs to read and write them if necessary). yes, the best case is for everyone to play nice, but there will always be the odd ones being disruptive in a bad way, and i don't think an open protocol can continue to call itself that if a hard-ish dependency is not at least open-ish.

@turt2live turt2live self-assigned this Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Client-Server Issues affecting the CS API enhancement A suggestion for a relatively simple improvement to the protocol
Projects
None yet
Development

Successfully merging a pull request may close this issue.