avcodec: add remove_dovi and remove_hdr10plus option to hevc,av1_metadata bsf #480

Merged on Oct 16, 2024 (6 commits)

gnattu
Member

@gnattu gnattu commented Oct 14, 2024

This adds two options, remove_dovi and remove_hdr10plus, to the hevc_metadata bitstream filter. These options are useful in cases where certain types of dynamic metadata trigger player bugs, such as on some smart TVs and set-top boxes. With these options, clients can have the video remuxed with the selected dynamic metadata removed.

For example:

./ffmpeg -i 'hdr10+.mkv' -bsf:v hevc_metadata=remove_hdr10plus=1 -c:v copy -c:a copy -tag:v hvc1 hdr10.mkv
./ffmpeg -i 'dovi.mp4' -bsf:v hevc_metadata=remove_dovi=1 -c:v copy -c:a copy -tag:v hvc1 hdr10.mp4

One thing to note is that the current ffmpeg Matroska implementation has a bug: it always attempts to carry over the Dolby Vision metadata when the input has it, and if this bsf has removed that metadata, it complains and exits. Fixing the mkv handling is out of scope for this PR. The workaround is to use mp4 as the target container when stripping dovi metadata.
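For callers that assemble these command lines programmatically, a minimal Python sketch of the two examples and the container workaround could look like the following. The helper name `build_strip_command` and the extension-based Matroska check are assumptions made for illustration, not part of this PR. The `remove_dovi` and `remove_hdr10plus` options exist only in builds that carry this patch, and combining both in one filter instance uses the usual `opt=val:opt=val` bsf syntax, which is assumed (not verified here) to work for this pair.

```python
from pathlib import Path
from typing import List


def build_strip_command(src: str, dst: str, *, remove_dovi: bool = False,
                        remove_hdr10plus: bool = False,
                        ffmpeg: str = "./ffmpeg") -> List[str]:
    """Build an argv list that remuxes `src` to `dst` without re-encoding."""
    opts = []
    if remove_dovi:
        opts.append("remove_dovi=1")
    if remove_hdr10plus:
        opts.append("remove_hdr10plus=1")
    if not opts:
        raise ValueError("nothing to remove")
    # Assumption: the output container is judged by file extension only.
    # Matroska output is rejected when stripping dovi, per the note above.
    if remove_dovi and Path(dst).suffix.lower() == ".mkv":
        raise ValueError("use mp4 as the target container when stripping dovi")
    return [ffmpeg, "-i", src,
            "-bsf:v", "hevc_metadata=" + ":".join(opts),
            "-c:v", "copy", "-c:a", "copy",
            "-tag:v", "hvc1", dst]
```

The returned list can be passed to `subprocess.run` as is; using an argv list instead of a shell string avoids quoting problems with file names such as `hdr10+.mkv`.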

Upstream has overhauled the dovi rpu related code, so this patch will no longer be useful once we upgrade to that version. A back-port might be too heavy, as that is a big refactor.

@gnattu gnattu requested a review from a team October 14, 2024 23:49
@nyanmisaka
Member

Are these changes still relevant when we bump to FFmpeg 7.1+? The newly added dovi_rpu_bsf handles not only HEVC but also AV1.

Regarding dropping HDR10+ from HEVC, does using filter_units=remove_types=39 /*HEVC_NAL_SEI_PREFIX*/ already meet our needs?

If we are not in a rush to use this in the upcoming JF 10.10 we can wait for FFmpeg 7.1.x. Otherwise we will have to continue adding bsf option detection code to take advantage of these two custom options.

@gnattu
Member Author

gnattu commented Oct 15, 2024

> The newly added dovi_rpu_bsf handles not only HEVC but also AV1.

That one needs slightly more tweaking because it only removes the RPU and keeps the EL, which may still cause compatibility issues. If we are going to upgrade to that version soon, we can modify that one instead.

> Regarding dropping HDR10+ from HEVC, does using filter_units=remove_types=39 /*HEVC_NAL_SEI_PREFIX*/ already meet our needs?

This could be too aggressive, as HEVC_NAL_SEI_PREFIX is not used exclusively for HDR10+ metadata. My current implementation is already minimal: it checks that the NAL really contains HDR10+ metadata before removing it. The most correct implementation would check whether the NAL contains anything else and copy that into a new NAL, which is more complicated. Use cases that require the "most correct implementation" are rare, but SEI used for things other than HDR10+ is more common.
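To illustrate why the NAL type alone is not enough, here is a rough Python sketch of the check described above. It is not the code from this PR. It assumes the input is the SEI RBSP with the two-byte NAL header and the emulation prevention bytes already removed, and it does no bounds checking on truncated input. The function names are hypothetical. The identifying bytes follow the HDR10+ (SMPTE ST 2094-40) signature carried in `user_data_registered_itu_t_t35`: country code 0xB5, provider code 0x003C, provider oriented code 0x0001, application identifier 4.

```python
from typing import Iterator, Tuple

SEI_USER_DATA_REGISTERED_ITU_T_T35 = 4


def iter_sei_messages(rbsp: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (payload_type, payload) for each SEI message in one SEI RBSP."""
    pos = 0
    # Stop once only the rbsp_trailing_bits byte (0x80) is left.
    while pos < len(rbsp) and rbsp[pos:] != b"\x80":
        payload_type = 0
        while rbsp[pos] == 0xFF:      # each 0xFF byte adds 255 to the value
            payload_type += 255
            pos += 1
        payload_type += rbsp[pos]
        pos += 1
        payload_size = 0
        while rbsp[pos] == 0xFF:
            payload_size += 255
            pos += 1
        payload_size += rbsp[pos]
        pos += 1
        yield payload_type, rbsp[pos:pos + payload_size]
        pos += payload_size


def is_hdr10plus(payload_type: int, payload: bytes) -> bool:
    """True if the message carries the HDR10+ T.35 signature."""
    return (payload_type == SEI_USER_DATA_REGISTERED_ITU_T_T35
            and len(payload) >= 6
            and payload[0] == 0xB5               # itu_t_t35_country_code
            and payload[1:3] == b"\x00\x3c"      # provider code
            and payload[3:5] == b"\x00\x01"      # provider oriented code
            and payload[5] == 0x04)              # application identifier
```

A prefix SEI NAL that yields both an HDR10+ message and, say, a content light level message (payload type 144) is exactly the case where dropping the whole NAL by type would lose unrelated metadata.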

@gnattu
Member Author

gnattu commented Oct 15, 2024

> Otherwise we will have to continue adding bsf option detection code to take advantage of these two custom options.

We have to do that anyway because the upstream filter does not fit our needs. Even if we upstreamed our changes, we would still need the checks, because a release containing our bsf code is unlikely to land before 10.11.

The more problematic part of using those is how we handle range, on both the probe side and the playback side. HDR10+ metadata is unavailable until you actually decode a frame to get the side data, but ffprobe has no clean way to stop at the desired location. Our current data structure also cannot handle Dolby Vision and HDR10+ metadata coexisting in the same file. The clients cannot accurately report their actual capabilities either: for example, whether a client can correctly handle Dolby Vision profile 7 fallback, whether it allows out-of-spec profile 8.6 playback, whether it can handle dovi and hdr10+ coexisting, and so on. So we have a lot more work to do before we can use these bsf filters.

@gnattu
Member Author

gnattu commented Oct 15, 2024

https://ffmpeg.org//pipermail/ffmpeg-devel/2024-October/334998.html

I submitted a patch upstream that also removes the EL without changing the CLI interface. Hopefully it gets accepted so that we have one less bsf to maintain.

@nyanmisaka
Member

Found a method to extract the HDR10+ metadata info from the very first packet/frame.

ffprobe -hide_banner -v quiet -strict -2 -print_format json -i "hdr10plus.mkv" -select_streams v -show_frames -show_entries frame=side_data_list -read_intervals 0%+#1
Result
{
    "frames": [
        {
            "side_data_list": [
                {
                    "side_data_type": "Mastering display metadata",
                    "red_x": "34000/50000",
                    "red_y": "16000/50000",
                    "green_x": "13250/50000",
                    "green_y": "34500/50000",
                    "blue_x": "7500/50000",
                    "blue_y": "3000/50000",
                    "white_point_x": "15635/50000",
                    "white_point_y": "16450/50000",
                    "min_luminance": "1/10000",
                    "max_luminance": "10000000/10000"
                },
                {
                    "side_data_type": "Content light level metadata",
                    "max_content": 1000,
                    "max_average": 168
                },
                {
                    "side_data_type": "HDR Dynamic Metadata SMPTE2094-40 (HDR10+)",
                    "application version": 1,
                    "num_windows": 1,
                    "targeted_system_display_maximum_luminance": "400/1",
                    "maxscl": "0/100000",
                    "maxscl": "0/100000",
                    "maxscl": "0/100000",
                    "average_maxrgb": "0/100000",
                    "num_distribution_maxrgb_percentiles": 9,
                    "distribution_maxrgb_percentage": 1,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 5,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 10,
                    "distribution_maxrgb_percentile": "100/100000",
                    "distribution_maxrgb_percentage": 25,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 50,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 75,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 90,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 95,
                    "distribution_maxrgb_percentile": "0/100000",
                    "distribution_maxrgb_percentage": 99,
                    "distribution_maxrgb_percentile": "0/100000",
                    "fraction_bright_pixels": "0/1000",
                    "knee_point_x": "0/4095",
                    "knee_point_y": "0/4095",
                    "num_bezier_curve_anchors": 9,
                    "bezier_curve_anchors": "102/1023",
                    "bezier_curve_anchors": "205/1023",
                    "bezier_curve_anchors": "307/1023",
                    "bezier_curve_anchors": "410/1023",
                    "bezier_curve_anchors": "512/1023",
                    "bezier_curve_anchors": "614/1023",
                    "bezier_curve_anchors": "717/1023",
                    "bezier_curve_anchors": "819/1023",
                    "bezier_curve_anchors": "922/1023"
                },
                {
                    "side_data_type": "Dolby Vision RPU Data"
                },
                {
                    "side_data_type": "Dolby Vision Metadata",
                    "rpu_type": 2,
                    "rpu_format": 18,
                    "vdr_rpu_profile": 1,
                    "vdr_rpu_level": 0,
                    "chroma_resampling_explicit_filter_flag": 0,
                    "coef_data_type": 0,
                    "coef_log2_denom": 23,
                    "vdr_rpu_normalized_idc": 1,
                    "bl_video_full_range_flag": 0,
                    "bl_bit_depth": 10,
                    "el_bit_depth": 10,
                    "vdr_bit_depth": 12,
                    "spatial_resampling_filter_flag": 0,
                    "el_spatial_resampling_filter_flag": 1,
                    "disable_residual_flag": 0,
                    "vdr_rpu_id": 0,
                    "mapping_color_space": 0,
                    "mapping_chroma_format_idc": 0,
                    "nlq_method_idc": 0,
                    "nlq_method_idc_name": "linear_dz",
                    "num_x_partitions": 2047,
                    "num_y_partitions": 1,
                    "components": [
                        {
                            "pivots": "0 128 256 384 512 640 768 896 1023",
                            "pieces": [
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "4948 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "9896 8349022"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                },
                                {
                                    "mapping_idc": 0,
                                    "mapping_idc_name": "polynomial",
                                    "poly_order": 1,
                                    "poly_coef": "0 8388608"
                                }
                            ],
                            "nlq_offset": 512,
                            "vdr_in_max": 1048576,
                            "linear_deadzone_slope": 2048,
                            "linear_deadzone_threshold": 0
                        },
                        {
                            "pivots": "0 1023",
                            "pieces": [
                                {
                                    "mapping_idc": 1,
                                    "mapping_idc_name": "mmr",
                                    "mmr_order": 3,
                                    "mmr_constant": 0,
                                    "mmr_coef": "0 6391320 0 0 0 0 0 0 3195660 0 0 0 0 0 0 1597830 0 0 0 0 0"
                                }
                            ],
                            "nlq_offset": 512,
                            "vdr_in_max": 1048576,
                            "linear_deadzone_slope": 2048,
                            "linear_deadzone_threshold": 0
                        },
                        {
                            "pivots": "0 1023",
                            "pieces": [
                                {
                                    "mapping_idc": 1,
                                    "mapping_idc_name": "mmr",
                                    "mmr_order": 3,
                                    "mmr_constant": 418,
                                    "mmr_coef": "53 427 6391320 26 26 213 13 3 213 3195660 0 0 53 0 0 106 1597830 -210 -210 13 0"
                                }
                            ],
                            "nlq_offset": 512,
                            "vdr_in_max": 1048576,
                            "linear_deadzone_slope": 2048,
                            "linear_deadzone_threshold": 0
                        }
                    ],
                    "dm_metadata_id": 0,
                    "scene_refresh_flag": 1,
                    "ycc_to_rgb_matrix": "9574/8192 0/8192 13802/8192 9574/8192 -1540/8192 -5348/8192 9574/8192 17610/8192 0/8192",
                    "ycc_to_rgb_offset": "16777216/268435456 134217728/268435456 134217728/268435456",
                    "rgb_to_lms_matrix": "7222/16384 8771/16384 390/16384 2654/16384 12430/16384 1300/16384 0/16384 422/16384 15962/16384",
                    "signal_eotf": 65535,
                    "signal_eotf_param0": 0,
                    "signal_eotf_param1": 0,
                    "signal_eotf_param2": 0,
                    "signal_bit_depth": 12,
                    "signal_color_space": 0,
                    "signal_chroma_format": 0,
                    "signal_full_range_flag": 1,
                    "source_min_pq": 7,
                    "source_max_pq": 3079,
                    "source_diagonal": 42
                }
            ]
        }
    ]
}
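On the consuming side, output like the above can be reduced to two flags. The following Python sketch is only an illustration and not how the server currently probes. The function name `detect_dynamic_metadata` is hypothetical, and the `side_data_type` strings are copied from the result above, so they are assumed to stay stable across ffprobe versions.

```python
import json
from typing import Dict

HDR10PLUS_TYPE = "HDR Dynamic Metadata SMPTE2094-40 (HDR10+)"
DOVI_TYPES = {"Dolby Vision Metadata", "Dolby Vision RPU Data"}


def detect_dynamic_metadata(ffprobe_json: str) -> Dict[str, bool]:
    """Report which dynamic HDR metadata appears in the probed frames."""
    data = json.loads(ffprobe_json)
    # Collect every side data type seen on any returned frame.
    seen = {
        side_data.get("side_data_type")
        for frame in data.get("frames", [])
        for side_data in frame.get("side_data_list", [])
    }
    return {
        "hdr10plus": HDR10PLUS_TYPE in seen,
        "dovi": bool(seen & DOVI_TYPES),
    }
```

For the sample above this would report both flags as true, which is the coexisting Dolby Vision and HDR10+ case discussed earlier. Repeated keys such as `maxscl` collapse to their last value in `json.loads`, which does not matter here because only `side_data_type` is read.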

@gnattu
Member Author

gnattu commented Oct 15, 2024

> Found a method to extract the HDR10+ metadata info from the very first packet/frame.

This is still not ideal because now we need to run ffprobe one more time to probe for audio and extra data. I thought about something similar as well but I don't like the idea of running ffprobe multiple times.

@nyanmisaka
Member

> > Found a method to extract the HDR10+ metadata info from the very first packet/frame.
>
> This is still not ideal because now we need to run ffprobe one more time to probe for audio and extra data. I thought about something similar as well but I don't like the idea of running ffprobe multiple times.

You don't need to run it again, -show_frames and -show_streams can coexist in one ffprobe command.

@gnattu
Member Author

gnattu commented Oct 15, 2024

> You don't need to run it again, -show_frames and -show_streams can coexist in one ffprobe command.

with show_streams and not specifying -select_streams v the packet you read might be the audio packet or no packet at all with the specified interval.

@nyanmisaka
Member

> > You don't need to run it again, -show_frames and -show_streams can coexist in one ffprobe command.
>
> with show_streams and not specifying -select_streams v the packet you read might be the audio packet or no packet at all with the specified interval.

You may be right, but I don't have such a file (audio pkt muxed before video pkt) to verify with. In that case, reading only one pkt would not be enough to get valid information.

@nyanmisaka
Member

nyanmisaka commented Oct 15, 2024

For DoVi we can use the upstream dovi_rpu_bsf and your patch, but for HDR10+ we need a custom option anyway.

I found an official sample of AV1/HDR10+ from AOMediaCodec, maybe it's worth adding something similar to av1_metadata_bsf? See also FFmpeg/FFmpeg@d6d5765

@gnattu
Member Author

gnattu commented Oct 16, 2024

Both remove_dovi and remove_hdr10plus are now ported to the av1_metadata bsf in case upstream has no interest in our patch.

@gnattu gnattu changed the title from "avcodec: add remove_dovi and remove_hdr10plus option to hevc_metadata bsf" to "avcodec: add remove_dovi and remove_hdr10plus option to hevc,av1_metadata bsf" Oct 16, 2024
@gnattu gnattu merged commit 84a0ae4 into jellyfin Oct 16, 2024
27 checks passed
@gnattu gnattu deleted the remove-hdr10plus-dovi-bsf branch October 16, 2024 18:46
3 participants