Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split yuv into its own pseudo codec #328

Closed
totaam opened this issue Apr 30, 2013 · 24 comments
Closed

split yuv into its own pseudo codec #328

totaam opened this issue Apr 30, 2013 · 24 comments

Comments

@totaam
Copy link
Collaborator

totaam commented Apr 30, 2013

Issue migrated from trac ticket # 328

component: core | priority: minor | resolution: fixed

2013-04-30 15:23:44: totaam created the issue


Why?:

  • cleaner, the code in the 264/vpx codecs is getting messy
  • we don't necessarily need yuv when using x264/vpx with the opengl client (though we would need to tweak the window backing logic to always choose opengl windows if yuv2rgb is not available)
  • make easier to tune the colourspace conversion step (downsampling/scaling etc)

Attached is a test patch (does not even compile) which shows how to get started. I don't have time to take this further right now..

@totaam
Copy link
Collaborator Author

totaam commented Apr 30, 2013

2013-04-30 15:24:09: totaam uploaded file yuv-codec.patch (15.5 KiB)

work in progress yuv codec

@totaam
Copy link
Collaborator Author

totaam commented Apr 30, 2013

2013-04-30 15:39:31: ahuillet commented


I think the "co" should also be split from the "dec".

@totaam
Copy link
Collaborator Author

totaam commented May 4, 2013

2013-05-04 14:57:20: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented May 4, 2013

2013-05-04 14:57:20: antoine changed owner from ** to antoine

@totaam
Copy link
Collaborator Author

totaam commented May 4, 2013

2013-05-04 14:57:20: antoine commented


splitting encoder from decoders is done (that part was very easy) in r3265

@totaam
Copy link
Collaborator Author

totaam commented May 17, 2013

2013-05-17 14:36:24: antoine uploaded file yuv-codec-v2.patch (9.6 KiB)

better patch - no longer using a yuvlib, all in cython

@totaam
Copy link
Collaborator Author

totaam commented May 26, 2013

2013-05-26 10:40:44: ahuillet commented


The plan is: separate video backends into modules -

csc_* for colorspace conversion (csc_swscale, csc_nvcuda coming one day)
enc_* for encoding (enc_x264)
dec_* for decoding (dec_avcodec)

The patches on the C parts are done, with an improved API, and kept at https://github.com/ahuillet/video_backends.

Now we need to re-do the .pyx interfaces to make this work again.

@totaam
Copy link
Collaborator Author

totaam commented May 26, 2013

2013-05-26 10:58:59: ahuillet commented


Quirks:

  • csc_ is able to support any colorspace, but enc_ and dec_ have not had that piece of functionality readded yet.
  • the logic for changing colorspace has been removed from enc_x264 because it's not its place, it will have to be reimplemented in python

Memory management:

  • you allocate a context with init_* functions like before, you clean up the context with free_* functions like before, but those functions do not free the context, you have to call free(context) manually
  • pass { NULL, NULL, NULL } to csc_image functions - they allocate their own memory, you free it by calling free_csc_image({})
  • enc_x264 allocate its own memory and you don't free it afterwards (it reuses the same buffer over and over again)
  • I am not sure about dec_avcodec, will check.

@totaam
Copy link
Collaborator Author

totaam commented May 26, 2013

2013-05-26 16:18:58: totaam commented


We also need to deal with reads beyond the size of the buffer caused by swscale reading 8 (or more?) bytes at a time - valgrind output:

==8159## Thread 6:8159## Invalid read of size 88159##    at 0xF7580E1: ??? (in /usr/lib64/libswscale.so.2.1.101)8159##    by 0xF750038: ??? (in /usr/lib64/libswscale.so.2.1.101)8159##    by 0xF756465: sws_scale (in /usr/lib64/libswscale.so.2.1.101)8159##    by 0xFFFC441: csc_image_rgb2yuv (x264lib.c:502)8159##    by 0xFFF71B5: __pyx_pw_4xpra_6codecs_4x264_7encoder_7Encoder_25compress_image (encoder.c:3033)8159##    by 0x35B1EDD280: PyEval_EvalFrameEx (ceval.c:4098)8159##    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)8159##    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)8159##    by 0x35B1E6D925: function_call (funcobject.c:526)8159##    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)8159##    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)8159##    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)8159##  Address 0x1acea03c is 630,732 bytes inside a block of size 630,736 alloc'd8159##    at 0x4A0887C: malloc (vg_replace_malloc.c:270)8159##    by 0x35B3A287BB: XGetImage (GetImage.c:82)8159##    by 0x158E52C6: __pyx_pw_4xpra_3x11_7gtk_x11_12gdk_bindings_13PixmapWrapper_3get_image (gdk_bindings.c:6555)8159##    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)8159##    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)8159##    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)8159##    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)8159##    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)8159##    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)8159##    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)8159##    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)

@totaam
Copy link
Collaborator Author

totaam commented May 27, 2013

2013-05-27 12:31:28: ahuillet commented


New backend infrastructure WIP at https://github.com/ahuillet/xpra/commits/

Things to do :

  • re-add nogil
  • fix xmemalign() definition and calls
  • colorspace other than 420 need to be de-hardcoded
  • OpenGL backend seems to go through CSC anyway
  • packaging, debs, rpms - patches directory
  • win32 build
  • vpx
  • feed RGB to x264 directly instead of YUV444
  • quality # 100> ultrafast in x264

@totaam
Copy link
Collaborator Author

totaam commented May 28, 2013

2013-05-28 10:09:59: totaam uploaded file yuv-codec-v3.patch (93.7 KiB)

latest work in progress

@totaam
Copy link
Collaborator Author

totaam commented May 31, 2013

2013-05-31 12:28:41: totaam commented


Please pull the latest changes I've sent, updated list of remaining issues (help needed - although not all of these issues need to be dealt with before we merge into trunk, most of them will need doing at some point):

  • edge resistance: take current setting into account, so the current one (or similar ones that re-use csc or video encoder) will get a score boost
  • fire pipeline changes from timer thread if new best option wins (set new quality/speed then re-evaluate pipeline)
  • move ImageWrapper to global location (no longer a server-only class)
  • free_csc is a simple free(),s o no real need for the currently very ugly code: merge with XImageWrapper code which does do a free()
    Also, maybe we want/have to use separate memaligned buffers for each plane?
  • handle scaling and odd dimensions problems
  • expose dimensions mask (0xFFFE is for x264 only?)
  • BGRA does not work for vpx or x264!?: segfault! - disabled by #ifdef for now (may also need rowstride work for uneven sizes?) - and even if it does work, this also affects:
  • with other csc modes than YUV420P, YUV422P, YUV444P, say BGRA: what should we expose to the client? does it care about the mode that was used or about the actual pixel dimensions used during compression. maybe we should not expose the option "csc=YUV422P" but something like: "csc_divs=(1,1),(2,1),(2,1)" instead? and maybe this ties with scaling too
  • related to above: old clients without "encoding_client_options" option could still parse pixel data as long as the divs are the same? (not that we would want to care and handle this particular case - just a note)
  • ensure we zero copy pixels instead of copying from decoder/encoder, then:
  • restore nogil: verify that we are passing the same pointers via string wrappers and that we are allowed to manipulate them without the gil.
  • fix opengl (no idea why the output is blank - the yuv codepath works for the non-gl code..)
  • swscale: could be tuned using speed/quality too: use FAST_BILINEAR if quality=0 and speed=100, do re-init in set_quality/set_speed - which could be combined as: configure(quality, speed)
  • think about target bitrate mode - no real difficulty there?
  • vpx: use vpx_img_wrap to ensure we don't copy the pixels
  • client side: use similar code to choose the decoder path, and maybe choose window backing (acceleration apis generally require direct access to the backing window)
  • dec_avcodec should be able to handle vpx: make the decoder export the list of codec formats it handles (x264, vpx, etc)
  • max width and height for csc (ffmpeg ML related message), vpx and x264: find the actual limits from somewhere (API?) rather than hard-coding to 4k*4k
  • rename x264 to h264 as encoding option (and keep "x264" in encodings for backwards compatibility)
  • move memalign to lib with pxd for importing into python code when we create image buffers
  • encoder/csc init: probe and maybe test speed (also useful to test it does work!) - ie: could test csc a 1k*1k random/test buffer to figure out latency
  • vpx cannot do speed or quality for now: we should skip the calculations! (keep caps telling us what we can do for the current encoder)
  • maybe add command line options for enabling/disabling/prioritizing encoder backends
  • if window size has changed just a little, we could send the edges as per the one-pixel workaround (which may well need generalizing anyway when dealing with thins like scaling), at least a few times before doing the full re-init which sends a new (expensive) iframe. And by that point the window may have been resized some more - so we don't lose video delta benefits.
  • the encoder specs should be static: the pipeline class should have a single copy of all encoder specs for all window sources. Then:
  • the encoder specs can have guesstimates and for some enc/csc, and runtime values too: we return objects, so these can be updated continuously by the encoder/csc
  • setup cost should be dynamic: decrease as hardware encoding availability decreases (so we then become even less likely to use hardware for small windows)
  • rename rgb_format to pixel_format

@totaam
Copy link
Collaborator Author

totaam commented May 31, 2013

2013-05-31 14:13:20: ahuillet commented


Pull - from where?

@totaam
Copy link
Collaborator Author

totaam commented Jun 3, 2013

2013-06-03 12:55:17: totaam commented


Mostly done in:

Not closing this ticket yet, what is left to do:

  • handle dimensions restrictions generically, and for scaled down dimensions too: expose it as an encoder spec property (so x264 can have its even window dimensions)
  • move ImageWrapper and maybe refactor it with the XImageWrapper
  • zero-copy pixels in and out (also look at vpx_img_wrap for vpx)
  • nogil option on non-python buffers
  • let client backing class tell the server if it can use scaling (currently just per client)
  • let client backing class tell the server which csc modes are allowed (currently just per client)
  • handle scaling in GL client code
  • handle csc client side for GL rendering (as GL cannot handle ARGB - but we may still want to use ARGB server side to reduce overall latency)

And maybe also:

  • swscale could tune the speed/flags at runtime by doing a full re-init (which is fast/cheap)
  • handle transient encoder failures: mark for retry/not
  • when trying out new pipelines, keep the old one around in case we need to fallback to it (the new one does not work out)

@totaam
Copy link
Collaborator Author

totaam commented Jun 6, 2013

2013-06-06 16:46:02: totaam changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Jun 6, 2013

2013-06-06 16:46:02: totaam changed owner from antoine to ahuillet

@totaam
Copy link
Collaborator Author

totaam commented Jun 6, 2013

2013-06-06 16:46:02: totaam commented


More improvements::

  • imagewrapper changes in r3552
  • nogil restored in r3572
  • less pixel copying in r3570 and r3568
  • handle size restrictions generically r3588
    and lots of cleanups in between.
    [[BR]]

Which only leaves needing fixing for this ticket:

Other things we can deal with later (see also lists in previous comments):

  • finding real csc/encoder limits at runtime (would be worth at least testing what works and what doesn't to make sure the codec_spec is more or less accurate - finding out later when we try to use it is more expensive)
  • client_backing could tell the server which csc modes it will handle (alternative to fixing gl backing to handle RGBP)

@totaam
Copy link
Collaborator Author

totaam commented Jun 10, 2013

2013-06-10 13:58:07: totaam commented


and... scaling brought back an old bug:

non-existing PPS 0 referenced

After spending a lot of time debugging and inspecting the server-side code (which looks fine), I managed to reproduce it with client side debugging switched on and found:

paint_with_video_decoder: window dimensions have changed from (421, 374) to (420, 374)
paint_with_video_decoder: new <type 'xpra.codecs.dec_avcodec.decoder.Decoder'>(420,374,YUV420P)
paint_with_video_decoder: info={'width': 420, 'colorspace': 'YUV420P', 'type': 'x264', 'height': 374}
[h264 @ 0x7f2b8836af00] non-existing PPS 0 referenced
[h264 @ 0x7f2b8836af00] decode_slice_header error
[h264 @ 0x7f2b8836af00] no frame!
Error while decoding frame
2013-06-10 19:04:51,472 draw error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 1004, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 258, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 348, in draw_region
    self.paint_with_video_decoder(x264_Decoder, "x264", img_data, x, y, width, height, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 258, in paint_with_video_decoder
    coding, len(img_data), width, height, options))
Exception: paint_with_video_decoder: x264 decompression error on 1044 bytes of picture data for 420x374 pixels, \
    options={'quality': 29, 'frame': 2, 'speed': 100, 'csc': 'YUV420P'}

For the record, what is happening here is that we were sending the raw dimensions of the XImageWrapper we encode from as being the dimensions we can paint with, and these new dimensions could get clipped by x264's size restrictions (even dimensions only) to the same size as the current encoder pipeline (ie: when resizing a window to a size one pixel shorter) which means the current encoding pipeline was still valid, but this caused the client side code to erroneously think that it needed a new decoder pipeline to deal with the packet... (cleaning the old pipeline, causing the new video packet to reference a frame that has been discarded).
A somewhat ugly solution is to pass the real dimensions via client_options, see attached patch. A better solution will be committed.

@totaam
Copy link
Collaborator Author

totaam commented Jun 10, 2013

2013-06-10 13:58:44: totaam uploaded file actual-video-size.patch (2.2 KiB)

use the correct video size when decoding (hackish solution)

@totaam
Copy link
Collaborator Author

totaam commented Jun 17, 2013

2013-06-17 13:22:12: totaam changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Jun 17, 2013

2013-06-17 13:22:12: totaam changed resolution from ** to fixed

@totaam
Copy link
Collaborator Author

totaam commented Jun 17, 2013

2013-06-17 13:22:12: totaam commented


The correct solution for the client decoding error was merged in r3607
(note: this was not caused by scaling after all)

Also merged a workaround for GL window's lack of RGB support in r3612 (+ r3617 for OR windows)

As for finding the encoder limits, we'll deal with that when we find problems.
(note: this needs to go in the codec_spec rather than finding out too late when we try to instantiate a new pipeline)

This is enough for now, will follow up in #350, etc

@totaam totaam closed this as completed Jun 17, 2013
@totaam
Copy link
Collaborator Author

totaam commented Jul 18, 2013

2013-07-18 06:28:24: antoine commented


Note: all the csc/encoder info is now exported via xpra info:

window[2].csc=swscale
window[2].csc.dst_format=YUV444P
window[2].csc.dst_height=710
window[2].csc.dst_width=1364
window[2].csc.flags=BICUBLIN | SWS_ACCURATE_RND
window[2].csc.frames=6
window[2].csc.pixels_per_second=70503532
window[2].csc.src_format=BGRX
window[2].csc.src_height=710
window[2].csc.src_width=1364
window[2].csc.total_time_ms=82
window[2].encoder=x264
window[2].encoder.frames=6
window[2].encoder.height=710
window[2].encoder.pixels_per_second=23144142
window[2].encoder.preset=medium
window[2].encoder.profile=high444
window[2].encoder.src_format=YUV444P
window[2].encoder.total_time_ms=251
window[2].encoder.width=1364

[[BR]]
And we also have the scores for figuring out what other encoding/csc options were available and how they ranked:

window[3].pipeline_option[0].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[0].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[0].format=YUV444P
window[3].pipeline_option[0].score=48
window[3].pipeline_option[1].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[1].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[1].format=YUV422P
window[3].pipeline_option[1].score=44
window[3].pipeline_option[2].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[2].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[2].format=YUV420P
window[3].pipeline_option[2].score=26

How they are ranked will depend on the current quality and speed settings which is also available:

window[2].quality.cur=62
window[2].speed.cur=34

@totaam
Copy link
Collaborator Author

totaam commented May 19, 2014

2014-05-19 13:38:54: totaam commented


(setting correct milestone the work was completed in)

@totaam totaam mentioned this issue Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant