thSAD is blocksize dependent for dct>0 #20
Comments
Thanks. For the first part: the thSAD parameter should be given as if it were for an 8x8 blocksize of an 8-bit video. Easy to use: you don't have to give different thSADs for a similar desired effect, regardless of block size. How it works internally: right before the analysis the thSAD values are recalculated (scaled) to the actual blocksize. Moreover, bit depth and even chroma subsampling enter the calculation when chroma=true is given. Thus a thSAD=400 becomes 400*(4*4)*256 for BlkSize=32 (32/8=4 for X and for Y) and 16-bit video (scaling from 8 bit to 16 bit gives a factor of 256). For a 4:2:0 video it is multiplied by an additional 1.5 factor. As for dct: in modes 1..4 there is, for some reason, an nBlkSizeX/2 factor which is applied when DCTBytes2D has been used before. Either this is an empirical constant put there by the original author, Fizick, which seemed to work long ago, when practically the only supported blocksize was 8. Or: after the fftw_execute_r2r transformation the data really is scaled somehow with 1/(horizontal size) (yes, I know, why only horizontal, but who knows?). I should check it with manipulated input block contents to see if there is any difference in the magnitude of the resulting blocks. If it does not scale, the *nBlkSizeX/2 should be replaced with *8/2, because blksize 8 has been the default since the beginning. But if the experiment shows that the result is somehow proportional to 1/nBlkSizeX, then it should remain.
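A minimal sketch of that scaling rule, assuming exactly the factors described above (the helper name, the rounding, and the simplified chroma handling are illustrative, not the actual mvtools code):

```cpp
#include <cstdint>

// Hypothetical helper illustrating the scaling described above:
// thSAD is given as if for an 8x8 block of 8-bit video; internally it is
// rescaled to the real block size, bit depth and (when chroma=true) 4:2:0 subsampling.
int64_t ScaleThSAD(int64_t thSAD, int blkSizeX, int blkSizeY,
                   int bitsPerSample, bool chroma)
{
    int64_t scaled = thSAD;
    scaled = scaled * (blkSizeX * blkSizeY) / (8 * 8);  // block area relative to 8x8
    scaled <<= (bitsPerSample - 8);                     // e.g. 16-bit video: factor 256
    if (chroma)
        scaled = scaled * 3 / 2;                        // 4:2:0: two quarter-size chroma planes
    return scaled;
}

// Example from the comment: thSAD=400, 32x32 blocks, 16-bit, luma only
// -> 400 * 16 * 256 = 1,638,400
```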
I ran into this issue by accident while trying out MDegrain with dct=1. To have the degraining strength independent of nBlkSizeX I needed to scale thSAD1, thSAD2 and thSCD1 with nBlkSizeX. And thanks for the summary; I had never caught this detail about subsampled inputs. I will probably have to take that into account when processing inputs subsampled differently from 4:2:0...
Interesting. Until I check the issue (whether the internal multiplication for dct 1..4 is proper across very different X blocksizes), you can try playing with different thSAD values, but I think that should only be a temporary workaround. As I wrote, you normally don't have to bother with the thSAD scaling, because it should purposely behave similarly for the same thSAD parameter whatever blocksize or subsampling you are using.
My tests have shown that to have the 3 distinct modes of dct working comparably (dct=0, dct=1, dct=5), I needed to multiply the 3 thresholds by the following factors: Or, to say it the other way around: the return value of every call to PlaneOfBlocks::LumaSADx must be divided by these factors so that the thresholds can stay the same. And this is in fact the way I tested.
dct=5 is a separate algorithm, uses SATD, see https://en.wikipedia.org/wiki/Sum_of_absolute_transformed_differences |
Yes, I know these are all different metrics. And while your assessment is of course valid that the metrics do not differ by mere constant factors, it is still helpful to find out by how much they typically differ in scenarios where one would not expect much visual difference. The frequency-domain metrics (DCT and SATD) were introduced because SAD has strong issues recognizing similar blocks when luminosity changes. Thus in scenes where luminosity is constant one would expect the visual result to be very similar with all three metrics. From my tests with hundreds of such frames I could see that these factors are indeed relatively constant when luminosity does not change. Tuning the factors on bright scenes with constant luminosity would then be useful, so that the user is not left guessing how to change the thresholds just because he decided to change the metric.
To sum up: as long as the different metrics agree on a block being unchanged, they tend to be apart by the same relative factors. These factors diverge only once the metrics disagree on whether a block is unchanged.
I wonder: if I graph the normal SAD vs. DCT SAD pairs for different clip types (normal home video, anime, etc.), will I see some kind of clear correlation and be able to establish a formula or formulas, at least around "zero"? And does it depend on block size, and how? (It's clear that the DCT SAD of a full-0 and a full-255-valued 8x8 block is zero for all block sizes, not counting the 0th DC element of course.) I couldn't see a clear correlation for different blocksizes when I created blocks for "synthetic" tests: block1=0,4,8,12,0,4,8,12,etc... block2=4,8,12,0,4,8,12,0...
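For reference, a small sketch of such a synthetic test (plain SAD only; the dct>0 path would transform both blocks first; only the 0,4,8,12 block contents come from the comment above):

```cpp
#include <cstdio>
#include <cstdlib>

// Synthetic test blocks from the comment: the same 0,4,8,12 sawtooth,
// with block2 shifted by one sample relative to block1.
int main()
{
    const int N = 8;                      // try 8, 16, 32 to compare block sizes
    int block1[N * N], block2[N * N];
    const int pattern[4] = { 0, 4, 8, 12 };

    for (int i = 0; i < N * N; ++i) {
        block1[i] = pattern[i % 4];
        block2[i] = pattern[(i + 1) % 4];
    }

    // Plain SAD; the dct>0 path would DCT both blocks first and SAD the coefficients.
    int sad = 0;
    for (int i = 0; i < N * N; ++i)
        sad += std::abs(block1[i] - block2[i]);

    std::printf("blocksize %dx%d: plain SAD = %d\n", N, N, sad);
    return 0;
}
```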
Let me sum up my understanding of the 3 metrics, just to be sure we're on the same page here:
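A rough reconstruction of those three metrics, going by the dct modes discussed elsewhere in this thread (the names below are illustrative, not actual mvtools symbols):

```cpp
// Rough reconstruction of the three metrics compared in this thread
// (illustrative names, not actual mvtools symbols):
//   metric 1: dct = 0    -> plain SAD of the pixel values
//   metric 2: dct = 1..4 -> SAD of the DCT-transformed blocks
//   metric 3: dct = 5    -> SATD (sum of absolute transformed differences)
enum class BlockMetric { SAD, DctSAD, SATD };

BlockMetric MetricForDctMode(int dct)
{
    if (dct == 0)             return BlockMetric::SAD;
    if (dct >= 1 && dct <= 4) return BlockMetric::DctSAD;
    return BlockMetric::SATD; // dct = 5, per the comments above
}
```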
Now what I argue is this: if one can say that thSCD1=400 (the default, which means something like every pixel changing its value by 6 units) makes sense for metric 1, then it must be possible to have a setting for thSCD1 when using metric 2 or 3 such that the threshold is sensitive to a similar amount of visual difference. I found the factors for the typical relative levels of the 3 metrics for blocks which have not changed (by the thSCD1=400 definition) by dumping the outputs of PlaneOfBlocks::LumaSADx for all 3 metrics over hundreds of frames to text files and comparing them. Of course, it would be necessary to make sure that the source does not have too big an influence on this. Which, I have to admit, could very much be the case. Anime could turn out to behave very differently from film. I am not sure whether synthetic measurements with white noise are the way to go, since by the very definition of white noise each block has changes above every definable threshold of similarity. This is basically the regime where one would expect the methods to diverge a lot. As I said in my previous comments two days ago, the factors can only be expected to be somewhat stable in the regime of "zero" changes. Thus if you want to go on with white noise, I would turn it down in luminosity to a level where one would expect to see usual real-life levels of noise. Perhaps it is actually interesting to see how the factors vary as functions of the noise intensity.
Oh, thinking about it, I can put numbers on how dim the noise must be. The luminosity of the noise must stay low enough relative to the thSCD1 setting that one would expect the "SAD" metric to remain under the threshold. Thus, using the default of thSCD1=400, the luminosity of the noise must be such that the average per-pixel difference is below 400/64=6.25. Testing just with white noise is probably a good approximation for anime, which tends to have huge uniform colors. So here is the idea for a test: use white noise of a tweakable average luminosity on top of the backgrounds of the DCT coefficients (link), which should also be of a tweakable luminosity. Then one can look at the different results in these three dimensions: background pattern, background luminosity, white noise luminosity. This should be a mathematically complete description for non-changing blocks.
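A small sketch of that bound and the dim-noise idea (the RNG setup is illustrative; only the 400/64 = 6.25 arithmetic comes from the comment above):

```cpp
#include <cstdio>
#include <random>

// Sketch of the proposed test: white noise dim enough that plain SAD of an
// 8x8 block stays under the default thSCD1 = 400, i.e. the average per-pixel
// change stays below 400 / 64 = 6.25 (8-bit values).
int main()
{
    const int thSCD1 = 400;
    const double perPixelBound = thSCD1 / 64.0;       // 6.25

    std::mt19937 rng(42);
    std::uniform_int_distribution<int> noise(0, 6);   // amplitude kept below the bound

    int sad = 0;
    for (int i = 0; i < 64; ++i)
        sad += noise(rng);                            // |noisy block - background| per pixel

    std::printf("per-pixel bound %.2f, measured SAD %d (thSCD1 = %d)\n",
                perPixelBound, sad, thSCD1);
    return 0;
}
```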
Hm, I am still stuck on the question of whether the 0th component is actually used in the DCT. The more I think about this issue, the more I come to the conclusion that it probably is used, since in a way it is a measure of similarity. And if it were discarded, the matrix would not be square anymore...
(I'm not lost, just busy.) White noise was just an experiment; I had to refresh my mind on the whole DCT topic and how it is handled in mvtools. For 8x8 there is a faster, integer DCT algorithm; other blocksizes work in float and use the fftw library. The float versions have an int-float-int conversion, which makes the result similar to the integer DCT version. Finally we get a signed 8-bit (10, 12, 14, 16-bit) result. The DC component is halved, the other components get a 1/sqrt(2) factor. Then, after dctshift0 and dctshift, the zero level of the result is moved by adding 128 (for 8 bits). https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/DCTFFTW.cpp#L376 and https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/DCTFFTW.cpp#L394 So in brief: the final integer result (AC component) gets a * (1/sqrt(2)) / (blkSizeX * blkSizeY) + 128 conversion. Now I can see that you made a serious result dump and did the comparison I wanted to do - I didn't know before how far you had got.
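As a sketch of that final AC conversion (8-bit case; a paraphrase of the formula stated above, not a verbatim copy of the linked DCTFFTW.cpp code):

```cpp
#include <cmath>

// Sketch of the AC-coefficient conversion summarized above:
//   result = ac * (1 / sqrt(2)) / (blkSizeX * blkSizeY) + 128
// (8-bit case; a paraphrase, not the actual DCTFFTW.cpp code)
static inline int NormalizeAcCoeff(float ac, int blkSizeX, int blkSizeY)
{
    float v = ac * (1.0f / std::sqrt(2.0f)) / (float)(blkSizeX * blkSizeY) + 128.0f;
    if (v < 0.0f)   v = 0.0f;       // clamp into the 8-bit range
    if (v > 255.0f) v = 255.0f;
    return (int)(v + 0.5f);
}
```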
Shift right by dctshift is the same as division by (blockSizeX * blockSizeY); for power-of-2 blocksizes it works fine, but for a 12x12 block it doesn't. The reason for the normalization is to fit the result into the 8-bit range. Then, after SADing the DCT'd blocks, the mul-by-nBlkSizeX-div-2 should be mul-by-sqrt(blockSizeX/2 * blockSizeY/2), which would also work for non-square blocks. BTW, these normalization factors are different when we use the FFTW3 DCT or e.g. the Matlab DCT; it's a free choice depending on our practical needs. But all this leaves open the question of how to choose a proper metric which correctly describes the block difference, for detecting the best block match, detecting scene changes, and providing a usable weight for e.g. MDegrain.
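A sketch of the two scalings side by side (the current nBlkSizeX/2 factor versus the square-root form suggested above), assuming the SAD of the DCT'd blocks has already been computed:

```cpp
#include <cmath>

// Current scaling in the dct=1..4 path (as discussed above): only the
// horizontal block size enters, which breaks for non-square blocks.
int ScaleDctSadCurrent(int sad, int nBlkSizeX)
{
    return sad * nBlkSizeX / 2;
}

// Suggested form: geometric mean of both dimensions, so non-square
// (and non-8x8) blocks are scaled consistently.
int ScaleDctSadProposed(int sad, int nBlkSizeX, int nBlkSizeY)
{
    double factor = std::sqrt((nBlkSizeX / 2.0) * (nBlkSizeY / 2.0));
    return (int)(sad * factor + 0.5);
}
```

For the default 8x8 block both forms give the same factor of 4, so the change would only matter for other (especially non-square) block sizes.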
I think I will not have time before Christmas, but then I will try to get at this issue with some serious assessment.
Hey, I'm back! So let me try to pick up where we left off.
Sorry, the archive was unreachable; it should be fixed now.
When using MDegrain with dct>0 (set in MAnalyze), the thresholds thSAD1, thSAD2 and thSCD1 are blocksize dependent.
Expectation:
Following the documentation: "The provided thSAD value is scaled to a 8x8 blocksize."
Thus the thresholds should not depend on the blocksize.
Solution:
Remove the explicit multiplication of the sad value by the variable nBlkSizeX in function PlaneOfBlocks::LumaSADx in each case statement.
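Schematically, the proposed change looks like this (a paraphrase of the scaling in the dct case statements, not the verbatim PlaneOfBlocks::LumaSADx code):

```cpp
// Hypothetical before/after illustrating the proposed change; the real
// PlaneOfBlocks::LumaSADx switches on the dct mode and computes 'sad'
// from the DCT'd blocks before any such scaling is applied.
int ScaleDctSad(int sad, int nBlkSizeX)
{
    (void)nBlkSizeX; // kept only to show the removed factor below

    // current: blocksize-dependent factor, which is what makes
    // thSAD1 / thSAD2 / thSCD1 depend on nBlkSizeX for dct > 0
    // return sad * nBlkSizeX / 2;

    // proposed here: drop the nBlkSizeX multiplication (the maintainer's
    // alternative above is to pin it to the historical 8/2 instead)
    return sad;
}
```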