
BitInformation for data previously reduced in precision #46

Open
rsignell-usgs opened this issue Feb 16, 2023 · 5 comments
@rsignell-usgs commented Feb 16, 2023

We have a large dataset from WRF, where the providers already reduced precision on some variables before passing to us. For example:

ALBEDO:number_of_significant_digits = 5 ;          
LAI:number_of_significant_digits = 3 ;
LH:number_of_significant_digits = 5 ;   
...

Can we use this knowledge somehow to create an alternate algorithm for calculating the appropriate keepbits?

@rsignell-usgs rsignell-usgs changed the title BitInformation for data known to be already reduced in precision BitInformation for data previously reduced in precision Feb 16, 2023

milankl commented Feb 16, 2023

Do you know whether they rounded in decimal or in binary? While rounding in one base also quantizes in the other, it doesn't *round* in the other base, i.e. it doesn't make the trailing digits/bits shorter or simpler, which is the actual point of rounding. Example:

julia> a = randn(Float32,5)
5-element Vector{Float32}:
 -0.35390654
  1.223104
 -0.04762434
 -1.5507344
  0.89877653

julia> b = round.(a,digits=3)
5-element Vector{Float32}:
 -0.354
  1.223
 -0.048
 -1.551
  0.899

julia> bitstring.(b,:split)
5-element Vector{String}:
 "1 01111101 01101010011111101111101"
 "0 01111111 00111001000101101000100"
 "1 01111010 10001001001101110100110"
 "1 01111111 10001101000011100101011"
 "0 01111110 11001100010010011011101"

While the mantissa bits aren't all zero beyond some position, the resulting array may still be more compressible, as the number of possible mantissa bitpatterns has been greatly reduced.
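By contrast, rounding in binary zeroes the trailing mantissa bits directly. A minimal Python sketch of round-to-nearest bitrounding with ties to even (an illustrative reimplementation, not the package's code; float32 with 23 mantissa bits and 0 < keepbits < 23 assumed):

```python
import numpy as np

def bitround(x, keepbits):
    """Round float32 values to `keepbits` mantissa bits (of 23),
    round-to-nearest with ties to even; assumes 0 < keepbits < 23."""
    ui = np.asarray(x, dtype=np.float32).view(np.uint32).copy()
    shift = 23 - keepbits
    half = np.uint32(1 << (shift - 1))                  # half of the discarded range
    mask = np.uint32((0xFFFFFFFF << shift) & 0xFFFFFFFF)
    ui += half - np.uint32(1) + ((ui >> shift) & np.uint32(1))  # nearest, ties to even
    ui &= mask                                          # zero the discarded trailing bits
    return ui.view(np.float32)
```

After `bitround(a, 10)` every value has at least 13 trailing zero bits, so unlike decimal rounding the trailing bitpatterns really are all zeros.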

But to actually answer your question, you can (roughly) translate between significant digits and bits via

nsb(nsd::Integer) = Integer(ceil(log(10)/log(2)*nsd))

which just arises from the idea that with $d$ digits you have a maximum absolute error of $\tfrac{10^{-d}}{2}$, and with $b$ bits a maximum absolute error of $\tfrac{2^{-b}}{2}$. Equating the two and rounding up (rather than down) gives the number of bits needed so that the error is not greater than the error from rounding in decimal. There are some caveats that @czender can elaborate more on.

In your case this means that if you already know that your dataset contains only $d$ digits of precision you can definitely round to

| $d$ digits | $b$ bits |
|---|---|
| 1 | 4 |
| 2 | 7 |
| 3 | 10 |
| 4 | 14 |
| 5 | 17 |
| 6 | 20 |
| 7 | 24 |
without losing any information. And in fact, you probably want to, because if there appears to be information in the mantissa bits beyond those, it is an artefact of the quantization.


milankl commented Feb 16, 2023

In fact, we already discussed that here nco/nco#250

@milankl milankl added the documentation Improvements or additions to documentation label Feb 16, 2023

rsignell-usgs commented Feb 17, 2023

@pnorton-usgs and I checked, and these variables were processed with NCO using args like `ppc ALBEDO:5`, which specifies the Number of Significant Digits (as opposed to the Decimal Significant Digits).

So a few values look like this:

import struct

a = ds['ALBEDO'][100,500:510,600].values

a
array([0.2371893 , 0.22657919, 0.22525072, 0.21817589, 0.20056486,
       0.2204647 , 0.22222233, 0.22418547, 0.22438478, 0.22910786],
      dtype=float32)
def binary(num):
    return ''.join('{:0>8b}'.format(c) for c in struct.pack('!f', num))

for v in list(a):
    print(binary(v))
00111110011100101110000111000000
00111110011010000000010001100000
00111110011001101010100000100000
00111110010111110110100110000000
00111110010011010110000011100000
00111110011000011100000110000000
00111110011000111000111001000000
00111110011001011001000011100000
00111110011001011100010100100000
00111110011010101001101101000000


milankl commented Feb 17, 2023

Just checked what Charlie means by NSD vs DSD: the former bounds a relative error whereas the latter bounds an absolute error. Great that you used nco's ppc option, because then rounding is actually done in binary. Depending on the version of nco this should also do granular bitrounding (see here for more on this), meaning that keepbits varies somewhat from value to value. I guess it's between 16-18 keepbits here, looking at the trailing zeros, but it's obviously impossible to say for sure without knowing the full-precision values.

That gives you an upper bound on the keepbits, but I'd still just run the bitinformation analysis over the data and see what it says.
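Counting trailing zero bits makes that per-value estimate concrete. A quick Python sketch (illustrative names; on the ALBEDO sample printed above, `23 - tz` comes out between 16 and 18, consistent with the guess above):

```python
import numpy as np

def trailing_zeros(x):
    """Trailing zero bits of each float32's 32-bit pattern (32 for 0.0)."""
    ui = np.asarray(x, dtype=np.float32).view(np.uint32)
    return np.array([(int(u) & -int(u)).bit_length() - 1 if u else 32 for u in ui])

# the ALBEDO sample from above
a = np.array([0.2371893, 0.22657919, 0.22525072, 0.21817589, 0.20056486,
              0.2204647, 0.22222233, 0.22418547, 0.22438478, 0.22910786],
             dtype=np.float32)
tz = trailing_zeros(a)
print(23 - tz)  # per-value keepbits estimate
```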

@rsignell-usgs

Will do @milankl !! Thanks for the great rounding examples also. I feel like I'm starting to understand this stuff! :)
