Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Something went wrong with noise. #202

Closed
brettviren opened this issue Mar 22, 2023 · 31 comments
Closed

Something went wrong with noise. #202

brettviren opened this issue Mar 22, 2023 · 31 comments

Comments

@brettviren
Copy link
Member

Discovered by Mike Wang (with a DNN no less!) and reported here:

https://cdcvs.fnal.gov/redmine/issues/27898

Some regression related to noise crept in. Haiwang finds it most plain as uncharacteristic high or low values at the very begin and end of the mean waveform. Between v 0.20.0 and 0.23.0:

noise1

Looking at the first ticks across a few versions (also from Haiwang):

noise2

Something between 0.21 and 0.23 went "bad". PR #172 is a suspect.

@HaiwangYu
Copy link
Member

a note with more info:
2023-03-22 noise-debug-v2.pdf

@brettviren
Copy link
Member Author

Zooming in on the left side also shows the problem:

https://www.phy.bnl.gov/~bviren/tmp/wctsim/osd/bad-noise/bad-noise.html

Time is on the X-axis. A ridge of high samples go down the channels. Then there is a broader valley of low samples.

Note, each channel group samples noise from different, arbitrary wire lengths and is not meant to be a correct model for any given detector. The data is generated via:

https://github.com/WireCell/wire-cell-toolkit/blob/master/gen/test/test-noise-roundtrip.jsonnet

@HaiwangYu
Copy link
Member

Yes, I saw that too on all 3 planes of the PDSP simulation.
Screenshot 2023-03-23 at 9 23 12 AM

@brettviren
Copy link
Member Author

Well, it seems this bug is due, at least in part, to a rather embarrassing blunder on my part!

During the refactoring of the "recycling randoms" I went through a some iteration on changing the internal API. After that was done, I did not correctly update how that API is called when we create a recycling random generator.

https://github.com/WireCell/wire-cell-toolkit/blame/master/gen/src/AddNoise.cxx#L56

As a consequence, the "percentage replacement" value was being used to set the mean of the normal distribution. That mean should have been set to zero but was set to 0.04.

Changing the code to call make_recyling() with a proper mean=0 and sigma=1 arguments removes the weird high/low ends of the mean waveforms. Because 0.04 is close to 0.0, this bug was not so obvious.

That is a real bug but even given a non-zero Gaussian mean, I do not understand the cause of the high/low ends of the mean waveform. Something else is likely lurking. I'll check a bit more.

@brettviren
Copy link
Member Author

Hmm, the generated normals are consumed to construct a complex spectrum modulated by the Gaussian sigmas from the input mean spectrum. Using a non-zero mean for the "normal" distribution certainly causes the result to be non-Rayleigh so some distortion must happen. Not sure why this exact shape, but I think that's enough to believe there is no deeper problem.

@brettviren
Copy link
Member Author

Here is the case when I make the mean of the "normal" distribution ridiculously large (1.0).

inco-adc-bad

Note the exaggerated ends of the mean waveforms below each frame.

Here is after using proper mean of 0 for the normal distribution:

inco-adc-good

The commit which is inbound provides check_noise_roundtrip.py to make these plots and spew some stats and test-noise-roundtrip.bats which will make these plots and fail the stats if they go too far out of spec again.

@HaiwangYu
Copy link
Member

@brettviren, thanks for figuring this out so quickly. I will try it out too.

A small comment: For your test plot, is that possible to move the coordinate axises a bit away from the content (2D image and 1D waveforms)? So that the edge ticks/channels can be viewed easier.

A question: what this test plot looks like if you use mean=0.04 instead of 1? Will the the test fail in that case?

@HaiwangYu
Copy link
Member

Hi @brettviren, seems the spectra with this fix went up a bit compared to 0.20.0 and 0.17.1.
Screenshot 2023-03-24 at 12 10 40 PM
Screenshot 2023-03-24 at 12 10 03 PM

workarea:
https://www.phy.bnl.gov/~yuhw/wct-ci/gen/

gen:

wire-cell -l stdout -L debug -c check_pdsp_noise.jsonnet -V output=frames-9485256.tar.bz2

compare mean spec

wirecell-plot comp1d -n spec frames-0.17.1.tar.bz2 frames-9485256.tar.bz2 spec-u.pdf --chmin 0 --chmax 800 --interactive

@HaiwangYu HaiwangYu reopened this Mar 24, 2023
@brettviren
Copy link
Member Author

Thanks @HaiwangYu. As mentioned in our chat, definitely there should be some change in the spectrum but my naive understanding is that the spectrum would become slightly lower after the bug fix. I'll look into this more!

@HaiwangYu
Copy link
Member

HaiwangYu commented Mar 24, 2023

Hi @brettviren, I checked the following variations of 9485256, the higher spectra remains. So I think something else than Normals::make_recycling happened between 0.23.0 and 9485256

  • v2: re-run
  • v3: mean 0 -> 0.04
  • v4: revert to original call

While if I add the correct function call of Normals::make_recycling to 0.23.0, I do get matching mean spectra and wave:
Screenshot 2023-03-24 at 3 08 42 PM

wirecell-plot comp1d -n spec frames-0.17.1.tar.bz2 frames-0.23.0-v3.tar.bz2 spec-u.pdf --chmin 0 --chmax 800 --interactive
wirecell-plot comp1d -n wave frames-0.17.1.tar.bz2 frames-0.23.0-v3.tar.bz2 tmp.pdf --chmin 0 --chmax 800 --interactive

@HaiwangYu
Copy link
Member

I checked that they should be loading the same noise spectra too.
Screenshot 2023-03-24 at 3 17 05 PM

@brettviren
Copy link
Member Author

@HaiwangYu thanks, I'll look at these type of plots next.

I checked kind of the same thing using the "roundtrip" test which applies this chain:

spec->sampling->volts->adc->dac->volts->fitting->spec

With bug fix, mean=0, output spec matches input spec up to distortions which I believe are due to inescapable ADC quantization error.

With bug added back, mean=0.04, the output spectra are slightly higher than input, beyond what quantization adds.

Adding "superbug" with mean=1.0, the fit spectra go very high. Eg, input spectrum peak at 0.16V gives output 0f about 0.24V.

@brettviren
Copy link
Member Author

I see something a little different using comp1don the "roundtrip" noise. The actual spectra are hand-made to mimic official PDSP noise but the shapes should not be exactly the same. Also, the "roundtrip" test does not care about canonical "planes". The channel range I plot are simply the channels getting the maximum noise in the test.

Note, I just pushed a small change to the comp1d plotter. Actual changes are only cosmetic.

I'm using EmpiricalNoiseModel as you. I have also looked at using GroupNoiseModel and observe the same "recycling bug".

I note two differences in comparing "bug" vs "no bug" pair with "roundtrip" noise related to your pair:

  1. Noise from "roundtrip" test has a closer match between "bug" and "no bug".
  2. It also has strong DC component missing from your noise. This is after subtracting the median. If instead I use the new --baseline=mean I get a zero value DC component. Perhaps this is related to integer roundoff? I note, your sample also has non-zero DC.

test-noise-roundtrip-nobug-empno+smlbug-empno-empno-comp1d-spec0000
test-noise-roundtrip-nobug-empno+smlbug-empno-empno-comp1d-wave0000

So, what's still going on?

  • Something about using the concocted noise spectra in the "roundtrip" test instead of official PDSP noise spectra?

  • Something about mixing signal + noise (your test) vs pure noise (my test)?

I'll audit the code between 0.23 and now.

@brettviren
Copy link
Member Author

@HaiwangYu I guess something may be off about your tests? Here is what I see with latest HEAD with your check_pdsp_noise.jsonnet from your web site:

check-pdsp-noise-nobug-smlbug-comp1d-spec0000
check-pdsp-noise-nobug-smlbug-comp1d-wave0000

@HaiwangYu
Copy link
Member

HaiwangYu commented Mar 24, 2023

Hi @brettviren, I compared 0.20.0 and 9485256, for u plane [0, 800). And 9485256 has a higher spectra. Did you see a different behavior?

@brettviren
Copy link
Member Author

Okay. I'm confused. You wrote 0.23 before.

@HaiwangYu
Copy link
Member

If I check same channels for same configuration, I got same results as yours

Screenshot 2023-03-24 at 6 35 15 PM

Screenshot 2023-03-24 at 6 34 41 PM

@brettviren
Copy link
Member Author

Okay, I will look at 0.20 + bug fix compared to HEAD.

@HaiwangYu
Copy link
Member

Okay. I'm confused. You wrote 0.23 before.

0.23 has the same spectra as 0.20. 9485256 is also higher than 0.23.

@brettviren
Copy link
Member Author

Ah, got it. Then I'll look at 0.23! 😄

@brettviren
Copy link
Member Author

@HaiwangYu I don't know what I'm missing. Except for the DC frequency bin and the first/last time bins, I get good agreement between 0.23 vs HEAD and "0.04 bug" vs "no bug".

@HaiwangYu
Copy link
Member

HaiwangYu commented Mar 25, 2023

@brettviren, I think I found one change in PR175 may be the reason:

return round(relvoltage * adcmaxval);

  • I used 0.17.1 as the ref: before IDFT, noise refactoring
  • pr175: higher than ref
  • pr175 without round: same as ref

Screenshot 2023-03-24 at 8 43 43 PM

I guess this is because if no explicit round called here, some downstream code would implicitly use floor.

More in:
2023-03-22 noise-debug-v2-digitizer.pdf

@brettviren
Copy link
Member Author

Excellent work, @HaiwangYu !

I wish I had given more info in my commit message as I'm having a hard time remembering what exactly motivated adding round(). This particular commit is in the middle of a series related to BlobDepoFill which is not directly related to Digitizer. I must have found some problem as a side effect of the BlobDepoFill work.

I'd expect using round() would shift the ADC waveforms upward by 1/2 count on average compared to floor(). We won't see this shift in the wave plots above due to the re-baselining comp1d applies. The spec plot shift is in the right direction for this explanation but it is not obvious that the size of the shift is consistent with 1/2 count.

I'll spend some time to better understand the implication of round() compared to floor().

@brettviren
Copy link
Member Author

Check round vs floor at different noise levels

I use the "roundtrip" test configured with a parameterized noise spectrum that roughly models real spectra. For every group of 256 channels out of total 2560 channels I give progressively larger fraction of the parameterize spectrum. Eg, first group gets 0.1x total noise, second 0.2x, etc.

The nobug label means that the original bug that spawned this issue is fixed.

Measured mean spectra and waveform vs amount of noise

I show the measured mean spectra and waveforms for the three groups with least noise (0.1x, 0.2x and 0.3x) for the cases of directly truncating floating-point ADC values to integers (marked as floor) and after first applying round(). Starting with the 3rd group, the higher noise groups have no substantial differences between floor and round.

AC-coupled mean spectra

I simply set the zero-frequency bin to 0 so baseline shifts do not dominate the plot of the spectra. No other transformation is done (besides taking mean across the group of channels).

comp1d-floor+round-nobug-spec-ac-grp1
comp1d-floor+round-nobug-spec-ac-grp2
comp1d-floor+round-nobug-spec-ac-grp3

AC-coupled mean waveforms

The corresponding AC-coupled waveforms are:

comp1d-floor+round-nobug-wave-ac-grp1
comp1d-floor+round-nobug-wave-ac-grp2
comp1d-floor+round-nobug-wave-ac-grp3

DC-coupled mean waveforms

To show the baseline shift directly, the untransformed mean waveforms are:

comp1d-floor+round-nobug-wave-none-grp1
comp1d-floor+round-nobug-wave-none-grp2
comp1d-floor+round-nobug-wave-none-grp3

Conclusions

  • As expected, round() imparts a baseline shift of 1/2 ADC count compared to floor.

  • I think that the large distortion in group 1 spectrum for floor is due to more extreme quantization error than round. I think this also explains why floor gives much smaller variance in the mean waveform compared to round.

  • PDSP mean waveform has std=0.25 compared to group 3's std=0.04 or group 10's std=0.12. Already group 3 has no floor vs round spectra difference besides baseline shift. This leaves me with some trouble to understand why floor vs round on PDSP spectrum makes a difference.

What do?

This really comes down to what is the correct vs desired model for ADC quantization. Since, the answer for each may not be the same and may be a decision for each experiment, I make the application of round() be a configurable parameter.

But, we still should be a sensible default. Unless people have opinions, I will make floor the default.

@brettviren
Copy link
Member Author

Check original bug, double sized bug and no bug

Here I look at three cases:

  • nobug we properly use Normal distribution
  • smlbug we use Gaussian with mean=0.04 (original bug)
  • bigbug we use Gaussian with mean=0.8

This uses the same "roundtrip" test as above.

Here the focus is just on the case of floor quantization and the highest noise ("group 10").

The three mean AC-coupled waveforms:

comp1d-floor-bugs-wave-ac-grp10

And the three mean AC-coupled spectra:

comp1d-floor-bugs-spec-ac-grp10

Neither bug nor "double-bug" appears to distort the spectrum despite the damage at the ends of the waveforms. The bug fix removes that damage.

@brettviren
Copy link
Member Author

For posterity, here is a simple demonstration about the choice between round and floor in the Digitizer.

As a function of input voltage it shows the ADC as untruncated float, floor or round at either end of the fullscale range.

digitizer

It is clear how floor is a maximally biased choice.

However, what is not shown is that we also add a baseline to the voltage. This baseline could be chosen to shift the voltage by 1/2 ADC step such that floor would not actually be biased. Getting this actually correct requires some careful study. Note, in the plot, baseline=0.

@HaiwangYu
Copy link
Member

Hi @brettviren, I think I made a mistake in the original comp1d that caused the large spectra discrepancy I saw.

The following code subtracts the median first then convert the frame to 'int16' by call floor in the end. This causes quite some bias and further more, bias the frame from Digitizer::round and Digitizer::floor differently.

frame = numpy.array((frame.T - numpy.median(frame, axis=1)).T, dtype=dtype)

After fixing this, or using your new comp1d, the spectra are consistent between Digitizer::round and Digitizer::floor except for the DC component.
Screenshot 2023-03-26 at 8 43 54 PM

I am so sorry for making this mistake.

Meanwhile, it is great that we noticed this Digitizer::floor to Digitizer::round change. Which may be just a simple baseline shift in most cases but still good to thoroughly understand its potential impact.

@brettviren
Copy link
Member Author

Ah, great, thanks for checking this, @HaiwangYu! I think that closes the last concern.

BTW, I had also looked at that original median subtraction and casting with some interest but could not see anything wrong with it.

@HaiwangYu
Copy link
Member

HaiwangYu commented Mar 27, 2023

Hi @brettviren, I tried to overlay the input spectra and it seems the older version has a better consistency. My plotting script:

def specs_from_file(spectra_file, planes=None, wirelen=7500, scale_factor=1.0e9): 
    '''
    default unit is megavolt
    MV -> mV scale_factor=1.0e9
    '''
    wire_specs = json.loads(bz2.BZ2File(spectra_file, 'r').read())
    for i, wire_spec in enumerate(wire_specs) :
        if planes is not None and wire_spec['plane'] not in planes:
            continue
        if abs(wire_spec['wirelen']-wirelen) > 10 :
            continue
        print("const {:.3e} plane {}, wirelen {:.1f}".format(wire_spec['const'],wire_spec['plane'],wire_spec['wirelen']))
        freqs = [x*1000 for x in wire_spec['freqs']]
        amps = [math.sqrt(x**2+(wire_spec['const'])**2)*scale_factor for x in wire_spec['amps']]
        return (freqs,amps)

and ADC -> mV scale is 4095./1400.

Screenshot 2023-03-27 at 11 30 47 AM

@brettviren
Copy link
Member Author

Hi @HaiwangYu

In the refactoring I missed the fact that the EmpericalNoiseModel returns a (half) spectrum with more than the number of requested samples (equal to the "fft best length").

The IncoherentAddNoise was then using only the requested spectrum size portion and that effectively stretched the spectrum toward higher frequencies. The fix now has IncoherentAddNoise being more tolerant. It will perform the sampling on the overly large spectrum and then truncate the resulting waveform to fit the requested size.

I also found and fixed some problems with the handling of the white-noise "constant" that EmpericalNoiseModel supports.

There is still a 3% increase (0.106 -> 0.109, see last plot below) in the RMS of final waveforms compared to rel 0.20.0 and 0.21.0. I think this is small enough that we need not delay making a release with all these fixes. Let me know if you disagree.

See below for some diagnostic plots comparing current HEAD with past releases. First the spectrum with 2 zooms at peak and bump and then the waveforms.

comp1d-history-spec
comp1d-history-spec-zoom2
comp1d-history-spec-zoom1
comp1d-history-wave

@HaiwangYu
Copy link
Member

Thanks @brettviren. Seems all noise related tests are OK for me. I will do some more tests and make the release.
2023-03-28 noise-debug-v2-spctra-shift.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants