feature: Interpolate between closest ranks #2995

edugfilho · 2024-10-22T16:54:46Z

implements #2985

hotsphink · 2024-10-22T17:05:19Z

src/components/explore/ProbeExplorer.svelte

+      totalFreq += histogram[i].value;
+      cumFreq.push({ bin: histogram[i].bin, cumFreq: totalFreq });
+    }
+    const normalizer = cumFreq.at(-1).cumFreq


This is just totalFreq, fwiw. Except in the edge case where histogram is empty, but then the .cumFreq would throw an error. (It's probably impossible?)
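A minimal sketch of the point above, not the code from the PR: if the running total is returned alongside the cumulative list, the normalizer never has to be read back out of `cumFreq.at(-1)`, so an empty histogram yields 0 instead of throwing. The helper name `buildCumulativeFrequencies` and the `{ bin, value }` sample data are assumptions made for illustration.

```javascript
// Hypothetical helper: builds running totals from a histogram given as
// [{ bin, value }, ...], the shape the diff appears to use.
function buildCumulativeFrequencies(histogram) {
  const cumFreq = [];
  let totalFreq = 0;
  for (const { bin, value } of histogram) {
    totalFreq += value;
    cumFreq.push({ bin, cumFreq: totalFreq });
  }
  // For a non-empty histogram, totalFreq equals the last cumFreq entry.
  // For an empty one it is 0, and nothing is dereferenced.
  return { cumFreq, totalFreq };
}

const { cumFreq, totalFreq } = buildCumulativeFrequencies([
  { bin: 0, value: 2 },
  { bin: 1, value: 3 },
  { bin: 4, value: 5 },
]);
console.log(totalFreq === cumFreq.at(-1).cumFreq); // true
```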

hotsphink · 2024-10-22T17:10:13Z

src/components/explore/ProbeExplorer.svelte

+    }
+    const normalizer = cumFreq.at(-1).cumFreq
+    if (normalizer != 1) {
+      cumFreq = cumFreq.map((item) => {return {bin: item.bin, cumFreq: item.cumFreq/normalizer}})


Not that it really matters, but rather than creating a new set of objects, this could do an in-place update:

const normalizer = 1 / totalFreq; cumFreq.forEach(item => { item.cumFreq *= normalizer });

thank you. I appreciate the review!
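For reference, a sketch of the in-place variant with a guard for a zero total. This is a variation on the suggestion above, not the merged code: it divides by the total rather than multiplying by a precomputed reciprocal, on the assumption that it is convenient for the last entry to come out as exactly 1. The function name `normalizeInPlace` is hypothetical.

```javascript
// Hypothetical helper: scales cumulative counts to the range [0, 1],
// mutating the existing objects instead of allocating new ones.
function normalizeInPlace(cumFreq, totalFreq) {
  // Nothing to scale for an empty histogram (total 0) or one that is
  // already normalized (total 1); this also avoids dividing by zero.
  if (totalFreq === 0 || totalFreq === 1) return cumFreq;
  cumFreq.forEach((item) => {
    item.cumFreq /= totalFreq;
  });
  return cumFreq;
}

const counts = [
  { bin: 0, cumFreq: 2 },
  { bin: 1, cumFreq: 5 },
  { bin: 4, cumFreq: 10 },
];
normalizeInPlace(counts, 10);
console.log(counts.map((item) => item.cumFreq)); // [ 0.2, 0.5, 1 ]
```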

hotsphink · 2024-10-22T17:13:26Z

src/components/explore/ProbeExplorer.svelte

+      if (targetFreq <= cumFreq[0].cumFreq) {
+        percentileValues[percentile] = cumFreq[0].bin;
+      }
+      if (targetFreq >= cumFreq.at(-1).cumFreq) {


if (targetFreq >= totalFreq)

hotsphink · 2024-10-22T18:16:50Z

src/components/explore/ProbeExplorer.svelte

+          var y0 = cumFreq[i].bin;
+          var y1 = cumFreq[i + 1].bin;
+          // Linear interpolation formula
+          var percentileValue = y0 + ((targetFreq - x0) * (y1 - y0)) / (x1 - x0);


I think this works, but I'm not sure how the bin values are given. If one bucket is for values 3-7, then will .bin be 3, 7, or (3+7)/2=5? With exponentially increasing bucket sizes, it seems like it would make a difference. ...after more research, I think your formula is exactly right as long as the bins are given as the lowest value in each bucket, which seems like it must be the case. (Then the range of values that fall in some bucket i is [cumFreq[i].bin, cumFreq[i+1].bin), which is exactly what you're using here, so I should shut up already.)

It's a little sketchy, because the linear interpolation assumes that the values are uniformly distributed within the bucket. Apparently sometimes people will use a logarithmic interpolation instead, which is for when the values are exponentially distributed within the bucket. I'm not at all sure about this stuff, but I don't think we can assume any particular distribution. And fortunately, the linear interpolation won't be very far off most other distributions. I think?

Your assumptions are correct. I added a note about this in the tooltip that will show next to the checkbox that toggles the feature.
I also think that if we need to add different types of interpolation based on something (e.g. probe metadata), we can do that quickly now that we already have the core feature implemented.
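To make the discussion above concrete, here is a sketch reconstructed from the diff fragments in this thread, not the merged implementation. It pairs each cumulative frequency with its bin value as the diff does, clamps at both ends, and interpolates in between. The pluggable `interpolate` parameter and the `geometric` function are assumptions added to illustrate how a different interpolation could be swapped in; `geometric` is only one possible reading of the "logarithmic interpolation" mentioned above, and it requires a positive lower bin value.

```javascript
// Linear interpolation: assumes values are spread uniformly in the bucket.
const linear = (y0, y1, t) => y0 + t * (y1 - y0);

// Hypothetical alternative: interpolates linearly in log space.
// Only valid when y0 > 0.
const geometric = (y0, y1, t) => y0 * Math.pow(y1 / y0, t);

// cumFreq: [{ bin, cumFreq }, ...] with non-decreasing cumFreq values.
// targetFreq: the requested percentile on the same scale as cumFreq.
function interpolatePercentile(cumFreq, targetFreq, interpolate = linear) {
  if (cumFreq.length === 0) return undefined;
  if (targetFreq <= cumFreq[0].cumFreq) return cumFreq[0].bin;
  if (targetFreq >= cumFreq.at(-1).cumFreq) return cumFreq.at(-1).bin;
  for (let i = 0; i < cumFreq.length - 1; i++) {
    const x0 = cumFreq[i].cumFreq;
    const x1 = cumFreq[i + 1].cumFreq;
    // The strict lower bound guarantees x1 > x0, so no division by zero.
    if (targetFreq > x0 && targetFreq <= x1) {
      const t = (targetFreq - x0) / (x1 - x0);
      return interpolate(cumFreq[i].bin, cumFreq[i + 1].bin, t);
    }
  }
  return cumFreq.at(-1).bin; // unreachable for non-decreasing input
}

const normalized = [
  { bin: 0, cumFreq: 0.2 },
  { bin: 1, cumFreq: 0.5 },
  { bin: 4, cumFreq: 1 },
];
console.log(interpolatePercentile(normalized, 0.75)); // 2.5
```

Passing `geometric` as the third argument gives a lower estimate for the same target, which is the kind of difference the choice of within-bucket distribution makes.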

hotsphink · 2024-10-22T19:06:04Z

src/components/explore/QuantileExplorerView.svelte

                  </h3>
                  <span
                    use:tooltipAction={{
-                      text: 'Applies a moving average to smooth out short-term fluctuations on percentile values.',
+                      text: 'Generates percentiles using the Between Closest Ranks Linear Interpolation. This can show an innacurate representation of the data if the underlying distribution is not continuous and/or the data between bins is not uniformly distributed.',


s/innacurate/inaccurate/

Also, this isn't quite right. This does not depend on the distribution of data between bins, only on the distribution of data within bins. It will actually handle any distribution between bins, which is fortunate since it's usually going to be some weird multimodal exponential-ish thing.

oops, my bad. I'll open another PR for this. I thought you were done.


Sorry, I thought I was too.
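A small illustration of the within-bin point from this thread, using made-up bucket edges and samples. Two data sets that differ only in how values sit inside each bucket produce identical histograms, so any estimate computed from the histogram alone must be the same for both, even though the underlying values at a given rank differ. All names and numbers here are hypothetical.

```javascript
// Hypothetical bucket lower edges; a sample x belongs to the last edge <= x.
const edges = [0, 10, 20];

function toHistogram(samples) {
  return edges.map((bin, i) => ({
    bin,
    value: samples.filter(
      (x) => x >= bin && (i + 1 === edges.length || x < edges[i + 1])
    ).length,
  }));
}

// Same count per bucket, different placement inside each bucket.
const spread = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
const clustered = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10];

const sameHistogram =
  JSON.stringify(toHistogram(spread)) === JSON.stringify(toHistogram(clustered));
console.log(sameHistogram); // true

// The fourth-smallest sample differs even though the histograms match.
const fourthSmallest = (samples) => [...samples].sort((a, b) => a - b)[3];
console.log(fourthSmallest(spread), fourthSmallest(clustered)); // 7 0
```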

feature: Interpolate between closest ranks

5d9a244

hotsphink reviewed Oct 22, 2024


Simplify normalization logic

f8b1c28

hotsphink reviewed Oct 22, 2024


edugfilho added 2 commits October 22, 2024 14:19

no-var

a90f96e

Fix lint and formatting

889ce3a

hotsphink reviewed Oct 22, 2024


edugfilho merged commit cd25a09 into main Oct 22, 2024
4 checks passed

edugfilho deleted the interpolate-between-ranks branch October 22, 2024 19:06

This was referenced Oct 22, 2024

fix: interpolate empty histograms + tooltip #2996

Merged

Changes in the 95th percentile aren't always visible in Glam when they are visible in Telemetry #1639

Closed
