
Error bound check for CKMS appears to be off #27

Open
c0deaddict opened this issue Jul 28, 2020 · 6 comments
@c0deaddict

c0deaddict commented Jul 28, 2020

I'm trying to make Elixir bindings for this library. During testing of the bindings I noticed that in my test quantile queries were off by more than the error bound (compared to the quantiles on a sorted list). Reading through the tests in this project, I noticed that the error bound check is not using .abs():

(v - percentile(&data, prcnt)) < err,

The test unfortunately fails when the mentioned line is changed to (v - percentile(&data, prcnt)).abs() < err.

Is this the paper on which the algorithm is based: http://www.dimacs.rutgers.edu/~graham/pubs/papers/bq-pods.pdf? I tried to read through it to work out what the error bound is, but could not find it.

@blt
Collaborator

blt commented Jul 28, 2020

Is it possible for you to share your test code, or at least the data you're using?

Regarding the paper, that's the one per this discussion, but back when I originally implemented this library the version available freely online had a bug in its algorithm. I know the IEEE version is accurate per discussion with Graham Cormode.

If you can share your test data I'll try and reproduce the issue in this library's test code.

@c0deaddict
Author

My Elixir code is a mess at the moment, but I wrote a small test in Rust that shows what I mean. The program below works fine for n = 100 or n = 1000, but fails for n = 10000 or bigger. It does not seem to depend on the random seed.

What I'm unsure about is how to correctly compute the exact quantile: should the index be ceil, floor, or round?

use quantiles::ckms::CKMS;
use rand::Rng;
use std::cmp;

fn main() {
    let error = 0.001;
    let n = 10000;

    let mut data = Vec::<u16>::new();
    let mut ckms = CKMS::<u16>::new(error);

    for _ in 0..n {
        let v = rand::thread_rng().gen_range(u16::MIN, u16::MAX);
        data.push(v);
        ckms.insert(v);
    }

    data.sort();

    for p in 0..1001 {
        let q = p as f64 / 1000.0;
        let idx = cmp::max(1, (q * n as f64).ceil() as usize);
        let expected = data[idx - 1];
        let (_rank, actual) = ckms.query(q).unwrap();
        let diff = expected as f64 - actual as f64;
        if diff.abs() > error {
            println!(
                "q {}: expected {} actual {} diff {} > {}",
                q, expected, actual, diff, error
            );
        }
    }
}

@blt
Collaborator

blt commented Aug 1, 2020

Excellent. Thank you. I'm going to convert your code here into a test in the suite. Will report back.

blt added a commit that referenced this issue Aug 2, 2020
This commit demonstrates the error reported by @c0deaddict. In some
cases, for reasons unknown, when elements are inserted into the CKMS
the error bound is not obeyed. This appears to be related to the total
number of elements inserted into the structure and not the points
themselves.
blt added a commit that referenced this issue Aug 2, 2020
I think the issue pointed out in #27 is an error in the user-supplied
quantile code. Note that `error_nominal_test` and
`error_nominal_with_merge_test` assert that error bounds are maintained
without triggering the reported failure. This commit simplifies the
supplied test code and indicates that the disagreement happens in the
user quantile calculation. Note that the calculated rank and user index
are quite different.
@blt
Collaborator

blt commented Aug 2, 2020

@c0deaddict I think the error is in your comparison code. I put together a branch that includes your test code and trims it down some. This is it:

quantiles/src/ckms/mod.rs

Lines 293 to 366 in cde2a22

#[cfg(test)]
mod test {
    use super::*;
    use crate::ckms::store::invariant;
    use quickcheck::{QuickCheck, TestResult};
    use std::cmp;
    use std::cmp::Ordering;
    use std::f64::consts::E;

    fn percentile(data: &[f64], prcnt: f64) -> f64 {
        let idx = (prcnt * (data.len() as f64)) as usize;
        data[idx]
    }

    #[test]
    fn test_insert() {
        let error = 0.001;
        let total_points = 10;
        let mut data = Vec::<u16>::new();
        let mut ckms = CKMS::<u16>::new(error);
        for v in 0..total_points {
            data.push(v);
            ckms.insert(v);
        }
        assert_eq!(ckms.query(0.0), Some((1, 0)));
        assert_eq!(ckms.query(0.001), Some((1, 0)));
        assert_eq!(ckms.query(0.01), Some((1, 0)));
        assert_eq!(ckms.query(0.1), Some((1, 0)));
        assert_eq!(ckms.query(0.101), Some((1, 0)));
        assert_eq!(ckms.query(0.14), Some((1, 0)));
        assert_eq!(ckms.query(0.145), Some((1, 0)));
        assert_eq!(ckms.query(0.146), Some((1, 0)));
        assert_eq!(ckms.query(0.15), Some((2, 1)));
        assert_eq!(ckms.query(0.199), Some((2, 1)));
        assert_eq!(ckms.query(0.2), Some((2, 1)));
        assert_eq!(ckms.query(0.3), Some((3, 2)));
        assert_eq!(ckms.query(0.4), Some((4, 3)));
        assert_eq!(ckms.query(0.5), Some((5, 4)));
        assert_eq!(ckms.query(0.6), Some((6, 5)));
        assert_eq!(ckms.query(0.7), Some((7, 6)));
        assert_eq!(ckms.query(0.8), Some((8, 7)));
        assert_eq!(ckms.query(0.9), Some((9, 8)));
        assert_eq!(ckms.query(1.0), Some((10, 9)));
        for q in &[
            0.0, 0.001, 0.01, 0.1, 0.101, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
        ] {
            let idx = cmp::max(1, (q * total_points as f64).ceil() as usize);
            let expected = data[idx - 1];
            let (rank, actual) = ckms.query(*q).unwrap();
            let diff = if expected > actual {
                expected - actual
            } else {
                actual - expected
            };
            let result = (diff as f64).partial_cmp(&error);
            assert!(result.is_some());
            assert_ne!(
                result,
                Some(Ordering::Greater),
                "total_points: {} | idx: {} | (rank, actual): ({}, {}) | quantile: {} | {} > {}",
                total_points,
                idx,
                rank,
                actual,
                q,
                diff,
                error
            );
        }
    }

I first wondered if the reported issue could be triggered without resorting to random data, and it could. I found somewhat accidentally that the issue could be triggered by a drastically smaller number of points, 10 in the above snippet. I went back to the paper and for the data set 0 .. 10 calculated the rank and approximation the paper suggests. That's what these lines are:

quantiles/src/ckms/mod.rs

Lines 320 to 338 in cde2a22

assert_eq!(ckms.query(0.0), Some((1, 0)));
assert_eq!(ckms.query(0.001), Some((1, 0)));
assert_eq!(ckms.query(0.01), Some((1, 0)));
assert_eq!(ckms.query(0.1), Some((1, 0)));
assert_eq!(ckms.query(0.101), Some((1, 0)));
assert_eq!(ckms.query(0.14), Some((1, 0)));
assert_eq!(ckms.query(0.145), Some((1, 0)));
assert_eq!(ckms.query(0.146), Some((1, 0)));
assert_eq!(ckms.query(0.15), Some((2, 1)));
assert_eq!(ckms.query(0.199), Some((2, 1)));
assert_eq!(ckms.query(0.2), Some((2, 1)));
assert_eq!(ckms.query(0.3), Some((3, 2)));
assert_eq!(ckms.query(0.4), Some((4, 3)));
assert_eq!(ckms.query(0.5), Some((5, 4)));
assert_eq!(ckms.query(0.6), Some((6, 5)));
assert_eq!(ckms.query(0.7), Some((7, 6)));
assert_eq!(ckms.query(0.8), Some((8, 7)));
assert_eq!(ckms.query(0.9), Some((9, 8)));
assert_eq!(ckms.query(1.0), Some((10, 9)));

If you run the code, this portion of the test will pass. Good sign: at least my understanding of the paper and the implementation correspond. The remaining portion of the test, the part that fails, is:

quantiles/src/ckms/mod.rs

Lines 340 to 365 in cde2a22

for q in &[
    0.0, 0.001, 0.01, 0.1, 0.101, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
] {
    let idx = cmp::max(1, (q * total_points as f64).ceil() as usize);
    let expected = data[idx - 1];
    let (rank, actual) = ckms.query(*q).unwrap();
    let diff = if expected > actual {
        expected - actual
    } else {
        actual - expected
    };
    let result = (diff as f64).partial_cmp(&error);
    assert!(result.is_some());
    assert_ne!(
        result,
        Some(Ordering::Greater),
        "total_points: {} | idx: {} | (rank, actual): ({}, {}) | quantile: {} | {} > {}",
        total_points,
        idx,
        rank,
        actual,
        q,
        diff,
        error
    );
}

This fails like so:

---- ckms::test::test_insert stdout ----
thread 'ckms::test::test_insert' panicked at 'assertion failed: `(left != right)`
  left: `Some(Greater)`,
 right: `Some(Greater)`: total_points: 10 | idx: 2 | (rank, actual): (1, 0) | quantile: 0.101 | 1 > 0.001', src/ckms/mod.rs:353:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    ckms::test::test_insert

The important thing to note here is that the idx disagrees with the rank. If you adjust from ceil to floor then the test will pass, but only incidentally. Quantile calculation like this requires you to take into account whether the dataset has an odd or even number of members, which the calculation in your example doesn't do. If you adjust to 11 total samples -- and comment out the hand-calculation bit -- then you'll find the test will pass. I believe this means that the calculation you've supplied is faulty. I feel pretty confident about that after having gone back over the paper, plus remembering that the tests

quantiles/src/ckms/mod.rs

Lines 419 to 448 in cde2a22

#[test]
fn error_nominal_test() {
    fn inner(mut data: Vec<f64>, prcnt: f64) -> TestResult {
        data.sort_by(|a, b| a.partial_cmp(b).unwrap());
        if !between_inclusive(prcnt, 0.0, 1.0) || data.is_empty() {
            return TestResult::discard();
        }
        let err = 0.001;
        let mut ckms = CKMS::<f64>::new(err);
        for d in &data {
            ckms.insert(*d);
        }
        if let Some((_, v)) = ckms.query(prcnt) {
            debug_assert!(
                (v - percentile(&data, prcnt)) < err,
                "v: {} | percentile: {} | prcnt: {} | data: {:?}",
                v,
                percentile(&data, prcnt),
                prcnt,
                data
            );
            TestResult::passed()
        } else {
            TestResult::failed()
        }
    }
    QuickCheck::new().quickcheck(inner as fn(Vec<f64>, f64) -> TestResult);
}

and

quantiles/src/ckms/mod.rs

Lines 457 to 499 in cde2a22

#[test]
fn error_nominal_with_merge_test() {
    fn inner(lhs: Vec<f64>, rhs: Vec<f64>, prcnt: f64, err: f64) -> TestResult {
        if !between_inclusive(prcnt, 0.0, 1.0)
            || !between_inclusive(err, 0.0, 1.0)
            || (lhs.len() + rhs.len()) < 1
            || lhs.is_empty()
            || rhs.is_empty()
        {
            return TestResult::discard();
        }
        let mut data = lhs.clone();
        data.append(&mut rhs.clone());
        data.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let err = 0.001;
        let mut ckms = CKMS::<f64>::new(err);
        for d in &lhs {
            ckms.insert(*d);
        }
        let mut ckms_rhs = CKMS::<f64>::new(err);
        for d in &rhs {
            ckms_rhs.insert(*d);
        }
        ckms += ckms_rhs;
        if let Some((_, v)) = ckms.query(prcnt) {
            debug_assert!(
                (v - percentile(&data, prcnt)) < err,
                "v: {} | percentile: {} | prcnt: {} | data: {:?}",
                v,
                percentile(&data, prcnt),
                prcnt,
                data
            );
            TestResult::passed()
        } else {
            TestResult::failed()
        }
    }
    QuickCheck::new().quickcheck(inner as fn(Vec<f64>, Vec<f64>, f64, f64) -> TestResult);
}

both assert that the error guarantee holds for randomly generated datasets and queries. If you buy this analysis, what today's debugging session suggests to me is:

  1. That the paper laying out the data structure is behind a paywall is a severe impediment to trust in the structure, and
  2. this library would be greatly improved by a reference quantile structure that consumes arbitrarily large space but returns exact results for queries.

@c0deaddict
Author

@blt First of all thanks for your thorough investigation, and apologies for my late reply.

The important things to note here is that the idx disagrees with the rank. If you adjust from ceil to floor then the test will pass, but that's only incidentally. Quantile calculation like this requires you to take into account whether the dataset has an odd or even number of members, which the calculation in your example doesn't do. If you adjust to 11 total samples -- and comment out the hand-calculation bit -- then you'll find the test will pass. I believe what this means is that the calculation you've supplied is faulty.

How should the odd/even number of members be taken into account? Using ceil or floor for one or the other doesn't give the right results either. I get the best results by using round.

There seem to be different methods of calculating percentiles. NumPy, for example, can do linear interpolation if the quantile does not fall on an exact array index: https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3865-L3969 I think the main question is: what method for calculating exact quantiles did the authors of the paper use?

I feel pretty confident about that after having gone back over the paper, plus remembering that the tests
both assert that the error guarantee holds, for randomly generated datasets and queries.

These tests look pretty solid. The only thing that seems strange is that the assertion only checks that the actual value v does not exceed the expected value percentile(&data, prcnt) by more than err; it could still happen that v is smaller than percentile(&data, prcnt) by more than err. For example, if v = 80 and percentile(&data, prcnt) = 100, this would slip through, but it is an error, right?

If you buy this analysis I think what today's debugging session suggests to me is:

1. That the paper that lays out the data structure is behind a paywall is a severe impediment to trust of the structure and

Yes indeed. Is this the paper you based it on: https://ieeexplore.ieee.org/abstract/document/4274974/? Maybe it's a good idea to include a link to it in the source code?

Fortunately there are ways to work around the paywall :-)

2. this library would be greatly improved with a reference quantile structure that consumed arbitrarily large space but returned exact results for queries.

That would be very useful!

@blt
Collaborator

blt commented Aug 9, 2020

@blt First of all thanks for your thorough investigation, and apologies for my late reply.

Not a problem! Things take the time they take.

How should the odd/even members be taken into account? Using ceil or floor for one or the other doesn't give right results either. I get the best results by using round.

There seem to be different methods of calculating percentiles. Numpy for example can do a linear interpolation if the quantile is not at an exact array index: https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3865-L3969 I think the main question is: what method for calculating exact quantiles did the authors of the paper use?

This is a very good question, and not necessarily something I appreciated when I first wrote this library. I won't likely have a chance to get to the paper until at least next weekend, but this does strike me as a very important question to pursue.

These tests look pretty solid. Only thing that seems strange is that the assertion checks if the actual value v is not bigger than by more than err than the expected value percentile(&data, prcnt), but it could happen that v is smaller than percentile(&data, prcnt) by more than err. For example if v = 80 and percentile(&data, prcnt) = 100, this would slip through but is an error, right?

Ah, indeed. Yes, that is an oversight and should be corrected. I'd be curious to see what turns up when it is.

Yes indeed. Is this the paper you based it on: https://ieeexplore.ieee.org/abstract/document/4274974/ Maybe it's a good idea to include a link to it in the source code?

It's linked in the README but that's a little distant. What you get in the source documentation is not the best:

//! As of this writing you _must_ use the presentation in the IEEE version of
//! the paper. The authors' self-published copy of the paper is incorrect and
//! this implementation will _not_ make sense if you follow along using that
//! version. Only the 'full biased' invariant is used. The 'targeted quantiles'
//! variant of this algorithm is fundamentally flawed, an issue which the
//! authors correct in their "Space- and Time-Efficient Deterministic Algorithms
//! for Biased Quantiles over Data Streams"

That would be very useful!

Cool. When I have a little time next I'll fix the test bug you pointed out above and see about including this in the project. Sounds fun.
