[Proposal] Ultra fast modified Greenwald-Khanna 💨 #26

sitegui (Guilherme Souza) · 2019-11-19T21:40:42Z

Hello,

I've been cooking up an algorithm inspired by GK, but built for speed. It currently processes 11.4 million elements per second on a single thread, a 45x speedup over the number cited in this repo's README. With 8 threads it reaches 77.0M/s (see more results here: https://github.com/sitegui/space-efficient-quantile#benchmark).

Initially I thought about releasing it as a separate crate, but my question for the maintainers/users of this lib is whether you think one more quantile implementation could be added to this repo. The upside is less fragmentation among quantile crates, at the cost of bloating this lib.

What do you think?

Note: in the near future, I'd like to write up the main rationale of my approach somewhere. But it mainly boils down to:

1. micro-compressions (https://github.com/sitegui/space-efficient-quantile/blob/94b6d4c9d7d53362004d963e5867d2ea148291ac/src/modified_gk/samples_tree/node.rs#L120-L166): do not insert every sample; discard samples early instead of running a full compression at a regular interval, as the original GK does
2. B-tree structure instead of Vec: to speed up insertion at arbitrary positions
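For readers less familiar with GK, here is a minimal Rust sketch of one possible reading of the micro-compression idea. It is an assumption, not the code linked above: when a new sample can be merged into its successor without breaking GK's invariant g + Δ ≤ ⌊2εn⌋, the sample only increments the successor's g instead of becoming a new tuple. The names `Summary`, `Tuple`, `insert` and `query` are hypothetical, and the sketch keeps tuples in a plain `Vec`, whose O(len) `insert` is exactly the cost the B-tree point is meant to avoid.

```rust
/// One GK tuple (hypothetical layout): retained value `v`,
/// `g` = r_min(v_i) - r_min(v_{i-1}), `delta` = r_max(v_i) - r_min(v_i).
#[derive(Debug, Clone, Copy)]
struct Tuple {
    v: f64,
    g: u64,
    delta: u64,
}

/// Minimal GK-style summary; a sketch, not the proposed crate's code.
/// Tuples are sorted by value in a Vec (the proposal uses a B-tree instead).
struct Summary {
    eps: f64,
    n: u64,
    tuples: Vec<Tuple>,
}

impl Summary {
    fn new(eps: f64) -> Self {
        Summary { eps, n: 0, tuples: Vec::new() }
    }

    /// floor(2 * eps * n): GK's bound on g + delta for every tuple.
    fn threshold(&self) -> u64 {
        (2.0 * self.eps * self.n as f64).floor() as u64
    }

    fn insert(&mut self, v: f64) {
        self.n += 1;
        let thr = self.threshold();
        // Index of the successor: the first tuple whose value is > v.
        let idx = self.tuples.partition_point(|t| t.v <= v);
        if idx == 0 || idx == self.tuples.len() {
            // New minimum or maximum: its rank is known exactly.
            self.tuples.insert(idx, Tuple { v, g: 1, delta: 0 });
            return;
        }
        let s = self.tuples[idx];
        if s.g + 1 + s.delta <= thr {
            // Assumed "micro-compression": merging the new sample into its
            // successor keeps the invariant, so no tuple is inserted.
            self.tuples[idx].g += 1;
        } else {
            // Regular GK insertion, with delta derived from the successor.
            let delta = s.g + s.delta - 1;
            self.tuples.insert(idx, Tuple { v, g: 1, delta });
        }
    }

    /// Returns a value whose rank is within eps * n of ceil(phi * n).
    fn query(&self, phi: f64) -> Option<f64> {
        if self.tuples.is_empty() {
            return None;
        }
        let n = self.n as f64;
        let r = (phi * n).ceil().clamp(1.0, n);
        let bound = r + self.eps * n;
        let mut r_min = 0u64;
        for (i, t) in self.tuples.iter().enumerate() {
            r_min += t.g;
            if (r_min + t.delta) as f64 > bound {
                // First tuple whose r_max overshoots: answer with its predecessor.
                return Some(self.tuples[i.saturating_sub(1)].v);
            }
        }
        self.tuples.last().map(|t| t.v)
    }
}

fn main() {
    let mut s = Summary::new(0.01);
    // Deterministic shuffle of 0..1000 (7919 is coprime with 1000).
    for i in 0..1000u64 {
        s.insert(((i * 7919) % 1000) as f64);
    }
    println!("kept {} tuples for n = {}", s.tuples.len(), s.n);
    println!("median estimate: {:?}", s.query(0.5));
}
```

Every insertion adds exactly 1 to some tuple's g, so the g values always sum to n; that is what lets `query` rebuild rank bounds from the summary alone.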

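The memory bound is 5 / epsilon samples (https://github.com/sitegui/space-efficient-quantile/blob/94b6d4c9d7d53362004d963e5867d2ea148291ac/src/modified_gk/summary.rs#L39).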
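As a quick sanity check of what that bound means in practice, the sketch below expresses epsilon as 1/k so the arithmetic stays exact. The per-sample layout `(f64, u64, u64)` (value, g, delta) and the helper names are hypothetical; the real node layout in the linked code may differ, and B-tree node overhead is ignored.

```rust
use std::mem::size_of;

/// Hypothetical per-sample layout: value, g, delta.
type Sample = (f64, u64, u64);

/// Upper bound on retained samples for epsilon = 1 / inv_eps,
/// following the 5 / epsilon bound stated above.
fn max_samples(inv_eps: u64) -> u64 {
    5 * inv_eps
}

/// Bytes for the sample payload alone (ignores B-tree node overhead).
fn payload_bytes(inv_eps: u64) -> u64 {
    max_samples(inv_eps) * size_of::<Sample>() as u64
}

fn main() {
    for &inv_eps in &[100u64, 1_000, 10_000] {
        println!(
            "epsilon = 1/{}: at most {} samples, ~{} bytes of payload",
            inv_eps,
            max_samples(inv_eps),
            payload_bytes(inv_eps)
        );
    }
}
```

So with epsilon = 0.01 the summary never holds more than 500 samples, regardless of how many elements have been streamed through it.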

Best regards,


blt · 2019-11-20T18:23:43Z

I'd be very happy to have more quantile implementations in this repository! Please do let me know if I can help with the inclusion in any way.
