Opt-in speedy support for grouped pandas
Features
- Implementation of fast mutate, filter, and summarize using CallTreeLocal (#134). For even just a couple thousand groups, the fast methods are close to optimal hand-written pandas, and the slow versions are almost 1000x slower :o.
- fixed current grouped pandas mutate to preserve row order (#139)
- laid down tests of all supported series methods, currently skipping SQL backends (but ready to go!)
- put up some very basic documentation (#145)
- wrote an ADR on the rationale for fast groupby (#135)
Note that CallTreeLocal has new options, allowing it to look up entries by chained attribute (e.g. look for an entry named "dt.year") and to override custom function calls.
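To make the chained-attribute lookup concrete, here is a toy sketch in plain Python. It is not CallTreeLocal's actual code or API: the `REGISTRY` dict and the `resolve` helper are hypothetical names made up for illustration, and the only idea taken from this PR is that an entry can be keyed by a dotted name such as "dt.year".

```python
from datetime import date

# Hypothetical registry keyed by dotted attribute names; not siuba's real table.
REGISTRY = {
    "dt.year": lambda dates: [d.year for d in dates],
    "mean": lambda xs: sum(xs) / len(xs),
}

def resolve(attr_chain, registry):
    """Find the longest dotted prefix of attr_chain that has an entry.

    Returns the entry and the number of attributes it consumed.
    """
    for n in range(len(attr_chain), 0, -1):
        key = ".".join(attr_chain[:n])
        if key in registry:
            return registry[key], n
    raise KeyError(".".join(attr_chain))

# An expression like _.x.dt.year has the attribute chain ["dt", "year"]
# after the column name.
func, consumed = resolve(["dt", "year"], REGISTRY)
years = func([date(2019, 5, 1), date(2020, 1, 15)])
```

In this toy version, matching "dt.year" as a single unit consumes dt along with year, so nothing is left over to look up separately.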
I still need to finish support for user-defined operations and do some light siu refactoring.
Breaking changes
- Removed the rm_attr argument from CallTreeLocal, since converting subattrs like dt.year will consume dt anyway (can't imagine a situation where we'd want to keep it, and couldn't do that in the translator function)
Demo
from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import *
from siuba.data import mtcars
g_cars = mtcars.groupby(['cyl', 'gear'])
fast_mutate(g_cars, _.hp - _.hp.mean())
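
For comparison, the demo expression can be written by hand in pandas roughly as follows. This is only a sketch of the two styles being compared, not siuba's implementation: it uses a small made-up frame instead of mtcars, and it assumes the "slow" style means running a Python function once per group while the "fast" style sticks to built-in grouped aggregations.

```python
import pandas as pd

# Small stand-in for mtcars; the values are made up for illustration.
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "gear": [4, 4, 3, 3, 3],
    "hp": [90.0, 110.0, 100.0, 120.0, 200.0],
})
g = df.groupby(["cyl", "gear"])

# Slow style: a Python lambda is called once for every group.
slow = g["hp"].transform(lambda s: s - s.mean())

# Fast style: one built-in grouped mean, broadcast back to the original rows,
# followed by an ordinary vectorized subtraction.
fast = df["hp"] - g["hp"].transform("mean")
```

Both results keep the original row order and index; the difference is that the second form avoids per-group Python calls, which is where the cost grows as the number of groups increases.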