Release Opt-in speedy support for grouped pandas · machow/siuba

Features

Implemented fast mutate, filter, and summarize using CallTreeLocal (#134). Even with just a couple thousand groups, the fast methods run close to optimal hand-written pandas, while the slow versions are almost 1000x slower :o.
fixed current grouped pandas mutate to preserve row order (#139)
laid down tests of all supported series methods, currently skipping SQL backends (but ready to go!)
put up some very basic documentation (#145)
wrote an ADR on the rationale for fast groupby (#135)

Note that CallTreeLocal has new options, allowing it to look up entries by chained attributes (e.g. an entry named "dt.year") and to override custom function calls.
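To make the chained-attribute lookup concrete, here is a rough standalone sketch of the idea. It is not siuba's implementation and does not use CallTreeLocal; the `LOCAL` table, `attr_chain`, and `lookup` are hypothetical names invented for illustration. It assumes a table keyed by dotted names and prefers the longest trailing chain that matches.

```python
import ast

# Hypothetical lookup table (not siuba's): keys may be single or dotted names
LOCAL = {"mean": "fast_mean", "dt.year": "fast_dt_year"}

def attr_chain(node):
    """Collect attribute names from an ast.Attribute chain, innermost first."""
    names = []
    while isinstance(node, ast.Attribute):
        names.append(node.attr)
        node = node.value
    return names[::-1]

def lookup(expr):
    """Return (key, entry) for the longest trailing attribute chain found in LOCAL."""
    chain = attr_chain(ast.parse(expr, mode="eval").body)
    for n in range(len(chain), 0, -1):
        key = ".".join(chain[-n:])
        if key in LOCAL:
            return key, LOCAL[key]
    return None

print(lookup("_.date.dt.year"))  # matches the two-part entry "dt.year"
print(lookup("_.hp.mean"))       # falls back to the single entry "mean"
```

Because the match on "dt.year" covers both attributes at once, the intermediate `dt` accessor is consumed by the lookup itself.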

I still need to finish support for user-defined operations and do some light siu refactoring.

Breaking changes

Removed the rm_attr argument from CallTreeLocal, since converting subattrs like dt.year will consume dt anyway (I can't imagine a situation where we'd want to keep it and couldn't do that in the translator function).

Demo

from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import _
from siuba.data import mtcars

# group by number of cylinders and gears
g_cars = mtcars.groupby(['cyl', 'gear'])

# subtract each group's mean hp from every row in that group
fast_mutate(g_cars, _.hp - _.hp.mean())
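
For comparison, here is a sketch of what "hand-written pandas" for the same demeaning calculation might look like, next to a slower per-group version. This is plain pandas, not siuba, and the small DataFrame is a made-up stand-in for mtcars (its values are illustrative only). Which pandas code the fast methods actually generate is not stated here, so treat the vectorized line as an assumption about the target.

```python
import pandas as pd

# Stand-in data with the same column names as the demo (values are invented)
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "gear": [4, 4, 3, 3, 3],
    "hp": [90.0, 110.0, 100.0, 120.0, 200.0],
})
g = df.groupby(["cyl", "gear"])

# Vectorized: one grouped mean, broadcast back to the original rows
fast = df["hp"] - g["hp"].transform("mean")

# Per-group Python function: same values, but one Python call per group
slow = g["hp"].transform(lambda x: x - x.mean())

print(fast.tolist())
print(slow.tolist())
```

Both versions keep the original row order; the difference is that the second one pays Python-level overhead for every group, which adds up as the number of groups grows.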
