Releases · machow/siuba

sql supports custom join conditions via sql_on (#202)
siuba.series.spec now includes all Series methods, even unsupported ones (#209)
the spec also now is derived from the file siuba/series/spec.yml (#211)
siu Symbolic is no longer falsey (#210)
added new verb top_n (#222)
added vector functions ceil_date and floor_date to siuba.experimental.datetime (#222)
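
For a feel of what top_n, floor_date and ceil_date are for, here is a plain-pandas sketch of the same ideas. It does not call siuba; the data and the choice to keep ties are assumptions made for illustration, and siuba's own handling of ties and date units may differ.

```python
import pandas as pd

# Illustrative data only (not from siuba).
dates = pd.Series(pd.to_datetime(["2020-02-17 23:53", "2020-02-11 02:28"]))

# Floor each timestamp to the start of its day; ceil to the start of the next day.
floored = dates.dt.floor("D")
ceiled = dates.dt.ceil("D")

scores = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, 9, 5, 9]})

# Keep the rows holding the 2 largest scores. Ties are kept here by choice.
rank = scores["score"].rank(method="min", ascending=False)
top2 = scores[rank <= 2]
```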

QA

re-enabled testing of example jupyter notebooks (#206)


17 Feb 23:53

machow

v0.0.17

cf72cf2

Add fct_lump prop argument, fix fast grouped summarize

Fixes

added more fast grouped method tests, and fixed fast summarize (#197)

Features

support prop argument in fct_lump (#195)
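
As a rough picture of what lumping by proportion means, here is a standard-library sketch. The helper lump_by_prop is hypothetical and is not siuba's fct_lump; the cutoff rule (a level is kept only when its share is strictly above prop) is an assumption borrowed from how forcats describes the same argument.

```python
from collections import Counter

def lump_by_prop(values, prop, other="Other"):
    """Replace levels whose share of `values` is not above `prop` with `other`."""
    counts = Counter(values)
    total = len(values)
    # Levels frequent enough to survive on their own.
    keep = {level for level, n in counts.items() if n / total > prop}
    return [v if v in keep else other for v in values]

levels = ["a"] * 6 + ["b"] * 3 + ["c"]
lumped = lump_by_prop(levels, prop=0.2)
```

Here "a" (60%) and "b" (30%) survive, while "c" (10%) is folded into "Other".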


11 Feb 02:28

machow

v0.0.16

35d4e1c

fix if_else, remove psycopg2 dependency

Fixes

if_else doesn't try to coerce to new type at end (#179)
removed psycopg2 dependency (causes install to fail if user does not have postgres) (#189)


08 Feb 04:28

machow

v0.0.15

d5c9e4b

Fix nest function to support pandas v1.0.0

Fixes nest raising the error "TypeError: copy() takes no keyword arguments". Nest now uses a more principled approach to splitting a grouped DataFrame, and creating a list of sub frames! (see #182)
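
The general idea of splitting a grouped DataFrame into sub frames can be sketched in plain pandas as below. This is only an illustration with made-up data, not the code nest uses internally (see #182 for that).

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"g": ["x", "y", "x"], "v": [1, 2, 3]})

# Iterating a groupby yields (group key, sub frame) pairs.
sub_frames = {key: sub for key, sub in df.groupby("g")}
```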

Also fixed doc build, by not trying to run notebooks starting with draft-. (#186)


08 Feb 04:22

machow

v0.0.14

08fe90d

Support for user defined functions (UDFs)

New Feature: support user defined functions (#146)

Support for user defined functions (UDFs). Note that these require annotating the return type. For more on the theory behind these see ADR-003.

from siuba.siu import symbolic_dispatch
from pandas.core.groupby import SeriesGroupBy, GroupBy
from pandas import Series

@symbolic_dispatch(cls = Series)
def cummean(x):
    """Return a same-length array, containing the cumulative mean."""
    return x.expanding().mean()


@cummean.register(SeriesGroupBy)
def _cummean_grouped(x) -> SeriesGroupBy:
    grouper = x.grouper
    n_entries = x.obj.notna().groupby(grouper).cumsum()

    res = x.cumsum() / n_entries

    return res.groupby(grouper)

from siuba import _, mutate
from siuba.data import mtcars

# a pandas DataFrameGroupBy object
g_cyl = mtcars.groupby("cyl")

mutate(g_cyl, cumul_mean = cummean(_.mpg))
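
The grouped arithmetic in the example above (running sum divided by running count of non-missing entries) can be checked without siuba. The sketch below uses made-up data and compares the result against pandas' own expanding mean computed within each group.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "g": ["a", "a", "b", "a", "b"],
    "x": [1.0, 3.0, 10.0, 5.0, 20.0],
})

grouped = df.groupby("g")["x"]

# Running count of non-missing entries within each group.
n_entries = df["x"].notna().groupby(df["g"]).cumsum()

# Running sum within each group, divided by the running count.
cumul_mean = grouped.cumsum() / n_entries

# Reference result: an expanding mean computed group by group.
expected = grouped.transform(lambda s: s.expanding().mean())
```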

Support for many methods in vector.py, using UDFs (#158)

Bug Fixes

Fix regression where .str wasn't being removed when processing siu expressions for SQL (#159)
Grouped filter now preserves order
Verbs now tested to preserve original index (d938ab3)

Tests

Add many more versions of python and pandas to travis CI test matrix (#161)


29 Oct 00:30

machow

v0.0.13

ca35930

Opt-in speedy support for grouped pandas

Features

Implementation of fast mutate, filter, and summarize using CallTreeLocal (#134). For even just a couple thousand groups, the fast methods are close to optimal hand-written pandas, and the slow versions are almost 1000x slower :o.
fixed current grouped pandas mutate to preserve row order (#139)
laid down tests of all supported series methods, currently skipping SQL backends (but ready to go!)
put up some very basic documentation (#145)
wrote an ADR on the rationale for fast groupby (#135)

Note that CallTreeLocal has new options, allowing it to look up entries based on chained attributes (e.g. look for an entry named "dt.year") and to override custom function calls.
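
To make the chained-attribute lookup concrete, here is a toy sketch of the idea. Everything in it (LOCAL_FUNCS, resolve, the longest-prefix rule) is hypothetical and written for illustration; it is not CallTreeLocal's API or implementation.

```python
from datetime import date

# Hypothetical table: keys are attribute chains, values are replacement functions.
LOCAL_FUNCS = {
    "dt.year": lambda d: d.year,
    "upper": lambda s: s.upper(),
}

def resolve(chain, table):
    """Find the longest prefix of an attribute chain that has an entry in `table`.

    Returns the matching function and the attribute names left over.
    """
    for end in range(len(chain), 0, -1):
        key = ".".join(chain[:end])
        if key in table:
            return table[key], chain[end:]
    raise KeyError(".".join(chain))

# The whole chain "dt.year" matches one entry, so "dt" is consumed along with "year".
func, rest = resolve(["dt", "year"], LOCAL_FUNCS)
year = func(date(2020, 2, 17))
```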

I still need to finish support for user defined operations and some light siu refactoring.

Breaking changes

Removed the rm_attr argument from CallTreeLocal, since converting subattrs like dt.year will consume dt anyway (can't imagine a situation where we'd want to keep it, and couldn't do that in the translator function)

Demo

from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import *
from siuba.data import mtcars

g_cars = mtcars.groupby(['cyl', 'gear'])

fast_mutate(g_cars, hp_demeaned = _.hp - _.hp.mean())
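
For comparison, the same demeaning can be written by hand in plain pandas. The sketch below uses a small made-up frame rather than mtcars, and shows a vectorized version next to a per-group Python function. It makes no timing claims; it only shows the two styles being contrasted, as I read the note above.

```python
import pandas as pd

# Illustrative data only (stands in for mtcars).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6],
    "gear": [4, 4, 3, 3],
    "hp": [90.0, 110.0, 100.0, 120.0],
})

g_cars = cars.groupby(["cyl", "gear"])

# Vectorized: one call computes every group's mean and broadcasts it back to the rows.
hp_demeaned = cars["hp"] - g_cars["hp"].transform("mean")

# Per-group: a Python function runs once for each group.
hp_demeaned_slow = g_cars["hp"].transform(lambda s: s - s.mean())
```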


