For a grouped summarize, when a grouping column...
AFAICT, setting `groupby(..., dropna=False)` resolves this (cf. #251).
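For context, pandas' `groupby` drops rows whose group key is NaN by default (`dropna=True`), while `dropna=False` keeps NaN as its own group level. A minimal pandas-only sketch of that difference, using a small hypothetical frame in place of `cars` (the values below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars6: every "cyl" value is NaN.
df = pd.DataFrame({
    "cyl": [np.nan] * 3,
    "hp": [110, 110, 93],
    "mpg": [21.0, 21.4, 22.8],
})

# Default dropna=True: rows with a NaN key are dropped, so no groups remain.
dropped = df.groupby(["cyl", "hp"])["mpg"].mean()

# dropna=False: NaN is kept as a group level, giving (NaN, 93) and (NaN, 110).
kept = df.groupby(["cyl", "hp"], dropna=False)["mpg"].mean()

print(len(dropped))  # 0
print(len(kept))     # 2
```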
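Example: all-NA levels raise an error, since the grouping columns end up in both the result's columns and its index

```python
import numpy as np
from siuba import _, group_by, summarize
from siuba.data import cars

cars6 = cars.copy()
cars6["cyl"] = np.nan

cars6 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())
```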
Raises
```
ValueError: cannot insert cyl, already exists
```

Full traceback:

```
ValueError                                Traceback (most recent call last)
Cell In [23], line 4
      1 cars6 = cars.copy()
      2 cars6["cyl"] = np.nan
----> 4 cars6 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/siuba/siu/calls.py:214, in Call.__rrshift__(self, x)
    210 if isinstance(strip_symbolic(x), (Call)):
    211     # only allow non-calls (i.e. data) on the left.
    212     raise TypeError()
--> 214 return self(x)

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/siuba/siu/calls.py:189, in Call.__call__(self, x)
    187     return operator.getitem(inst, *rest)
    188 elif self.func == "__call__":
--> 189     return getattr(inst, self.func)(*rest, **kwargs)
    191 # in normal case, get method to call, and then call it
    192 f_op = getattr(operator, self.func)

File ~/.pyenv/versions/3.8.12/lib/python3.8/functools.py:875, in singledispatch.<locals>.wrapper(*args, **kw)
    871 if not args:
    872     raise TypeError(f'{funcname} requires at least '
    873                     '1 positional argument')
--> 875 return dispatch(args[0].__class__)(*args, **kw)

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/siuba/dply/verbs.py:564, in _summarize(__data, *args, **kwargs)
    561 df = __data.apply(df_summarize, *args, **kwargs)
    563 group_by_lvls = list(range(df.index.nlevels - 1))
--> 564 out = df.reset_index(group_by_lvls)
    565 out.index = pd.RangeIndex(df.shape[0])
    567 return out

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/pandas/core/frame.py:6350, in DataFrame.reset_index(self, level, drop, inplace, col_level, col_fill, allow_duplicates, names)
   6344 if lab is not None:
   6345     # if we have the codes, extract the values with a mask
   6346     level_values = algorithms.take(
   6347         level_values, lab, allow_fill=True, fill_value=lev._na_value
   6348     )
-> 6350 new_obj.insert(
   6351     0,
   6352     name,
   6353     level_values,
   6354     allow_duplicates=allow_duplicates,
   6355 )
   6357 new_obj.index = new_index
   6358 if not inplace:

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/pandas/core/frame.py:4806, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4800     raise ValueError(
   4801         "Cannot specify 'allow_duplicates=True' when "
   4802         "'self.flags.allows_duplicate_labels' is False."
   4803     )
   4804 if not allow_duplicates and column in self.columns:
   4805     # Should this be a different kind of error??
-> 4806     raise ValueError(f"cannot insert {column}, already exists")
   4807 if not isinstance(loc, int):
   4808     raise TypeError("loc must be int")

ValueError: cannot insert cyl, already exists
```
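The last frames of the traceback show the mechanism: `summarize` calls `reset_index` on a result that already has `cyl` as a column, and pandas refuses to insert a second column with the same name. A minimal pandas-only reproduction of that collision, using a hypothetical one-row frame rather than the actual siuba intermediate:

```python
import pandas as pd

# Hypothetical result where "cyl" is both a column and the index name,
# mirroring the situation the traceback points to.
df = pd.DataFrame(
    {"cyl": [4.0], "res": [26.7]},
    index=pd.Index([4.0], name="cyl"),
)

message = None
try:
    df.reset_index()
except ValueError as err:
    message = str(err)
print(message)  # cannot insert cyl, already exists

# Dropping the duplicated column first lets reset_index move the index back in.
fixed = df.drop(columns="cyl").reset_index()
print(list(fixed.columns))  # ['cyl', 'res']
```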
Example: one non-NA level outputs a table without grouping columns

```python
cars5 = cars.copy()
cars5["cyl"] = [1] + [np.nan] * (len(cars) - 1)

cars5 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())
```
Output
Note that there's no `cyl` or `hp` column in the result.
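For comparison, the shape one would expect from this grouped summarize (grouping columns kept alongside `res`) can be sketched in plain pandas with `dropna=False`. This is a hypothetical workaround under that assumption, not siuba's implementation, and the frame is a small made-up stand-in for `cars5`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars5: one non-NA "cyl" value, the rest NaN.
df = pd.DataFrame({
    "cyl": [1, np.nan, np.nan],
    "hp": [110, 110, 93],
    "mpg": [21.0, 21.4, 22.8],
})

# Keep NaN keys as groups, name the summary "res", then move keys back to columns.
res = (
    df.groupby(["cyl", "hp"], dropna=False)["mpg"]
    .mean()
    .rename("res")
    .reset_index()
)
print(res)
print(list(res.columns))  # ['cyl', 'hp', 'res']
```

Because NaN keys are kept, all three (`cyl`, `hp`) combinations appear as rows, and both grouping variables survive as columns.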
Addressed in v0.4.2