BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where fn returns np.ndarray #35517

dechamps · 2020-08-02T10:51:12Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

print(type(
  pd.DataFrame([['foo']])
  .apply(lambda col: np.array('bar'))
  .iloc[0]))

Output of Pandas 1.0.5

<class 'str'>

Output of Pandas 1.1.0

<class 'numpy.ndarray'>

It is not clear to me if this behaviour change is intended or not. I couldn't find anything obvious in the release notes.

I discovered it because it broke my code, whose apply() func returns col.unique().squeeze() (with the intent of extracting a scalar), where col is the Series passed to the func. Arguably my code is wrong: Series.unique() returns an np.ndarray, and calling squeeze() on that always returns an ndarray (a zero-dimensional one, for a single element), never a scalar. The correct call would have been item(). Up until now, pandas would "hide" the bug because it treated the single-element ndarray as a scalar. That no longer seems to be the case. My code ultimately fails downstream of the call when it tries to hash the resulting string, which doesn't work if the "string" is actually a single-element ndarray, because ndarrays are unhashable.
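A minimal NumPy-only sketch of the squeeze() vs. item() distinction described above. It assumes the column's unique values come back as a one-element object ndarray, hard-coded here as `uniques` (a hypothetical name); it does not exercise pandas at all, so it behaves the same regardless of the pandas version.

```python
import numpy as np

# Hypothetical stand-in for the result of col.unique() on a column that holds a
# single distinct value: a one-element, one-dimensional object ndarray.
uniques = np.array(["foo"], dtype=object)

squeezed = uniques.squeeze()  # still an ndarray, just zero-dimensional now
scalar = uniques.item()       # the element itself, as a plain Python object

print(type(squeezed), squeezed.ndim)  # <class 'numpy.ndarray'> 0
print(type(scalar))                   # <class 'str'>

# ndarrays are unhashable, so a 0-d array cannot be used as a dict key or set
# member even though it holds a single string.
try:
    hash(squeezed)
    squeezed_is_hashable = True
except TypeError:
    squeezed_is_hashable = False

print(squeezed_is_hashable)         # False
print(hash(scalar) == hash("foo"))  # True: the extracted str hashes normally
```

Returning `item()` from the apply func hands pandas a plain Python object, so the result no longer depends on how a given pandas version treats a returned 0-d ndarray.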

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
Version : #1 SMP Wed Feb 19 05:26:34 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0

simonjayhawkins · 2020-08-02T11:51:14Z

Thanks @dechamps for the report. As this appears to be an undocumented change in behaviour, I'll mark it as a regression for now, pending further discussion/investigation.

simonjayhawkins · 2020-08-02T12:28:39Z

The commit that caused this change is #34909. (This also caused another regression, #35462.) cc @jbrockmendel

91802a9 is the first bad commit
commit 91802a9
Author: jbrockmendel
Date: Thu Jun 25 16:06:10 2020 -0700

PERF: avoid creating many Series in apply_standard (#34909)

dechamps · 2020-08-02T13:06:07Z

I also just found #35518 which is somewhat similar in that it concerns values returned from the apply func being interpreted in a different way in Pandas 1.1.0.

jbrockmendel · 2020-08-10T21:02:37Z

If anything the 1.1.0 behavior looks more correct to me.

simonjayhawkins · 2020-09-07T09:30:01Z

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

rhshadrach · 2020-09-20T17:40:47Z

Agreeing with @jbrockmendel. apply unpacks list-likes (e.g. if np.array('bar') is replaced with np.array(['bar']), the array is unpacked), but it seems odd to me to extend this behavior to certain types of scalars, such as a zero-dimensional ndarray.
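The scalar vs. list-like distinction drawn in this comment can be checked with pandas' public `pandas.api.types.is_list_like`. This is only an illustration: whether `apply` consults exactly this check internally is an assumption, not something stated in the thread.

```python
import numpy as np
from pandas.api.types import is_list_like

zero_dim = np.array("bar")   # 0-d array, as in the reproducer
one_dim = np.array(["bar"])  # 1-d array holding a single element

# pandas classifies a 0-d array as a scalar, not as a list-like ...
print(is_list_like(zero_dim))  # False
# ... while a 1-d array, even of length one, is list-like.
print(is_list_like(one_dim))   # True
# Plain strings are not list-like either, despite being iterable.
print(is_list_like("bar"))     # False
```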

simonjayhawkins · 2020-10-05T12:49:24Z

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline and discussion ongoing.

simonjayhawkins · 2020-10-29T14:20:41Z

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

> If anything the 1.1.0 behavior looks more correct to me.

@jbrockmendel is this issue actionable?

jbrockmendel · 2020-10-29T15:23:37Z

Probably not, and definitely not before tomorrow.

simonjayhawkins · 2020-10-29T15:25:17Z

> Probably not, and definitely not before tomorrow.

Yeah, didn't expect a fix. Wondering whether to apply a closing candidate label.

mroeschke · 2021-08-08T19:43:34Z

Sounds like this behavior is the intended (better) behavior. I suppose we could use a test for this.

dechamps added the Bug and Needs Triage labels Aug 2, 2020

simonjayhawkins added the Apply, Compat, and Regression labels and removed the Needs Triage label Aug 2, 2020

simonjayhawkins added this to the 1.1.1 milestone Aug 2, 2020

dechamps mentioned this issue Aug 2, 2020

BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where func returns tuple #35518

Open

dechamps added a commit to dechamps/LoudspeakerExplorer that referenced this issue Aug 2, 2020

Use item(), not squeeze(), to get a scalar from ndarray.

1b5d401

The previous code would break when Pandas is upgraded from 1.0.5 to 1.1.0. See pandas-dev/pandas#35517

jbrockmendel mentioned this issue Aug 10, 2020

BUG: DataFrame.apply with func altering row in-place #35633

Merged

simonjayhawkins modified the milestones: 1.1.1, 1.1.2 Aug 20, 2020

simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020

simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020

simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020

jbrockmendel added the Closing Candidate label Oct 29, 2020

jreback removed this from the 1.1.5 milestone Nov 25, 2020

jreback added this to the Contributions Welcome milestone Nov 25, 2020

simonjayhawkins mentioned this issue Dec 21, 2020

RLS: 1.2 #37784

Closed

mroeschke added the good first issue and Needs Tests labels and removed the Bug and Closing Candidate labels Aug 8, 2021

weikhor mentioned this issue Mar 2, 2022

Test : Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where fn returns np.ndarray #46199

Merged

jreback modified the milestones: Contributions Welcome, 1.5 Mar 4, 2022

mroeschke closed this as completed in #46199 Mar 7, 2022

rhshadrach mentioned this issue Dec 31, 2024

TST(string dtype): Resolve xfail with apply returning an ndarray #60636

Merged
