BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where fn returns np.ndarray #35517

Closed
2 of 3 tasks
dechamps opened this issue Aug 2, 2020 · 11 comments · Fixed by #46199
Labels
Apply Apply, Aggregate, Transform, Map Compat pandas objects compatability with Numpy or Python functions good first issue Needs Tests Unit test(s) needed to prevent regressions Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@dechamps

dechamps commented Aug 2, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

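import numpy as np
import pandas as pd

# The applied function returns a zero-dimensional ndarray, not a Python str.
print(type(
  pd.DataFrame([['foo']])
  .apply(lambda col: np.array('bar'))
  .iloc[0]))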

Output of Pandas 1.0.5

<class 'str'>

Output of Pandas 1.1.0

<class 'numpy.ndarray'>

It is not clear to me if this behaviour change is intended or not. I couldn't find anything obvious in the release notes.

I discovered it because it broke my code, which returns the output of Series.unique().squeeze() (with the intent of extracting a scalar) in an apply() func. Arguably my code is wrong, because Series.unique() returns an np.ndarray here, and calling squeeze() on that always returns an ndarray, never a scalar - the correct call would have been item(). Up until now, Pandas would "hide" the bug because it would take the one-element ndarray and treat it as a scalar. Not anymore, it seems. My code ultimately fails downstream of the call when it tries to hash the resulting string, which doesn't work if the "string" is actually a one-element ndarray.
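The difference between squeeze() and item() described above can be seen with NumPy alone. This is a minimal sketch, not the reporter's actual code: the array named values is a hypothetical stand-in for the result of unique() on a column with a single distinct value.

```python
import numpy as np

# Hypothetical stand-in for unique() on a column with one distinct value.
values = np.array(["bar"])

squeezed = values.squeeze()  # still an ndarray, now zero-dimensional
scalar = values.item()       # a plain Python str

print(type(squeezed), squeezed.ndim)
print(type(scalar))

# ndarrays are unhashable, which is how the problem surfaces downstream.
try:
    hash(squeezed)
    hashable = True
except TypeError:
    hashable = False
print(hashable)
```

Returning item() from the applied function avoids relying on pandas to unwrap the array, so the result should not depend on which side of the 1.0.5 → 1.1.0 change is installed.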

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
Version : #1 SMP Wed Feb 19 05:26:34 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0

@dechamps dechamps added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 2, 2020
@simonjayhawkins
Member

Thanks @dechamps for the report. As this appears to be an undocumented change in behaviour, I'll mark it as a regression for now, pending further discussion and investigation.

@simonjayhawkins simonjayhawkins added Apply Apply, Aggregate, Transform, Map Compat pandas objects compatability with Numpy or Python functions Regression Functionality that used to work in a prior pandas version and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 2, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Aug 2, 2020
@simonjayhawkins
Member

The commit that caused this change is #34909. (This also caused another regression #35462) cc @jbrockmendel

91802a9 is the first bad commit
commit 91802a9
Author: jbrockmendel [email protected]
Date: Thu Jun 25 16:06:10 2020 -0700

PERF: avoid creating many Series in apply_standard (#34909)

@dechamps
Author

dechamps commented Aug 2, 2020

I also just found #35518 which is somewhat similar in that it concerns values returned from the apply func being interpreted in a different way in Pandas 1.1.0.

dechamps added a commit to dechamps/LoudspeakerExplorer that referenced this issue Aug 2, 2020
The previous code would break when Pandas is upgraded from 1.0.5 to
1.1.0. See pandas-dev/pandas#35517
@jbrockmendel
Member

If anything the 1.1.0 behavior looks more correct to me.

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.1, 1.1.2 Aug 20, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Member

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

@rhshadrach
Member

rhshadrach commented Sep 20, 2020

Agreeing with @jbrockmendel: apply unpacks list-likes (e.g. when np.array('bar') is replaced with np.array(['bar'])), but it seems odd to me to extend this behavior to certain types of scalars.
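The distinction drawn here, between a list-like array and a zero-dimensional array that behaves like a scalar, can be checked directly. The sketch below is only an illustration; the variable names are made up for the example, and it relies on pandas.api.types.is_list_like treating zero-dimensional arrays as not list-like.

```python
import numpy as np
from pandas.api.types import is_list_like

zero_dim = np.array("bar")   # shape (), effectively a scalar
one_dim = np.array(["bar"])  # shape (1,), a genuine list-like

print(zero_dim.ndim, is_list_like(zero_dim))
print(one_dim.ndim, is_list_like(one_dim))
```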

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020
@simonjayhawkins
Member

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline and discussion ongoing.

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Member

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

> If anything the 1.1.0 behavior looks more correct to me.

@jbrockmendel is this issue actionable?

@jbrockmendel
Member

Probably not, and definitely not before tomorrow.

@simonjayhawkins
Member

> Probably not, and definitely not before tomorrow.

yeah, didn't expect a fix. wondering whether to apply a closing candidate label.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Oct 29, 2020
@jreback jreback removed this from the 1.1.5 milestone Nov 25, 2020
@jreback jreback added this to the Contributions Welcome milestone Nov 25, 2020
@mroeschke
Member

Sounds like this behavior is the intended (better) behavior. I suppose we could use a test for this.
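A rough sketch of what such a test might look like follows. It is not the test that was eventually merged: the function name is hypothetical, and the check is deliberately value-level (it does not pin the element's type), since how the returned array is stored may depend on the pandas and NumPy versions installed.

```python
import numpy as np
import pandas as pd


def test_apply_returning_zero_dim_array():
    # Hypothetical test name; the frame mirrors the code sample in the report.
    df = pd.DataFrame([["foo"]])
    result = df.apply(lambda col: np.array("bar"))

    # One column in, one entry out, labelled by that column.
    assert isinstance(result, pd.Series)
    assert list(result.index) == [0]
    # Value-level check only; the element's type is left unpinned.
    assert result.iloc[0] == "bar"


test_apply_returning_zero_dim_array()
```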

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Closing Candidate May be closeable, needs more eyeballs labels Aug 8, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.5 Mar 4, 2022
6 participants