BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where fn returns np.ndarray #35517
Comments
Thanks @dechamps for the report. As this appears to be an undocumented change in behaviour, I'll mark as a regression for now pending further discussion/investigation.
The commit that caused this change is #34909. (This also caused another regression, #35462.) cc @jbrockmendel. 91802a9 is the first bad commit.
I also just found #35518 which is somewhat similar in that it concerns values returned from the apply func being interpreted in a different way in Pandas 1.1.0.
The previous code would break when Pandas is upgraded from 1.0.5 to 1.1.0. See pandas-dev/pandas#35517
If anything the 1.1.0 behavior looks more correct to me.
moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline
Agreeing with @jbrockmendel, apply unpacks list-like (e.g. replacing |
moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline and discussion ongoing.
moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline
@jbrockmendel is this issue actionable?
Probably not, and definitely not before tomorrow.
yeah, didn't expect a fix. wondering whether to apply a closing candidate label.
Sounds like this behavior is the intended (better) behavior. I suppose we could use a test for this.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Output of Pandas 1.0.5
Output of Pandas 1.1.0
It is not clear to me if this behaviour change is intended or not. I couldn't find anything obvious in the release notes.
I discovered it because it broke my code, which returns the output of `DataFrame.unique().squeeze()` (with the intent of extracting a scalar) in an `apply()` func. Arguably my code is wrong, because `DataFrame.unique()` returns an `np.ndarray`, and calling `squeeze()` on that always returns an ndarray, never a scalar - the correct call would have been `item()`. Up until now, Pandas would "hide" the bug because it would take the one-element ndarray and treat it as a scalar. Not anymore, it seems. My code ultimately fails downstream of the call as it tries to hash the resulting string, which doesn't work if the string is actually a 1-element ndarray.
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
Version : #1 SMP Wed Feb 19 05:26:34 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0
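
For readers who land here with the same symptom, the `squeeze()` versus `item()` distinction described in the report can be sketched with NumPy alone. This is only an illustration, not the reporter's original code sample: the array `unique_values` is a hypothetical stand-in for the one-element array that `unique()` returns on a constant column, and the value `7` is made up.

```python
import numpy as np

# Hypothetical stand-in for the result of unique() on a column that
# holds a single distinct value.
unique_values = np.array([7])

# squeeze() removes length-1 axes, but the result is still an ndarray
# (zero-dimensional here), not a Python scalar.
squeezed = unique_values.squeeze()

# item() converts a one-element array into a plain Python scalar.
scalar = unique_values.item()

print(type(squeezed).__name__, squeezed.ndim)
print(type(scalar).__name__)

# ndarrays are unhashable, which matches the downstream failure the
# report describes; the plain scalar hashes without trouble.
try:
    hash(squeezed)
    squeezed_is_hashable = True
except TypeError:
    squeezed_is_hashable = False

scalar_hash = hash(scalar)
```

Returning the `item()` result from the function passed to `apply()` means the value is a scalar regardless of how a given pandas version treats array-like return values.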