pivot_table margins=True default aggfunc='mean' does integer division #24893

Closed
gitgithan opened this issue Jan 24, 2019 · 1 comment · Fixed by #28248
Labels
Bug; Numeric Operations (Arithmetic, Comparison, and Logical operations); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@gitgithan

Code Sample, a copy-pastable example if possible

Code

import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]}, index=['one', 'two', 'three', 'four'])
pd.pivot_table(df, index='State', margins=True)

Output

	a	b
State		
Florida	2.00	7
Texas	4.50	8
All	3.25	7

Problem description

The margin value for column b is 7 (the default aggfunc, mean, applied to 7 and 8), but I expected 7.5.

This issue seems to be the opposite of issue #17013, which complains about integers becoming floats.
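
To make the arithmetic concrete, here is a small pure-Python sketch of the two results for column b. It only contrasts true division with floor division on the four b values from the sample above. It is not pandas' margin code, and the variable names are chosen purely for illustration.

```python
# Values of column 'b' from the sample frame above.
b_values = [6, 10, 3, 11]

total = sum(b_values)  # 30
count = len(b_values)  # 4

true_mean = total / count      # true division keeps the fractional part
floored_mean = total // count  # floor division drops it

print(true_mean)     # 7.5, the margin the reporter expects
print(floored_mean)  # 7, the margin shown in the output above
```

Averaging the two group means (7 and 8) gives the same pair of results, so either reading of how the margin is computed leads to 7.5 versus 7.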

Expected Output

P.S. I changed 6 to 6.0 in column b to produce the desired output.
Code

df = pd.DataFrame({'State':['Texas', 'Texas', 'Florida', 'Florida'], 
                   'a':[4,5,1,3], 'b':[6.0,10,3,11]},index=['one','two','three','four'])
pd.pivot_table(df,index='State',margins=True)

Desired Output (for the margin value only; I don't necessarily require 7 and 8 to become 7.0 and 8.0)

	a	b
State		
Florida	2.00	7.0
Texas	4.50	8.0
All	3.25	7.5
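
The same effect can probably be had without editing the literal data, by casting column b to a float dtype before pivoting. The sketch below assumes that an explicit `astype` cast behaves like writing 6.0 by hand; it is a workaround for illustration, not a fix for the underlying behavior.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["Texas", "Texas", "Florida", "Florida"],
        "a": [4, 5, 1, 3],
        "b": [6, 10, 3, 11],
    },
    index=["one", "two", "three", "four"],
)

# Cast 'b' to float first so the margin has no integer dtype to fall back to.
result = pd.pivot_table(df.astype({"b": "float64"}), index="State", margins=True)

print(result.loc["All", "b"])  # 7.5
```

The cost is that the per-state values for b also become floats (7.0 and 8.0), which matches the desired output shown above.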

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0

@mroeschke mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Numeric Operations Arithmetic, Comparison, and Logical operations labels May 27, 2019
mabelvj added a commit to mabelvj/pandas that referenced this issue Sep 1, 2019
@mabelvj
Contributor

mabelvj commented Sep 1, 2019

Is it really a bug, or is it expected that, since the results for the columns are integers, the margins are integers too?

I see that there are tests noting that the np.mean of ints is cast back to ints:

@pytest.mark.xfail(reason="GH#17035 (np.mean of ints is casted back to ints)")

@pytest.mark.xfail(reason="GH#17035 (np.mean of ints is casted back to ints)")

However, given that the row aggregations keep floats when the np.mean of integers is fractional, it does not make sense for the margins to behave differently.
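
The inconsistency can be sketched without pandas. The helper `downcast_column_if_lossless` below is a hypothetical name invented for this illustration. It shows one rule under which rows and margins would agree (cast back to int only when nothing is lost), and it is an assumption, not the code pandas uses nor the change that was eventually merged.

```python
def mean(values):
    """Plain arithmetic mean using true division."""
    return sum(values) / len(values)


def downcast_column_if_lossless(values):
    """Hypothetical rule: cast to int only if every value is integral."""
    if all(float(v).is_integer() for v in values):
        return [int(v) for v in values]
    return list(values)


# Per-state values from the sample frame, in the order Florida, Texas.
a_groups = [[1, 3], [4, 5]]
b_groups = [[3, 11], [6, 10]]

# Row aggregations: 'a' has a fractional mean, so the column stays float.
a_rows = downcast_column_if_lossless([mean(g) for g in a_groups])
# 'b' has only integral means, so casting back to int loses nothing.
b_rows = downcast_column_if_lossless([mean(g) for g in b_groups])

# Margin for 'b': the mean over all four values is fractional.
b_all = mean([x for g in b_groups for x in g])
b_margin_lossless = downcast_column_if_lossless([b_all])  # kept as float
b_margin_forced = int(b_all)  # unconditional cast, as in the reported output

print(a_rows, b_rows, b_margin_lossless, b_margin_forced)
```

Under the lossless rule the b margin stays at 7.5, while the unconditional cast reproduces the 7 from the report.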

mabelvj added a commit to mabelvj/pandas that referenced this issue Sep 3, 2019
@jreback jreback added this to the 1.0 milestone Sep 4, 2019
mabelvj added a commit to mabelvj/pandas that referenced this issue Sep 5, 2019