[REVIEW] Adapt value_counts behavior to match pandas-2.x #12835

@galipremsagar galipremsagar commented Feb 23, 2023

Description

This PR updates value_counts behavior to match pandas 2.x: the result is now named count (or proportion if normalize=True is passed), and the index is named after the original object. This PR also fixes two dtype APIs affected by breaking changes on the pandas side.
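
As a rough illustration of the naming rule described above, here is a plain-Python sketch. It is not the cuDF or pandas implementation: the helper value_counts_sketch and its dictionary return shape are hypothetical, and it only models which names the result and its index would carry.

```python
from collections import Counter


def value_counts_sketch(values, name=None, normalize=False):
    """Hypothetical helper modelling the pandas 2.x naming rule only."""
    counts = Counter(values)
    total = sum(counts.values())
    # Most frequent first; sorted() is stable, so ties keep first-seen order.
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if normalize:
        items = [(key, count / total) for key, count in items]
    return {
        # The result is named "count", or "proportion" when normalizing.
        "name": "proportion" if normalize else "count",
        # The index takes the name of the original object.
        "index_name": name,
        "data": dict(items),
    }


result = value_counts_sketch(["a", "b", "a"], name="letters")
normalized = value_counts_sketch(["a", "b", "a"], name="letters", normalize=True)
```

In this sketch, result carries the name "count" and normalized carries "proportion", while both keep "letters" as the index name.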

Here are the pytest runs that were executed locally to test these changes:

(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_dataframe.py::test_value_counts
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 64 items                                                                                                                                                                 

python/cudf/cudf/tests/test_dataframe.py ................................................................                                                                    [100%]

================================================================================ 64 passed in 2.31s ================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_value_counts
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 4 items                                                                                                                                                                  

python/cudf/cudf/tests/test_series.py ....                                                                                                                                   [100%]

================================================================================ 4 passed in 1.25s =================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_datetime_value_counts
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 24 items                                                                                                                                                                 

python/cudf/cudf/tests/test_series.py ........................                                                                                                               [100%]

================================================================================ 24 passed in 1.32s ================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_categorical_value_counts
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 12 items                                                                                                                                                                 

python/cudf/cudf/tests/test_series.py ............                                                                                                                           [100%]

================================================================================ 12 passed in 1.28s ================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_value_counts_bins
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 3 items                                                                                                                                                                  

python/cudf/cudf/tests/test_series.py ...                                                                                                                                    [100%]

================================================================================ 3 passed in 1.22s =================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_value_counts_bins_dropna
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 6 items                                                                                                                                                                  

python/cudf/cudf/tests/test_series.py ......                                                                                                                                 [100%]

================================================================================ 6 passed in 1.25s =================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ pytest python/cudf/cudf/tests/test_series.py::test_series_value_counts_optional_arguments
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /nvme/0/pgali/cudf/python/cudf/cudf/tests, configfile: pytest.ini
plugins: anyio-3.6.2, benchmark-4.0.0, xdist-3.2.0, cov-4.0.0, hypothesis-6.68.2, cases-3.6.13
collected 8 items                                                                                                                                                                  

python/cudf/cudf/tests/test_series.py ........                                                                                                                               [100%]

================================================================================ 8 passed in 1.20s =================================================================================
(cudfdev) pgali@dt07:/nvme/0/pgali/cudf$ conda list | grep "pandas"
pandas                    2.0.0rc0                 pypi_0    pypi

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar added the labels 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, improvement, and breaking on Feb 23, 2023
@galipremsagar requested a review from a team as a code owner on February 23, 2023 20:40
@galipremsagar self-assigned this on Feb 23, 2023
@galipremsagar requested reviews from vyasr, skirui-source, shwina and mroeschke, and removed the request for a team, on February 23, 2023 20:40

vyasr commented Feb 23, 2023

Could you update the PR description to reflect what the actual behavior change is (the name of the output is now set to "proportion" or "count" AFAICT)?

galipremsagar commented

> Could you update the PR description to reflect what the actual behavior change is (the name of the output is now set to "proportion" or "count" AFAICT)?

Yup, that's correct. Updated the PR description.

vyasr commented Feb 23, 2023

I assume tests are failing due to incompatibilities with the current conda versions; we may need to add some more conditionals?

vyasr commented Feb 23, 2023

Forgot that each of these PRs is just a partial fix for pandas 2.0 going into a feature branch, so tests aren't expected to pass. No worries here then.

galipremsagar commented

Thanks, @mroeschke & @vyasr for the reviews! Merging.

@galipremsagar merged commit 14f54ac into rapidsai:pandas_2.0_feature_branch on Feb 23, 2023
@galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer labels on Apr 25, 2023
Labels: 5 - Ready to Merge, breaking, improvement, Python