[BUG] shape differs for the same query between cudf and dask_cudf #8409

jangorecki · 2021-05-31T20:03:22Z

Describe the bug
The shape of the result differs between cudf and dask_cudf for the same query when the grouping column contains NAs.

Steps/Code to reproduce bug

This time with a small, self-contained example:

cat > data.csv <<EOL
id1,id2,id3,id4,id5,id6,v1,v2,v3
,id016,id0000042202,15,24,5971,5,11,37.211254
id039,id045,id0000029558,40,49,,5,4,48.951141
EOL
cat > cu.py <<EOL
import cudf as cu
x = cu.read_csv("data.csv", header=0, dtype=['str','str','str','int32','int32','int32','int32','int32','float64'])
x['id1'] = x['id1'].astype('category')
x['id2'] = x['id2'].astype('category')
x['id3'] = x['id3'].astype('category')
ans = x.groupby('id1', as_index=False, dropna=False).agg({'v1':'sum'})
print(ans.shape[0], flush=True)
EOL
cat > dc.py <<EOL
import dask_cudf as dc
x = dc.read_csv("data.csv", header=0, dtype=['str','str','str','int32','int32','int32','int32','int32','float64'])
x['id1'] = x['id1'].astype('category')
x['id2'] = x['id2'].astype('category')
x['id3'] = x['id3'].astype('category')
x = x.persist()
ans = x.groupby('id1', as_index=False, dropna=False).agg({'v1':'sum'}).compute()
print(ans.shape[0], flush=True)
EOL

execute

python cu.py
#2
python dc.py
#1
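The expected semantics can be sketched without a GPU. The following is a minimal pure-Python illustration, not cudf or dask_cudf code: the helper `group_sum` is hypothetical, and it assumes an empty CSV field stands for a missing value. It only shows what `dropna=False` is expected to mean for the two rows above, namely that the NA key forms its own group.

```python
import csv
import io
from collections import defaultdict

# Same two rows as data.csv in the report.
CSV_TEXT = """id1,id2,id3,id4,id5,id6,v1,v2,v3
,id016,id0000042202,15,24,5971,5,11,37.211254
id039,id045,id0000029558,40,49,,5,4,48.951141
"""

def group_sum(rows, key, value, dropna):
    """Hypothetical helper: sum `value` per `key`, optionally dropping NA keys."""
    totals = defaultdict(int)
    for row in rows:
        k = row[key] or None  # assumption: an empty field is a missing value
        if k is None and dropna:
            continue  # dropna=True discards rows whose key is missing
        totals[k] += int(row[value])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
keep_na = group_sum(rows, "id1", "v1", dropna=False)
drop_na = group_sum(rows, "id1", "v1", dropna=True)

print(len(keep_na))  # number of groups when the NA key is kept
print(len(drop_na))  # number of groups when the NA key is dropped
```

With these two rows, keeping the NA key yields two groups and dropping it yields one, which matches the two different row counts reported above.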

Expected behavior
The shape of the result should not depend on whether the query runs on cudf or dask_cudf, and the query should complete successfully.

Environment overview (please complete the following information)

Environment location: Bare-metal
Method of cuDF install: conda

Environment details

 ***git***
 commit de056ca926109569998c99b68213de04b2230977 (HEAD -> cudf-use-dask, upstream/cudf-use-dask)
 Author: jangorecki <[email protected]>
 Date:   Thu May 27 13:14:08 2021 +0200
 
 median not implemented in dask cudf?
 ***git submodules***
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=16.04
 DISTRIB_CODENAME=xenial
 DISTRIB_DESCRIPTION="Ubuntu 16.04.7 LTS"
 NAME="Ubuntu"
 VERSION="16.04.7 LTS (Xenial Xerus)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 16.04.7 LTS"
 VERSION_ID="16.04"
 HOME_URL="http://www.ubuntu.com/"
 SUPPORT_URL="http://help.ubuntu.com/"
 BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
 VERSION_CODENAME=xenial
 UBUNTU_CODENAME=xenial
 Linux mr-dl11 4.15.0-122-generic #124~16.04.1-Ubuntu SMP Thu Oct 15 16:08:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Thu May 27 04:23:32 2021
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  GeForce GTX 108...  On   | 00000000:02:00.0 Off |                  N/A |
 | 23%   34C    P8    16W / 250W |      1MiB / 11178MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  GeForce GTX 108...  On   | 00000000:81:00.0 Off |                  N/A |
 | 23%   38C    P8    12W / 250W |      1MiB / 11178MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:          x86_64
 CPU op-mode(s):        32-bit, 64-bit
 Byte Order:            Little Endian
 CPU(s):                40
 On-line CPU(s) list:   0-39
 Thread(s) per core:    2
 Core(s) per socket:    10
 Socket(s):             2
 NUMA node(s):          2
 Vendor ID:             GenuineIntel
 CPU family:            6
 Model:                 79
 Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
 Stepping:              1
 CPU MHz:               2595.848
 CPU max MHz:           3100.0000
 CPU min MHz:           1200.0000
 BogoMIPS:              4401.79
 Virtualization:        VT-x
 L1d cache:             32K
 L1i cache:             32K
 L2 cache:              256K
 L3 cache:              25600K
 NUMA node0 CPU(s):     0-9,20-29
 NUMA node1 CPU(s):     10-19,30-39
 Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
 
 ***CMake***
 /usr/local/bin/cmake
 cmake version 3.13.2
 
 CMake suite maintained and supported by Kitware (kitware.com/cmake).
 
 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
 Copyright (C) 2017 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 /usr/local/cuda-9.2/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2018 NVIDIA Corporation
 Built on Tue_Jun_12_23:07:04_CDT_2018
 Cuda compilation tools, release 9.2, V9.2.148
 
 ***Python***
 /usr/bin/python
 Python 2.7.12
 
 ***Environment Variables***
 PATH                            : /usr/local/cuda-9.2/bin:/home/jan/bin:/home/jan/.local/bin:/home/jan/bin:/home/jan/.local/bin:/home/jan/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/mapd:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:/opt/julia-1.5.3/bin:/opt/julia-1.6.0/bin
 LD_LIBRARY_PATH                 : /usr/local/cuda-9.2/lib64:/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    :
 PYTHON_PATH                     :
 
 ***conda packages***
 /home/jan/anaconda3/condabin/conda
 # packages in environment at /home/jan/anaconda3:
 #
 # Name                    Version                   Build  Channel
 _ipyw_jlab_nb_ext_conf    0.1.0                    py36_0
 _libgcc_mutex             0.1                        main
 alabaster                 0.7.12                     py_0    conda-forge
 anaconda-client           1.7.1                      py_0    conda-forge
 anaconda-navigator        1.9.2                    py36_0
 anaconda-project          0.8.2                      py_1    conda-forge
 appdirs                   1.4.3                      py_1    conda-forge
 arrow-cpp                 0.10.0           py36h70250a7_0    conda-forge
 asn1crypto                0.24.0                py36_1003    conda-forge
 astroid                   2.0.3                 py36_1000    conda-forge
 astropy                   3.0.5            py36h470a237_0    conda-forge
 atomicwrites              1.2.1                      py_0    conda-forge
 attrs                     18.2.0                     py_0    conda-forge
 automat                   0.7.0                      py_1    conda-forge
 babel                     2.6.0                      py_1    conda-forge
 backcall                  0.1.0                      py_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.os              0.1.1                 py36_1000    conda-forge
 backports.shutil_get_terminal_size 1.0.0                      py_3    conda-forge
 beautifulsoup4            4.6.3                 py36_1000    conda-forge
 bitarray                  0.8.3            py36h470a237_0    conda-forge
 bkcharts                  0.2                      py36_0    conda-forge
 blas                      1.0                         mkl
 blaze                     0.11.3                   py36_0    conda-forge
 bleach                    3.0.2                    pypi_0    pypi
 blinker                   1.4                        py_1    conda-forge
 blosc                     1.14.4               hdbcaa40_0
 bokeh                     1.0.1                 py36_1000    conda-forge
 boost-cpp                 1.67.0               h3a22d5f_0    conda-forge
 boto                      2.49.0                   py36_0
 boto3                     1.9.38                     py_0    conda-forge
 botocore                  1.12.39                    py_0    conda-forge
 bottleneck                1.2.1            py36h7eb728f_1    conda-forge
 bz2file                   0.98                       py_0    conda-forge
 bzip2                     1.0.6                h14c3975_5
 ca-certificates           2020.1.1                      0
 cairo                     1.14.12              h8948797_3
 certifi                   2020.4.5.1               py38_0
 cffi                      1.14.0           py38he30daa8_1
 chardet                   3.0.4                 py38_1003
 click                     7.0                        py_0    conda-forge
 cloudpickle               0.6.1                      py_0    conda-forge
 clyent                    1.2.2                      py_1    conda-forge
 colorama                  0.4.0                      py_0    conda-forge
 conda                     4.8.3                    py38_0
 conda-build               3.16.2                   py36_0    conda-forge
 conda-env                 2.6.0                         1
 conda-package-handling    1.6.1            py38h7b6447c_0
 constantly                15.1.0                     py_0    conda-forge
 contextlib2               0.5.5                      py_2    conda-forge
 cryptography              2.9.2            py38h1ba5d50_0
 cryptography-vectors      2.3.1                 py36_1000    conda-forge
 curl                      7.61.0               h84994c4_0
 cycler                    0.10.0                     py_1    conda-forge
 cython                    0.28.5          py36hf484d3e_1000    conda-forge
 cytoolz                   0.9.0.1          py36h470a237_1    conda-forge
 dask                      0.20.0                     py_0    conda-forge
 dask-core                 0.20.0                     py_0    conda-forge
 datashape                 0.5.4                    py36_0    conda-forge
 dbus                      1.13.2               h714fa37_1
 decorator                 4.3.0                      py_0    conda-forge
 defusedxml                0.5.0                      py_1    conda-forge
 distributed               1.24.0                py36_1000    conda-forge
 docutils                  0.14                  py36_1001    conda-forge
 entrypoints               0.2.3                 py36_1002    conda-forge
 et_xmlfile                1.0.1                    py36_0    conda-forge
 expat                     2.2.6                he6710b0_0
 fastcache                 1.0.2            py36h470a237_1    conda-forge
 filelock                  3.0.10                     py_0    conda-forge
 flask                     1.0.2                      py_2    conda-forge
 flask-cors                3.0.6                      py_0    conda-forge
 fontconfig                2.13.0               h9420a91_0
 freetype                  2.9.1                h8a8886c_1
 fribidi                   1.0.5                h7b6447c_0
 gensim                    3.5.0                    py36_0    conda-forge
 get_terminal_size         1.0.0                haa9412d_0
 gevent                    1.3.7            py36h470a237_0    conda-forge
 glib                      2.56.2               hd408876_0
 glob2                     0.6                        py_0    conda-forge
 gmp                       6.1.2                h6c8ec71_1
 gmpy2                     2.0.8            py36hb705a9b_2    conda-forge
 graphite2                 1.3.12               h23475e2_2
 greenlet                  0.4.13                   py36_0    conda-forge
 gst-plugins-base          1.14.0               hbbd80ab_1
 gstreamer                 1.14.0               hb453b48_1
 h5py                      2.8.0            py36h7eb728f_3    conda-forge
 harfbuzz                  1.8.8                hffaf4a1_0
 hdf5                      1.10.2               hba1933b_1
 heapdict                  1.0.0                 py36_1000    conda-forge
 html5lib                  1.0.1                      py_0    conda-forge
 hyperlink                 17.3.1                     py_0    conda-forge
 icu                       58.2                 h9c2bf20_1
 idna                      2.9                        py_1
 imageio                   2.4.1                      py_0    conda-forge
 imagesize                 1.1.0                      py_0    conda-forge
 importlib_metadata        0.6                      py36_0    conda-forge
 incremental               17.5.0                     py_0    conda-forge
 intel-openmp              2019.0                      118
 ipykernel                 5.1.0              pyh24bf2e0_0    conda-forge
 ipython                   7.1.1           py36h24bf2e0_1000    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 ipywidgets                7.4.2                      py_0    conda-forge
 isort                     4.3.4                 py36_1000    conda-forge
 itsdangerous              1.1.0                      py_0    conda-forge
 jbig                      2.1                  hdba287a_0
 jdcal                     1.4                        py_1    conda-forge
 jedi                      0.13.1                py36_1000    conda-forge
 jeepney                   0.4                        py_0    conda-forge
 jinja2                    2.10                       py_1    conda-forge
 jmespath                  0.9.3                      py_1    conda-forge
 jpeg                      9b                   h024ee3a_2
 jsonschema                3.0.0a3               py36_1000    conda-forge
 jupyter                   1.0.0                      py_1    conda-forge
 jupyter_client            5.2.3                      py_1    conda-forge
 jupyter_console           6.0.0                      py_0    conda-forge
 jupyter_core              4.4.0                      py_0    conda-forge
 jupyterlab                0.35.4                   py36_0    conda-forge
 jupyterlab_launcher       0.13.1                     py_2    conda-forge
 jupyterlab_server         0.2.0                      py_0    conda-forge
 keyring                   16.0.1                   py36_0    conda-forge
 kiwisolver                1.0.1            py36h2d50403_2    conda-forge
 lazy-object-proxy         1.3.1            py36h470a237_0    conda-forge
 ld_impl_linux-64          2.33.1               h53a641e_7
 libarchive                3.3.3                h823be47_0    conda-forge
 libcudf                   0.4.0                 cuda9.2_0    rapidsai
 libcudf_cffi              0.4.0            cuda9.2_py36_0    rapidsai
 libcurl                   7.61.0               h1ad7b7a_0
 libedit                   3.1.20181209         hc058e9b_0
 libffi                    3.3                  he6710b0_1
 libgcc-ng                 9.1.0                hdf63c60_0
 libgdf                    0.2.0                cuda9.2_95    rapidsai
 libgdf_cffi               0.2.0           cuda9.2_py36_95    rapidsai
 libgfortran-ng            7.3.0                hdf63c60_0
 libiconv                  1.15                 h470a237_3    conda-forge
 libpng                    1.6.34               hb9fc6fc_0
 libsodium                 1.0.16               h1bed415_0
 libssh2                   1.8.0                h9cfc8f7_4
 libstdcxx-ng              8.2.0                hdf63c60_1
 libtiff                   4.0.9                he85c1e1_2
 libtool                   2.4.6                h544aabb_3
 libuuid                   1.0.3                h1bed415_2
 libxcb                    1.13                 h1bed415_1
 libxml2                   2.9.8                h26e45fe_1
 libxslt                   1.1.32               h1312cb7_0
 llvmlite                  0.25.0           py36hf484d3e_0    numba
 locket                    0.2.0                      py_2    conda-forge
 lxml                      4.2.5            py36hc9114bc_0    conda-forge
 lzo                       2.10                 h49e0be7_2
 markupsafe                1.1.0            py36h470a237_0    conda-forge
 matplotlib                3.0.0            py36h5429711_0
 mccabe                    0.6.1                      py_1    conda-forge
 mistune                   0.8.4            py36h470a237_0    conda-forge
 mkl                       2019.0                      118
 mkl-service               1.1.2            py36h90e4bf4_5
 mkl_fft                   1.0.6                    py36_0    conda-forge
 mkl_random                1.0.2                    py36_0    conda-forge
 more-itertools            4.3.0                 py36_1000    conda-forge
 mpc                       1.1.0                h10f8cd9_1
 mpfr                      4.0.1                hdf1c602_3
 mpmath                    1.0.0                      py_1    conda-forge
 msgpack-python            0.5.6            py36h2d50403_3    conda-forge
 multipledispatch          0.6.0                      py_0    conda-forge
 navigator-updater         0.2.1                    py36_0
 nbconvert                 5.4.0                         1    conda-forge
 nbformat                  4.4.0                      py_1    conda-forge
 ncurses                   6.2                  he6710b0_1
 networkx                  2.2                        py_1    conda-forge
 nltk                      3.2.5                      py_0    conda-forge
 nose                      1.3.7                 py36_1002    conda-forge
 notebook                  5.7.0                 py36_1000    conda-forge
 numba                     0.41.0dev0      np114py36hf484d3e_176    numba
 numexpr                   2.6.8            py36hf8a1672_0    conda-forge
 numpy                     1.14.2           py36hdbf6ddf_0
 numpy-base                1.15.4           py36h81de0dd_0
 numpydoc                  0.8.0                      py_1    conda-forge
 nvstrings                 0.2.0            cuda9.2_py36_3    nvidia
 oauthlib                  2.1.0                      py_0    conda-forge
 odo                       0.5.1                      py_1    conda-forge
 olefile                   0.46                       py_0    conda-forge
 openpyxl                  2.5.9                      py_0    conda-forge
 openssl                   1.1.1g               h7b6447c_0
 packaging                 18.0                       py_0    conda-forge
 pandas                    0.20.3                   py36_1    conda-forge
 pandoc                    1.19.2.1             hea2e7c5_1
 pandocfilters             1.4.2                      py_1    conda-forge
 pango                     1.42.4               h049681c_0
 parquet-cpp               1.5.0.pre            h83d4a3d_0    conda-forge
 parso                     0.3.1                      py_0    conda-forge
 partd                     0.3.9                      py_0    conda-forge
 patchelf                  0.9                  hf484d3e_2
 path.py                   11.5.0                     py_0    conda-forge
 pathlib2                  2.3.2                 py36_1000    conda-forge
 patsy                     0.5.1                      py_0    conda-forge
 pcre                      8.42                 h439df22_0
 pep8                      1.7.1                      py_0    conda-forge
 pexpect                   4.6.0                 py36_1000    conda-forge
 pickleshare               0.7.5                 py36_1000    conda-forge
 pillow                    5.3.0            py36h34e0f95_0
 pip                       20.0.2                   py38_3
 pixman                    0.34.0               hceecf20_3
 pkginfo                   1.4.2                      py_1    conda-forge
 pluggy                    0.8.0                      py_0    conda-forge
 ply                       3.11                       py_1    conda-forge
 prometheus_client         0.4.2                      py_0    conda-forge
 prompt_toolkit            2.0.7                      py_0    conda-forge
 psutil                    5.4.8            py36h470a237_0    conda-forge
 ptyprocess                0.6.0                 py36_1000    conda-forge
 py                        1.7.0                      py_0    conda-forge
 pyarrow                   0.10.0           py36hfc679d8_0    conda-forge
 pyasn1                    0.4.4                      py_1    conda-forge
 pyasn1-modules            0.2.1                      py_0    conda-forge
 pycodestyle               2.4.0                      py_1    conda-forge
 pycosat                   0.6.3            py38h7b6447c_1
 pycparser                 2.20                       py_0
 pycrypto                  2.6.1            py36h470a237_2    conda-forge
 pycurl                    7.43.0.2         py36hb7f436b_0
 pyflakes                  2.0.0                      py_0    conda-forge
 pygments                  2.2.0                      py_1    conda-forge
 pyhamcrest                1.9.0                      py_2    conda-forge
 pyjwt                     1.6.4                      py_0    conda-forge
 pylint                    2.1.1                 py36_1000    conda-forge
 pyodbc                    4.0.24           py36hfc679d8_0    conda-forge
 pyopenssl                 19.1.0                   py38_0
 pyparsing                 2.3.0                      py_0    conda-forge
 pyqt                      5.9.2            py36h05f1152_2
 pyrsistent                0.14.5           py36h470a237_1    conda-forge
 pysocks                   1.7.1                    py38_0
 pytables                  3.4.4            py36h4f72b40_1    conda-forge
 pytest                    3.10.0                py36_1000    conda-forge
 pytest-arraydiff          0.2                        py_0    conda-forge
 pytest-astropy            0.4.0                      py_0    conda-forge
 pytest-doctestplus        0.1.3                      py_0    conda-forge
 pytest-openfiles          0.3.0                      py_0    conda-forge
 pytest-remotedata         0.3.1                      py_0    conda-forge
 python                    3.8.2               hcff3b4d_14
 python-crfsuite           0.9.6            py36h2d50403_0    conda-forge
 python-dateutil           2.7.5                      py_0    conda-forge
 python-libarchive-c       2.8                   py36_1004    conda-forge
 pytz                      2018.7                     py_0    conda-forge
 pywavelets                1.0.1            py36h7eb728f_0    conda-forge
 pyyaml                    3.13             py36h470a237_1    conda-forge
 pyzmq                     17.1.2           py36hae99301_1    conda-forge
 qt                        5.9.6                h8703b6f_2
 qtawesome                 0.5.2              pyh8a2030e_0    conda-forge
 qtconsole                 4.4.2                      py_1    conda-forge
 qtpy                      1.5.2              pyh8a2030e_0    conda-forge
 readline                  8.0                  h7b6447c_0
 requests                  2.23.0                   py38_0
 requests-oauthlib         1.0.0                      py_1    conda-forge
 rope                      0.10.7                     py_1    conda-forge
 ruamel_yaml               0.15.87          py38h7b6447c_0
 s3transfer                0.1.13                py36_1001    conda-forge
 scikit-image              0.14.1           py36hfc679d8_0    conda-forge
 scikit-learn              0.20.0           py36h4989274_1
 scipy                     1.1.0            py36hfa4b5c9_1
 seaborn                   0.9.0                      py_0    conda-forge
 secretstorage             3.1.0                 py36_1001    conda-forge
 send2trash                1.5.0                      py_0    conda-forge
 service_identity          17.0.0                     py_0    conda-forge
 setuptools                46.2.0                   py38_0
 simplegeneric             0.8.1                      py_1    conda-forge
 singledispatch            3.4.0.3               py36_1000    conda-forge
 sip                       4.19.8           py36hfc679d8_0    conda-forge
 six                       1.14.0                   py38_0
 smart_open                1.7.1                      py_0    conda-forge
 snappy                    1.1.7                hbae5bb6_3
 snowballstemmer           1.2.1                      py_1    conda-forge
 sortedcollections         1.0.1                      py_1    conda-forge
 sortedcontainers          2.0.5                      py_0    conda-forge
 sphinx                    1.8.1                 py36_1000    conda-forge
 sphinxcontrib             1.0                      py36_1
 sphinxcontrib-websupport  1.1.0                      py_1    conda-forge
 spyder                    3.2.8                    py36_0    conda-forge
 spyder-kernels            1.1.0                      py_0    conda-forge
 sqlalchemy                1.2.13           py36h470a237_0    conda-forge
 sqlite                    3.31.1               h62c20be_1
 statsmodels               0.9.0            py36h7eb728f_0    conda-forge
 sympy                     1.3                   py36_1000    conda-forge
 tblib                     1.3.2                      py_1    conda-forge
 terminado                 0.8.1                 py36_1001    conda-forge
 testpath                  0.4.2                 py36_1000    conda-forge
 tk                        8.6.8                hbc83047_0
 toolz                     0.9.0                      py_1    conda-forge
 tornado                   5.1.1            py36h470a237_0    conda-forge
 tqdm                      4.46.0                     py_0
 traitlets                 4.3.2                 py36_1000    conda-forge
 twisted                   18.9.0           py36h470a237_0    conda-forge
 twython                   3.7.0                      py_0    conda-forge
 typed-ast                 1.1.0                    py36_0    conda-forge
 unicodecsv                0.14.1                     py_1    conda-forge
 unixodbc                  2.3.7                h14c3975_0
 urllib3                   1.25.8                   py38_0
 wcwidth                   0.1.7                      py_1    conda-forge
 webencodings              0.5.1                    pypi_0    pypi
 werkzeug                  0.14.1                     py_0    conda-forge
 wheel                     0.34.2                   py38_0
 widgetsnbextension        3.4.2                 py36_1000    conda-forge
 wrapt                     1.10.11          py36h470a237_1    conda-forge
 xlrd                      1.1.0                      py_2    conda-forge
 xlsxwriter                1.1.2                      py_0    conda-forge
 xlwt                      1.3.0                      py_1    conda-forge
 xz                        5.2.5                h7b6447c_0
 yaml                      0.1.7                had09818_2
 zeromq                    4.2.5                hf484d3e_1
 zict                      0.1.3                      py_0    conda-forge
 zlib                      1.2.11               h7b6447c_3
 zope                      1.0                      py36_1
 zope.interface            4.6.0            py36h470a237_0    conda-forge

Additional context
none

shwina · 2021-05-31T20:58:58Z

Which behaviour is preferred here? pandas and Dask appear to behave the same way, although that arguably looks like a bug in pandas:

>>> x
     id1    id2           id3  id4  id5   id6  v1  v2         v3
0   <NA>  id016  id0000042202   15   24  5971   5  11  37.211254
1  id039  id045  id0000029558   40   49  <NA>   5   4  48.951141
>>> x.groupby('id1', dropna=False, as_index=False).agg({'v1': 'sum'})  # NA is *not* dropped by cuDF
     id1  v1
0   <NA>   5
1  id039   5
>>> x.to_pandas().groupby('id1', dropna=False, as_index=False).agg({'v1': 'sum'})   # NA is dropped by Pandas
     id1  v1
0  id039   5
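The same comparison can be run on the CPU alone. This is a minimal sketch assuming only that pandas is installed; the frame below is a two-column stand-in for the data above, and the variable names are illustrative. It groups once by a plain string key and once by the same key cast to `category`. The row count for the categorical key depends on the installed pandas version (see the pandas issue referenced in this thread), so the sketch prints both counts instead of asserting the second one.

```python
import pandas as pd

# Two-row stand-in for the data above: one missing key, one present key.
df = pd.DataFrame({"id1": [None, "id039"], "v1": [5, 5]})

# Plain string (object) key: dropna=False keeps the missing key as a group.
as_str = df.groupby("id1", as_index=False, dropna=False).agg({"v1": "sum"})

# Same key cast to category, as in the reproducer.
df_cat = df.assign(id1=df["id1"].astype("category"))
as_cat = df_cat.groupby(
    "id1", as_index=False, dropna=False, observed=True
).agg({"v1": "sum"})

print(len(as_str))  # rows with a string key
print(len(as_cat))  # rows with a categorical key; version dependent
```

If the two counts differ, the NA group is being dropped only for the categorical key, which is the behaviour shown in the session above.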

jangorecki · 2021-06-01T07:18:08Z

The preferred behaviour is to keep NAs, as cudf does now.
The corresponding pandas bug report is pandas-dev/pandas#36327.

github-actions · 2021-11-15T21:03:50Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vyasr · 2022-07-18T21:39:00Z

Since we decided that the pandas behavior is undesirable here, I'm going to close this as not actionable on our end.

jangorecki added the Needs Triage and bug labels May 31, 2021

jangorecki mentioned this issue May 31, 2021 in "cudf does not handle NAs anymore" (h2oai/db-benchmark#221, open)

shwina added the Python and dask-cudf labels and removed the Needs Triage label May 31, 2021

github-actions bot added the inactive-90d label Nov 15, 2021

vyasr closed this as completed Jul 18, 2022

vyasr added the dask label and removed the dask-cudf label Feb 23, 2024
