[BUG] shape differs for the same query between cudf and dask_cudf #8409
Comments
Which behaviour is preferred here? Pandas and Dask seem to have the same behaviour, although it could be argued that it seems like a bug in Pandas:
>>> x
id1 id2 id3 id4 id5 id6 v1 v2 v3
0 <NA> id016 id0000042202 15 24 5971 5 11 37.211254
1 id039 id045 id0000029558 40 49 <NA> 5 4 48.951141
>>> x.groupby('id1', dropna=False, as_index=False).agg({'v1': 'sum'}) # NA is *not* dropped by cuDF
id1 v1
0 <NA> 5
1 id039 5
>>> x.to_pandas().groupby('id1', dropna=False, as_index=False).agg({'v1': 'sum'}) # NA is dropped by Pandas
id1 v1
0 id039 5
To keep NAs, as cudf is doing now.
Since we decided that the pandas behavior is undesirable here, I'm going to close this as not actionable on our end.
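To make the shape difference in the groupby example above concrete, here is a minimal pure-Python sketch. It is not cuDF, pandas, or Dask code; `groupby_sum` and the `NA` stand-in are hypothetical names used only for illustration. It shows how keeping or dropping rows whose group key is missing changes the number of result rows, which is what a differing shape reflects.

```python
from collections import defaultdict

# Stand-in for a missing key (<NA>); an assumption of this sketch.
NA = None


def groupby_sum(rows, key, value, dropna=True):
    """Sum `value` per distinct `key`.

    Rows whose key is missing are skipped when dropna is True
    and kept as their own group when dropna is False.
    """
    totals = defaultdict(int)
    for row in rows:
        k = row[key]
        if k is NA and dropna:
            continue  # drop the row with a missing group key
        totals[k] += row[value]
    # One output row per group, in first-seen order.
    return [{key: k, value: v} for k, v in totals.items()]


# Two rows mirroring the id1 and v1 columns from the thread.
rows = [
    {"id1": NA, "v1": 5},
    {"id1": "id039", "v1": 5},
]

kept = groupby_sum(rows, "id1", "v1", dropna=False)
dropped = groupby_sum(rows, "id1", "v1", dropna=True)

# The row counts differ by exactly the NA group.
print(len(kept), len(dropped))
```

Two backends that disagree on whether the NA group is kept will report different row counts for the same query, even though every non-NA group matches.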
Describe the bug
The shape attribute differs between cudf and dask_cudf for the same query when the data contain NAs.
Steps/Code to reproduce bug
This time, a nice example:
execute
Expected behavior
shape should not depend on whether the query runs on cudf or dask_cudf.
Expected behavior
Query should complete successfully.
Environment overview (please complete the following information)
Environment details
Additional context
none