[FEA] top N rows by group #2592

Closed
jangorecki opened this issue Aug 15, 2019 · 1 comment · Fixed by #12939
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@jangorecki

I would like to select the top N rows per group (N = 2 in the example below). This is the pandas code I use for that:

import pandas as pd
d = pd.DataFrame.from_dict(dict([('id2', [1, 2, 1, 2, 1, 2]), ('id4', [1, 1, 1, 1, 1, 1]), ('v3', [1, 3, 2, 3, 3, 3])]))
d.sort_values('v3', ascending=False).groupby(['id2','id4']).head(2)

What is the equivalent cuDF API?

@jangorecki jangorecki added Needs Triage Need team to review and classify question Further information is requested labels Aug 15, 2019
@kkraus14 kkraus14 changed the title [QST] top N rows by group [FEA] top N rows by group Aug 15, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify question Further information is requested labels Aug 15, 2019
@kkraus14
Collaborator

@jangorecki Changed this to a feature request.

@kkraus14 kkraus14 added the feature request New feature or request label Aug 15, 2019
@GregoryKimball GregoryKimball assigned shwina and unassigned shwina Jun 24, 2022
@wence- wence- self-assigned this Mar 9, 2023
wence- added a commit to wence-/cudf that referenced this issue Mar 14, 2023
rapids-bot bot pushed a commit that referenced this issue Mar 16, 2023
These methods can be implemented by grouping the dataframe and
then selecting appropriate slices from each group. This is less
memory-efficient than it could be (since the entire grouping must
be constructed before discarding most of it).

- Closes #2592
- Closes #12245

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #12939
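
The approach described in the commit message above (build the full grouping, then take a slice from each group) can be sketched in plain Python. This is only an illustration of the idea, not the cuDF implementation: `head_per_group` is a hypothetical helper, and the rows are the sample data from the original question stored as `(id2, id4, v3)` tuples.

```python
from itertools import groupby
from operator import itemgetter


def head_per_group(rows, key, n):
    """Return the first n rows of each group, in original row order.

    Hypothetical helper for illustration; not part of pandas or cuDF.
    """
    # Tag each row with its position so the original order can be restored.
    indexed = list(enumerate(rows))
    # Step 1: construct the entire grouping. Python's sort is stable, so
    # rows keep their relative order inside each group.
    indexed.sort(key=lambda pair: key(pair[1]))
    picked = []
    # Step 2: slice the first n rows out of every group and discard the rest.
    for _, group in groupby(indexed, key=lambda pair: key(pair[1])):
        picked.extend(list(group)[:n])
    # Restore original row order, as pandas' groupby(...).head(n) does.
    picked.sort(key=itemgetter(0))
    return [row for _, row in picked]


# Sample data from the question: columns id2, id4, v3.
rows = list(zip([1, 2, 1, 2, 1, 2], [1, 1, 1, 1, 1, 1], [1, 3, 2, 3, 3, 3]))

# Sort by v3 descending (stable), then keep the top 2 rows per (id2, id4).
by_v3_desc = sorted(rows, key=itemgetter(2), reverse=True)
top2 = head_per_group(by_v3_desc, key=itemgetter(0, 1), n=2)
print(top2)
```

Note that step 1 materializes the grouping for every row before step 2 throws most of it away, which is the memory cost the commit message mentions.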
5 participants