Request for a group_subset() function #7625

Closed
marcuslehr opened this issue Jan 15, 2025 · 1 comment

marcuslehr commented Jan 15, 2025

Hi, I frequently find myself trying to pull a particular group out of a grouped data frame, usually for troubleshooting. There's already a set of group_*() helper functions, which is where I usually look first for this task. You can make these work to select a group, or call filter() and manually filter down to a single group, but either way it's a bit tedious, especially when you just want to quickly grab a random group or two for dev/debugging purposes. The most efficient way I can find to do this is:
grouped_df[group_rows(grouped_df)[[1]],]

This subsets the data from the first group. However, it's tedious and hard to remember. It also doesn't work well with pipes, since the data frame must be referenced twice (and pipes don't play well with bracket subsetting in the first place). For demonstration, the piped equivalent is:
grouped_df %>% group_rows() %>% .[[1]] %>% grouped_df[.,]
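
To make the two forms above easy to try, here is a minimal sketch. It assumes dplyr is attached and uses mtcars grouped by cyl as a stand-in for grouped_df; the example data is my own choice for illustration only.

```r
library(dplyr)

# Stand-in data (an assumption for illustration): mtcars grouped by cyl
grouped_df <- mtcars %>% group_by(cyl)

# Bracket subsetting with the row indices of the first group
first_a <- grouped_df[group_rows(grouped_df)[[1]], ]

# Piped equivalent; the data frame still has to be named twice
first_b <- grouped_df %>% group_rows() %>% .[[1]] %>% grouped_df[., ]

# Both objects should hold the same rows, all from a single cyl value
unique(first_a$cyl)
unique(first_b$cyl)
```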

Both of these are ugly and hard to remember, so I think it would be nice to have a helper function specifically for this purpose. It could be called group_subset() or group_select(), though the latter could be confused with select() (groups are row-based, but I can see why one might want to avoid the name). Heck, I would actually argue for replacing group_data(), as you'd be forgiven for thinking that returning a group's data is what group_data() is for. But it's not: it returns the grouping keys plus a .rows column of row indices, not the data itself, which is misleading imo. In fact, group_data() is so similar to group_rows() that I would argue they're basically redundant, and group_data() could simply be repurposed.

Anyways, my envisioned syntax to replace the above calls is:
grouped_df %>% group_subset(1)
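
In the meantime, something close to this can be approximated in user code. The following is only a rough sketch of what I have in mind: group_subset() here is a hypothetical helper, not part of dplyr, and mtcars grouped by cyl is an assumed stand-in for real data.

```r
library(dplyr)

# Hypothetical helper, NOT a dplyr function: returns the rows belonging to
# the group(s) at index position(s) `i`, in the order given by group_keys().
group_subset <- function(x, i) {
  stopifnot(is_grouped_df(x))
  rows <- group_rows(x)[i]                      # list of integer row indices
  idx <- sort(unlist(rows, use.names = FALSE))  # keep the original row order
  x[idx, ]
}

# Assumed example data: mtcars grouped by cyl
first_group <- mtcars %>% group_by(cyl) %>% group_subset(1)
two_groups  <- mtcars %>% group_by(cyl) %>% group_subset(c(1, 3))
```

Taking a vector of indices rather than a single number is a design choice of this sketch, so that grabbing "a random group or two" stays a single call.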

This would be a nice, clean way to return a single group's rows via a group index. If you're strongly averse to adding new functions or making breaking changes, then group_data() could at least be modified to return a data column. Then you could do
grouped_df %>% group_data() %>% slice(1) %>% pull(.data)

This would at least make group_data() true to its name and be an improvement. But I still like the dedicated function option better (e.g. group_subset()), and it seems reasonable given there's already a suite of helper functions.

@marcuslehr
Author

Never mind, I found group_split(). It's not quite as nice as I would like because it still requires an extra pipe step to extract the group, but it does what I want. The call is:
grouped_df %>% group_split() %>% .[[1]]
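
For anyone wanting to check which group a given index refers to, here is a small sketch pairing group_split() with group_keys(). It again assumes mtcars grouped by cyl as stand-in data, which is my own example rather than anything from a real use case.

```r
library(dplyr)

# Assumed stand-in data: mtcars grouped by cyl
grouped_df <- mtcars %>% group_by(cyl)

# One tibble per group, in the same order as the rows of group_keys()
pieces <- grouped_df %>% group_split()
keys   <- grouped_df %>% group_keys()

first_group <- pieces[[1]]

keys[1, ]          # the grouping values that index 1 corresponds to
nrow(first_group)  # number of rows in that group
```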

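Also, as a side note for anyone else who comes looking here, there's another syntax I forgot yesterday. Since pull() on the nested list column returns a one-element list, a final .[[1]] is needed to get the data frame itself:
grouped_df %>% nest() %>% ungroup() %>% slice(1) %>% pull(data) %>% .[[1]]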

It's long, but does the job. Still wouldn't hate a 'group_select()' or 'group_subset()' function that takes an index, but group_split() is pretty close.
