Request for a group_subset() function #7625

marcuslehr · 2025-01-15T23:06:30Z

Hi, I frequently find myself trying to subset a particular group from a grouped data frame, usually for troubleshooting of some sort. There's already a set of group_ helper functions, which I usually look to for this task. You can make these work to select a group, or call filter() and manually filter down to a single group, but either way it's tedious, especially when you want to quickly grab a random group or two for dev/debugging purposes. The most efficient way I can find to do this is:
grouped_df[group_rows(grouped_df)[[1]],]

This subsets the rows of the first group. However, it is hard to remember, and it doesn't work well with pipes because the data frame must be referenced twice (and pipes don't play well with bracket subsetting in the first place). For demonstration, the piped equivalent is:
grouped_df %>% group_rows() %>% .[[1]] %>% grouped_df[.,]

Both of these are ugly and hard to remember, so I think it would be nice to have a helper function specifically for this purpose. It could be called group_subset() or group_select(), though the latter could be confused with select() (groups are row-based, but I can see why one might want to avoid the name). Heck, I would actually argue for repurposing group_data(), as you'd be forgiven for thinking this is what group_data() is for. But it's not: it returns row indices (alongside the group keys), not the data, which is misleading imo. In fact, group_data() is so similar to group_rows() that I would argue they're basically redundant.

Anyway, my envisioned syntax to replace the above calls is:
grouped_df %>% group_subset(1)

This would be a clean way to return a single group's rows via a group index. If you're averse to adding new functions or making breaking changes, then group_data() could at least be modified to return a data column. Then you could do
grouped_df %>% group_data() %>% slice(1) %>% pull(.data)

This would at least make group_data() true to its name and be an improvement. But I still like the dedicated function option better (e.g. group_subset()), and it seems reasonable given there's already a suite of helper functions.
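
For anyone who wants this behavior today, here is a minimal sketch of the proposed helper built on group_rows(). Note that group_subset() is hypothetical and not part of dplyr; the name and the argument names x and i are assumptions for illustration, and mtcars is used only as example data.

```r
library(dplyr)

# Hypothetical helper (NOT a dplyr function): return the rows belonging to
# the group(s) at position(s) `i`, in the order reported by group_rows().
group_subset <- function(x, i) {
  stopifnot(is_grouped_df(x))
  # group_rows() gives one integer vector of row indices per group
  rows <- unlist(group_rows(x)[i], use.names = FALSE)
  x[rows, ]
}

by_cyl <- mtcars %>% group_by(cyl)

# Pipe-friendly: the data frame is only referenced once
first_group <- by_cyl %>% group_subset(1)
```

Because it uses ordinary bracket subsetting on the grouped data frame, the result keeps its grouping, unlike the pieces returned by group_split().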

marcuslehr · 2025-01-16T18:41:21Z

Never mind, I found group_split(). It's not quite as nice as I would like because it still requires an extra pipe step to pick out a group, but it does what I want. The call is:
grouped_df %>% group_split() %>% .[[1]]
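
If you need a specific group rather than just the first one, a rough sketch is to look up its position with group_keys(), which lists the groups in the same order group_split() returns them. The mtcars data and the choice of cyl == 6 are assumptions for illustration only.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# One row per group, in the same order as the list from group_split()
keys <- group_keys(by_cyl)

# Position of the group we are interested in (here: cyl == 6)
idx <- which(keys$cyl == 6)

# Pull that group's rows out of the split list
six_cyl <- group_split(by_cyl)[[idx]]
```

The pieces returned by group_split() are plain ungrouped tibbles, so re-apply group_by() if later steps depend on the grouping.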

Also, as a side note should anyone else come looking here, there's another syntax I forgot yesterday:
grouped_df %>% nest() %>% ungroup() %>% slice(1) %>% pull(data)

It's long, but it does the job. I still wouldn't hate a group_select() or group_subset() function that takes an index, but group_split() is pretty close.

marcuslehr closed this as completed Jan 16, 2025
