Could we defer to flox for `GroupBy.first`? #9647

max-sixty · 2024-10-18T20:55:43Z

Is your feature request related to a problem?

I was wondering why a `groupby("foo").first()` call was running so slowly. I think we run a Python loop for this, rather than calling into flox:

xarray/xarray/core/groupby.py

Lines 1218 to 1231 in b9780e7

    
    def _first_or_last(self, op, skipna, keep_attrs):
        if all(
            isinstance(maybe_slice, slice)
            and (maybe_slice.stop == maybe_slice.start + 1)
            for maybe_slice in self.encoded.group_indices
        ):
            # NB. this is currently only used for reductions along an existing
            # dimension
            return self._obj
        if keep_attrs is None:
            keep_attrs = _get_keep_attrs(default=True)
        return self.reduce(
            op, dim=[self._group_dim], skipna=skipna, keep_attrs=keep_attrs
        )
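
For readers unfamiliar with the early-return branch in the quoted method: it fires only when every group is a length-one slice, so each group holds exactly one element and there is nothing to reduce. Everything else falls through to `self.reduce(...)`. A minimal standalone sketch of that check, using a hypothetical helper name `all_groups_are_singletons` and a plain Python list standing in for `self.encoded.group_indices`:

```python
def all_groups_are_singletons(group_indices):
    """Return True if every group is a slice covering exactly one element.

    Hypothetical helper mirroring the `all(...)` check in `_first_or_last`;
    `group_indices` stands in for `self.encoded.group_indices`.
    """
    return all(
        isinstance(maybe_slice, slice)
        and (maybe_slice.stop == maybe_slice.start + 1)
        for maybe_slice in group_indices
    )


# Every group has one element: first()/last() could return the object unchanged.
singletons = [slice(0, 1), slice(1, 2), slice(2, 3)]

# The first group spans two elements, so the check fails and the
# general reduction path would be taken instead.
mixed = [slice(0, 2), [2, 5], slice(3, 4)]

print(all_groups_are_singletons(singletons))
print(all_groups_are_singletons(mixed))
```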

Describe the solution you'd like

Could we call into flox? Numbagg has the routines...

Describe alternatives you've considered

No response

Additional context

No response

dcherian · 2024-10-20T04:37:04Z

Yes, the minor complication is that we should dispatch `nanfirst` and `nanlast` but not `first` and `last`. The latter are simply indexing with an indexer we already know, so the reduction approach is overkill.
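
To illustrate the distinction, here is a minimal NumPy sketch. It is not xarray's or flox's implementation, and the function names `grouped_first` and `grouped_nanfirst` are hypothetical. `first` needs only the position of each group's first element, which follows from the group codes alone. `nanfirst` has to inspect the values to skip NaNs, so it is a genuine reduction.

```python
import numpy as np


def grouped_first(values, codes):
    """First element of each group via plain indexing.

    `codes` are integer group labels; no values are inspected.
    """
    # return_index gives the position of the first occurrence of each code.
    _, first_idx = np.unique(codes, return_index=True)
    return values[first_idx]


def grouped_nanfirst(values, codes):
    """First non-NaN element of each group (NaN if a group is all-NaN)."""
    groups, inverse = np.unique(codes, return_inverse=True)
    out = np.full(len(groups), np.nan)
    valid = ~np.isnan(values)
    # Among valid entries only, find the first occurrence of each group.
    present, first_valid = np.unique(inverse[valid], return_index=True)
    out[present] = values[valid][first_valid]
    return out


values = np.array([np.nan, 1.0, 2.0, np.nan, np.nan, 5.0])
codes = np.array([0, 0, 1, 1, 2, 2])

print(grouped_first(values, codes))     # indexing only: NaNs are kept
print(grouped_nanfirst(values, codes))  # reduction: NaNs are skipped
```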

Closing #8025 in favor of this one.

Out of curiosity how many groups does your problem have?

max-sixty · 2024-10-20T19:29:39Z

Sorry I missed #8025. I thought I had searched; I guess `first` hit lots of unrelated issues and I missed it.

> Out of curiosity how many groups does your problem have?

About 15K...

dcherian · 2024-10-21T13:04:43Z

> About 15K...

Do you end up using dask for this, or just numbagg? Are these groups randomly distributed along the dimension, or are there patterns to how they are distributed (e.g. are they sequential)?

Just curious...

max-sixty · 2024-10-21T17:41:45Z

> Do you end up using dask for this, or just numbagg?

I ended up just leaving it running for hours!

> Are these groups randomly distributed along the dimension, or are there patterns to how they are distributed (e.g. are they sequential)?

Yes they're largely sequential!
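
When groups are sequential, the first element of each group sits exactly where the group label changes, so the indexer can be found in one vectorized pass rather than a Python loop over roughly 15K groups. A rough NumPy sketch under the assumption that every group occupies a single contiguous run (`first_of_contiguous_groups` is a hypothetical name, not an xarray or flox function):

```python
import numpy as np


def first_of_contiguous_groups(values, labels):
    """First element of each run of equal labels.

    Assumes every group is one contiguous run; a label that reappears
    later would be treated as a new group.
    """
    labels = np.asarray(labels)
    # A run starts at position 0 and wherever the label differs from its predecessor.
    is_start = np.ones(len(labels), dtype=bool)
    is_start[1:] = labels[1:] != labels[:-1]
    starts = np.flatnonzero(is_start)
    return labels[starts], np.asarray(values)[starts]


labels = np.array(["a", "a", "a", "b", "b", "c"])
values = np.array([10, 11, 12, 20, 21, 30])

run_labels, firsts = first_of_contiguous_groups(values, labels)
print(run_labels, firsts)
```

Since the groups in this issue are only "largely" sequential, a label that recurs after a gap would show up as more than one run here and would need merging afterwards.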

max-sixty added the enhancement label Oct 18, 2024

TomNicholas added the topic-groupby label Oct 19, 2024

dcherian added a commit to dcherian/xarray that referenced this issue Jan 25, 2025

Optimize grouped first, last.

9b1a90b

1. Use flox where possible. 2. Use simple indexing where possible. Closes pydata#9647

dcherian mentioned this issue Jan 25, 2025

Use flox for grouped first, last #9986

Merged

dcherian closed this as completed in #9986 Jan 27, 2025

dcherian added a commit to dcherian/xarray that referenced this issue Jan 29, 2025

Revert "Use flox for grouped first, last (pydata#9986)"

f806a71

This reverts commit a848044. Opens pydata#9647 Closes pydata#9993

dcherian mentioned this issue Jan 29, 2025

Revert "Use flox for grouped first, last (#9986)" #10001

Merged

dcherian reopened this Jan 29, 2025
