
[WIP] Implement 1D to ND interpolation #3262

Closed · wants to merge 3 commits

Conversation

@nbren12 (Contributor) commented Aug 24, 2019

The tests cover:

1. index is halfway between coordinates
2. index is the same as old coordinates
3. ND index (see the sketch below)
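
For concreteness, a hedged sketch covering cases 1 and 3, written against xarray's public interp (which requires SciPy) rather than the helpers added in this PR; the test name, data, and values are illustrative only:

import numpy as np
import xarray as xr

def test_interp_onto_nd_target():
    # 1-D data; the target index is a 2-D DataArray whose values fall halfway
    # between the original coordinates
    da = xr.DataArray([0.0, 2.0, 4.0], dims="x", coords={"x": [0, 1, 2]})
    target = xr.DataArray([[0.5, 1.5], [1.5, 0.5]], dims=("a", "b"))
    actual = da.interp(x=target)
    np.testing.assert_allclose(actual.values, [[1.0, 3.0], [3.0, 1.0]])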
@pep8speaks commented Aug 24, 2019

Hello @nbren12! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 670:5: F841 local variable 'old_coord' is assigned to but never used
Line 676:36: E226 missing whitespace around arithmetic operator
Line 676:51: E226 missing whitespace around arithmetic operator
Line 679:21: E226 missing whitespace around arithmetic operator
Line 693:31: E231 missing whitespace after ','
Line 711:1: F811 redefinition of unused 'test_interp_1d_nd_targ' from line 704
Line 718:1: E302 expected 2 blank lines, found 1
Line 729:1: W391 blank line at end of file

Comment last updated at 2020-06-10 23:33:02 UTC

@shoyer (Member) left a comment

Very nice!


def interp_nd(data, coords):
    for dim in coords:
        data = interp_1d(data, dim, coords[dim])

The downside of this approach (indexing separately along each dimension) is that it is potentially much more expensive than doing all indexing at once, e.g., if the result of indexing along multiple dimensions is only a single point.

I think it would be interesting to see if this could be done by indexing all dimensions at once 2**len(coords) times, with all combinations of bfill/ffill along all dimensions.

This would require modifying sel to support a dict of options for method indicating different indexing methods along different axes, along the lines of the proposal from #3223 for tolerance.
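
A rough sketch of just the combinatorial part, reusing the coords and selectors names from the snippets above; note that sel taking a per-dimension method dict is hypothetical, it is exactly the modification being proposed:

import itertools

dims = list(coords)
for methods in itertools.product(["ffill", "bfill"], repeat=len(dims)):
    per_dim_method = dict(zip(dims, methods))  # e.g. {"x": "ffill", "y": "bfill"}
    # hypothetically, with the proposed sel extension:
    # corner = data.sel(**selectors, method=per_dim_method)

This loop runs 2**len(dims) times, once per corner of the interpolation cell.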

upper = data.sel(**selectors, method='bfill')
lower_x = data[dim].sel(**selectors, method='ffill')
upper_x = data[dim].sel(**selectors, method='bfill')
weight = np.abs(vals - lower_x)/np.abs(upper_x-lower_x)

I would lean towards using the built-in abs() here, which is slightly more generic than the NumPy function.

@nbren12 (Contributor, Author) commented Aug 26, 2019

@shoyer Thanks for the comments. I was struggling to incorporate it into Dataset.interp since core.missing is pretty complicated. Would it be worth refactoring that module to clarify how interp calls are mapped to a given function? Also, most of the methods in interp work like Dataset -> Variables -> NumPy arrays, but the method you proposed above operates at the Dataset level, so it doesn't quite fit into core.missing.interp.

The interpolation code I was working with doesn't regrid the coordinates appropriately, so we would need to do that too.

@shoyer (Member) commented Aug 27, 2019

Feel free to refactor as you see fit, but it may still make sense to do indexing at the Variable rather than Dataset level. That potentially would let you avoid redundant operations on the entire Dataset object.

Take a look at the _localize() helper function in missing.py for an example of how to work with the underlying index. I think something like the following helper function could do the trick:

def linear_interp(var, indexes_coords):
    lower_indices = {}
    upper_indices = {}
    for dim, [x, new_x] in indexes_coords.items():
        index = x.to_index()
        # ideally should precompute these, rather than calling get_indexer_nd for each
        # variable separately
        lower_indices[dim] = get_indexer_nd(index, new_x.values, method="ffill")
        upper_indices[dim] = get_indexer_nd(index, new_x.values, method="bfill")
    result = 0
    for weight, indexes in ...  # need to compute weights and all lower/upper combinations
        result += weight * var.isel(**indexes)
    return result
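
For the part left as ... above, the per-dimension weights can be formed from the coordinate values at the ffill/bfill positions. A minimal NumPy sketch for a single ascending 1-D coordinate (the function name is illustrative, not part of the PR):

import numpy as np

def lower_upper_weights(x, new_x):
    # indices of the bracketing coordinate points
    lower = np.clip(np.searchsorted(x, new_x, side="right") - 1, 0, len(x) - 2)
    upper = lower + 1
    # linear weights: 1 - frac on the lower neighbour, frac on the upper one
    frac = (new_x - x[lower]) / (x[upper] - x[lower])
    return lower, upper, 1 - frac, frac

The loop over ... would then take the (index, weight) pairs for each dimension and iterate over their Cartesian product with itertools.product, multiplying the per-dimension weights for each of the 2**n corners.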

@nbren12 (Contributor, Author) commented Aug 27, 2019

Thanks so much for the help. This is a good learning experience for me.

That potentially would let you avoid redundant operations on the entire Dataset object.

Yes. This is where I got stuck TBH.

@crusaderky (Contributor) commented

For highly optimized interpolation of an N-dimensional array along any one dimension, see also https://xarray-extras.readthedocs.io/en/latest/api/interpolate.html

@shoyer (Member) commented Nov 2, 2019

One missing part of the algorithm I wrote in #3262 (comment) was looping over all index/weight combinations. I recently wrote a version of this for another project that might be a good starting point here:

import itertools

import numpy as np


def prod(items):
  out = 1
  for item in items:
    out *= item
  return out

def index_by_linear_interpolation(array, float_indices):
  all_indices_and_weights = []
  for origin in float_indices:
    lower = np.floor(origin)
    upper = np.ceil(origin)
    l_index = lower.astype(np.int32)
    u_index = upper.astype(np.int32)
    u_weight = origin - lower  # fractional part: weight on the upper neighbour
    l_weight = 1 - u_weight  # remaining weight on the lower neighbour
    all_indices_and_weights.append(
        ((l_index, l_weight), (u_index, u_weight))
    )

  out = 0
  for items in itertools.product(*all_indices_and_weights):
    indices, weights = zip(*items)
    # wrap out-of-range indices around (modular / periodic boundary handling)
    indices = tuple(index % size for index, size in zip(indices, array.shape))
    out += prod(weights) * array[indices]
  return out
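
A quick illustrative check of the helper above on a small array (the expected values are ordinary linear interpolation):

import numpy as np

array = np.array([10.0, 20.0, 30.0])
result = index_by_linear_interpolation(array, [np.array([0.25, 1.5])])
print(result)  # [12.5 25. ]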

@nbren12 (Contributor, Author) commented Nov 2, 2019

Unfortunately, I don't think I have much time now to contribute to a general-purpose solution leveraging xarray's built-in indexing, so feel free to add to or close this PR. To be successful, I would need to study xarray's indexing internals more, since I don't think this is as easily implemented as a routine calling DataArray methods. Some custom numba code I wrote fits in my brain much better, and is general enough for my purposes when wrapped with xr.apply_ufunc. I encourage someone else to pick up where I left off, or we could close this PR.
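
For reference, a hedged sketch of the apply_ufunc wrapping mentioned above, with np.interp standing in for the custom numba kernel (the wrapper name and signature are illustrative, not from this PR):

import numpy as np
import xarray as xr

def interp_1d_to_nd(da, dim, new_coord):
    # interpolate da along `dim` onto new_coord, which may be an N-D DataArray
    return xr.apply_ufunc(
        lambda y, x, xi: np.interp(xi, x, y),  # 1-D kernel, applied per target point
        da,
        da[dim],
        new_coord,
        input_core_dims=[[dim], [dim], []],
        vectorize=True,
    )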

@shoyer (Member) commented Nov 2, 2019 via email

@nbren12 (Contributor, Author) commented Dec 17, 2020

I'm going to close this since I won't be working on it any longer.

@nbren12 closed this Dec 17, 2020
Successfully merging this pull request may close these issues.

interp and reindex should work for 1d -> nd indexing