Support subspacing to nearest neighbour of provided value #835

Open
sadielbartholomew opened this issue Nov 21, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@sadielbartholomew
Member

sadielbartholomew commented Nov 21, 2024

A feature request for new functionality for the subspace and indices methods. (xarray supports this for subspacing but we don't; it seems too useful not to want to steal!)

Picture the scene. You want to subspace on a specific value of some coordinate but, as is often the case with real-life data, that value is a float with many digits. At the moment in cf-python you have to specify the value exactly, e.g. f.subspace(grid_latitude=7.480000078678131) to subspace down to the grid_latitude of 7.480000078678131. (This assumes you want to subspace by metadata; you could use indexing instead, but that involves knowing or calculating the appropriate index.) It would be nice to also be able to request 'the value nearest to' a specified value, in this case providing only 7.48 for the grid_latitude: a close-enough approximation rather than the exact float.

xarray supports such 'nearest neighbour' lookups: you can pass 'nearest' as the method keyword, so that e.g. 7.48 matches 7.480000078678131 as its nearest neighbour, and you can add a tolerance for inexact look-up, e.g. 7.5 with a +/- 0.1 tolerance catches the same value.

I would like us to directly support the nearest neighbour match, and to better advertise how to do an inexact subspace in two lines using our tolerance functions in a context manager.

We already have a new-ish 'halo' subspacing approach and are going to add a 'bounding box' query too. It would be good to consider how to add the above 'nearest neighbour' option to these new possibilities for more flexible subspacing. I have made some suggestions below to get the conversation started.

Example

Example set-up, with subspaces that the above functionality would simplify:

>>> print(f)
Field: relative_humidity (ncvar%UM_m01s16i204_vn405)
----------------------------------------------------
Data            : relative_humidity(air_pressure(17), grid_latitude(30), grid_longitude(24)) %
Cell methods    : time(1): mean
Dimension coords: time(1) = [1978-12-16 12:00:00] gregorian
                : air_pressure(17) = [1000.0000610351562, ..., 10.0] hPa
                : grid_latitude(30) = [7.480000078678131, ..., -5.279999852180481] degrees
                : grid_longitude(24) = [-5.720003664493561, ..., 4.399996280670166] degrees
Auxiliary coords: latitude(grid_latitude(30), grid_longitude(24)) = [[61.004354306111864, ..., 48.51422609871432]] degrees_north
                : longitude(grid_latitude(30), grid_longitude(24)) = [[-13.762685427418687, ..., 4.622216504491947]] degrees_east
Coord references: grid_mapping_name:rotated_latitude_longitude
>>> f1 = f.subspace(air_pressure=1000.0000610351562)
>>> f2 = f.subspace(grid_latitude=7.480000078678131)

Suggestion for API to provide support

Add a new 'mode' with string identifier 'nearest' to do the nearest neighbour case, e.g. for the above example this would work:

f.subspace(air_pressure=1000, mode="nearest")
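To make the intended behaviour concrete, here is a minimal pure-Python sketch of the lookup a 'nearest' mode might perform before subspacing. The helper name nearest_value is hypothetical and not part of cf-python, and the middle coordinate value is made up for illustration; only the first and last values come from the field summary above.

```python
def nearest_value(values, target):
    """Return (index, value) for the element of `values` closest to `target`.

    Hypothetical helper for illustration only; ties resolve to the first
    closest element.
    """
    if len(values) == 0:
        raise ValueError("no coordinate values to search")
    index = min(range(len(values)), key=lambda i: abs(values[i] - target))
    return index, values[index]


# Illustrative grid_latitude values: the first and last are taken from the
# field summary, the middle one is invented for this sketch.
grid_latitude = [7.480000078678131, 1.1, -5.279999852180481]

index, exact = nearest_value(grid_latitude, 7.48)
print(index, exact)

# The exact stored value could then be handed to the existing API, e.g.
#   f.subspace(grid_latitude=exact)
```

A real implementation would presumably work on the coordinate's data array rather than a list, and would need to decide how to treat ties and multi-dimensional coordinates.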

The inexact (tolerance) case is already controlled by our tolerance functions cf.atol and cf.rtol, but there are no examples in the documentation to advertise how simply one can control the subspacing tolerance with:

with cf.atol(1e-2):
    f.subspace(grid_latitude=7.480000078678131)

so I'd also like us to showcase examples of using a context manager, as above, as a two-line means of doing inexact subspacing.
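For anyone unsure what a tolerance of 1e-2 buys here, the following sketch shows the comparison in plain Python. It assumes the rule documented for cf.atol and cf.rtol, that x and y count as equal when |x - y| <= atol + rtol * |y|; the function is_close is a stand-in written for this example, not a cf-python call.

```python
def is_close(x, y, atol=0.0, rtol=0.0):
    """True if |x - y| <= atol + rtol * |y| (assumed tolerance rule)."""
    return abs(x - y) <= atol + rtol * abs(y)


stored = 7.480000078678131  # exact grid_latitude value held in the field

print(is_close(7.48, stored))             # False: no tolerance, not exact
print(is_close(7.48, stored, atol=1e-2))  # True: within 0.01
print(is_close(7.5, stored, atol=0.1))    # True: within 0.1
```

The last case mirrors the xarray-style '7.5 with +/- 0.1 tolerance' lookup mentioned earlier.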

@sadielbartholomew sadielbartholomew added the enhancement New feature or request label Nov 21, 2024
@sadielbartholomew sadielbartholomew changed the title Support subspacing (by metadata) to nearest neighbour of provided value Support metadata-subspacing to nearest neighbour of provided value Nov 21, 2024
@sadielbartholomew sadielbartholomew changed the title Support metadata-subspacing to nearest neighbour of provided value Support subspacing to nearest neighbour of provided value Nov 21, 2024