Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QGDataset does not support data input with even number of latitude gridpoints #85

Closed
csyhuang opened this issue Oct 4, 2023 · 7 comments
Assignees

Comments

@csyhuang
Copy link
Owner

csyhuang commented Oct 4, 2023

When climate data with even number of latitudinal gridpoints is input to QGFieldBase, the following will occur inside the class:

if (ylat.size % 2 == 0) & (sum(ylat == 0.0) == 0):
# Even grid
self.need_latitude_interpolation = True
self.ylat_no_equator = ylat
self.ylat = np.linspace(-90., 90., ylat.size+1, endpoint=True)
self.equator_idx = \
np.argwhere(self.ylat == 0)[0][0] + 1

When returning the computation results to the user, the computed fields are interpolated back to the original grid (self.ylat_no_equator).

Right now, this behavior is not mirrored in QGDataset, which causes an inconsistency in results output when input data has even number of latitude gridpoints.

TODO: Modify QGDataset.interpolate_fields to handle this scenario.

@csyhuang csyhuang mentioned this issue Oct 9, 2023
5 tasks
@chpolste chpolste self-assigned this Oct 11, 2023
@chpolste
Copy link
Collaborator

Did you already implement a fix for this issue? https://github.com/csyhuang/hn2016_falwa/blob/master/hn2016_falwa/xarrayinterface.py#L299 looks like it does the trick and ylat_output just needs to be used in the rest of the method. Is there something else to do?

We could alternatively keep all the logic in QGField and introduce a companion to get_latitude_dim that always returns the appropriate latitude coordinates. Do you have a preference?

@chpolste
Copy link
Collaborator

It looks like I am going to rewrite/reorganize the dimension, shape and metadata generation of the xarrayinterface with the implementation of a dask-aware QGDataset for #50. I will address this issue at the same time, most likely by introducing a new ylat_interpolated (I'll check again what would be most consistent with the existing variable naming) property to QGField that always provides the appropriate latitude coordinates.

@csyhuang
Copy link
Owner Author

csyhuang commented Oct 18, 2023

@chpolste Thanks for taking care of this! I am thinking if this would be a better solution: rename self.ylat inside QGField to self._ylat such that it is for internal usage only. Users shall only be able to access the input latitude grid saved as self._ylat_input at the beginning. To keep the external interface the same, we can do something like

@property
def ylat(self):
    return self._ylat_input

Since all the computed field will be interpolated onto self._ylat_input when being access, on the user end, everything is consistent. Let me know what you think about this.

Besides that, there is another procedure in QGDataset which I wonder if you may change: instead of doing _map_collect to collect the output arguments from the method, would it be possible to get the computed quantities directly from the QGField object? Instead of

# Call interpolate_fields on all QGField objects
out_fields = _map_collect(
    lambda field: field.interpolate_fields(),
    self._fields,
    ["qgpv", "interpolated_u", "interpolated_v", "interpolated_theta", "static_stability"],
    postprocess=np.asarray
)

Is it possible to get directly from each field object field.qgpv, field.interpolated_u, field.interpolated_v etc.? In this way, we can disentangle the dependency of QGDataset to the interface of the methods of QGField.

In the near future, I want to eliminate the returning of processed arguments in the 3 public methods of QGField to save some computation time and speed up procedures in MDTF - In the MDTF use case, for the first iteration, I want to get only Uref, Qref, u_baro and lwa_baro. We can save time skipping the axes swap and interpolation for other quantities.

Let me know what you think is pragmatic in the given time frame. If that's too much to change, we can do the second part in another release. 😃 Thanks a lot!

@csyhuang
Copy link
Owner Author

@chpolste For the suggested changes I mentioned above (use self._input_ylat and self._ylat inside QGField), I did the changes on the branch csyhuang-refactor-oopinterface-ylat:

https://github.com/csyhuang/hn2016_falwa/compare/release1.0...csyhuang-refactor-oopinterface-ylat?expand=1

The unit tests all pass so it should be working properly. If you think the changes look OK, you can start from there when working on QGDataset such that no changes is needed in QGField.

Let me know what you think 😃 Thanks for taking care of this!

@chpolste
Copy link
Collaborator

@chpolste For the suggested changes I mentioned above (use self._input_ylat and self._ylat inside QGField), I did the changes on the branch csyhuang-refactor-oopinterface-ylat:

https://github.com/csyhuang/hn2016_falwa/compare/release1.0...csyhuang-refactor-oopinterface-ylat?expand=1

The unit tests all pass so it should be working properly. If you think the changes look OK, you can start from there when working on QGDataset such that no changes is needed in QGField.

Perfect, thanks!

Besides that, there is another procedure in QGDataset which I wonder if you may change: instead of doing _map_collect to collect the output arguments from the method, would it be possible to get the computed quantities directly from the QGField object? Instead of

# Call interpolate_fields on all QGField objects
out_fields = _map_collect(
    lambda field: field.interpolate_fields(),
    self._fields,
    ["qgpv", "interpolated_u", "interpolated_v", "interpolated_theta", "static_stability"],
    postprocess=np.asarray
)

Is it possible to get directly from each field object field.qgpv, field.interpolated_u, field.interpolated_v etc.? In this way, we can disentangle the dependency of QGDataset to the interface of the methods of QGField.

In the near future, I want to eliminate the returning of processed arguments in the 3 public methods of QGField to save some computation time and speed up procedures in MDTF - In the MDTF use case, for the first iteration, I want to get only Uref, Qref, u_baro and lwa_baro. We can save time skipping the axes swap and interpolation for other quantities.

Sounds good. Branch xr-delayed up to a26b150 implements this, with _map_collect completely removed from the codebase. No pull request yet, as I will continue working on the branch to implement dask support. I need a lot of the new metadata support for that anyway.

@csyhuang
Copy link
Owner Author

csyhuang commented Nov 1, 2023

@chpolste : 👍 Awesome! I finally got a chance to test the code with MDTF calculations! Your updated version of QGDataset works seamlessly and saves me lines of code. Thanks a lot for the adjustment! 😄
FYI, I'll keep working from my new branch csyhuang-from-xr-delayed-ua2-fix which branches from your commit a26b150. I've just touched base with Noboru - the bugfix for zonal advective flux from the cosine weighting error you spotted out will be added on that branch. We will write up an Issue about this bugfix in a day or two.
My hope is to wrap all these up by this weekend (Nov 5): (1) bugfix for advective flux, (2) falwa release v1.0, (3) MDTF pull request, and (4) the Geoscience Data Journal paper (70% finished), hopefully not too ambitious 🙈 . I think you've finished your share of work for (1)-(3) so let me wrap up the final part of relay. I'll send you (4) to comment after I finish (hopefully soon). Thanks a lot!

@csyhuang
Copy link
Owner Author

csyhuang commented Nov 5, 2023

Christopher has addressed this issue in commits up to a26b150 so this can be closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants