-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fill missing data_vars during concat by reindexing #7400
Conversation
3f6206f
to
a9d7ac5
Compare
There are no tests added currently. I'm just wondering, if that approach would work in general. The current assumptions here:
|
Thanks @Illviljan for activating the benchmark runs. Are those errors related to the changes? I'm not up to date with mypy, are these errors induced by changes here? |
Benchmark is a numba issue, probably #7306. Mypy is real, cannot getitem a object. Try out using isinstance instead of the try/except to narrow the typing. |
OK, green light's now also on mypy. Looks like the approach would work in general. Trying to add some tests now. |
0d2dd18
to
3da2ded
Compare
This is ready for a first round of review. Thanks! |
Hi, thanks for doing this @kmuehlbauer . FWIW I'm no longer seeing the issue I was previously seeing when I submitted #3545 when I just run with released xarray v2022.12.0 (I haven't gone back further to see when the issue started going away so I'm not really sure if the old error has just been suppressed or if the single case I was seeing back then was resolved in a previous PR--or there is also a chance there is something which changed in my data over that long time period). That being said I also applied this PR to my workflow and reran the concat code and it continues to pass correctly with this PR from what I've seen so far. @kmuehlbauer did you see the tests I created here? https://github.com/pydata/xarray/blob/03f9b3b85aee039f47dd693322492ab09f57fb73/xarray/tests/test_concat.py |
Thanks @scottcha for taking the time to testing things. I'll have a look at your tests in more detail now. I've concentrated on my use case in the first place and hoped to get away with it 😀. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some typing suggestions.
Thanks @Illviljan, your suggestions and help is much appreciated. |
@scottcha I've found a glitch in the code due to your tests. Already pushed the changes here. I'm going to cherry pick your tests here next. |
@scottcha I've tried to cherry-pick, but ended up copy/pasting and adding your authorship to the commit. I think the final problem is the order in:
These tests are flaky. Sometimes the order is correct and sometimes not. Can't immediately see the root cause here. @Illviljan I'll try to add typing to these additional tests. Should be good for learning that. |
@Illviljan OK, I'm stuck now. I can't make anything out of the remaining mypy errors. Would be great if you could have another look here, thanks! |
79b0054
to
12fac20
Compare
@scottcha I think I've managed to get along with your tests. It looks like everything is running now. One thing which is still unresolved:
@Illviljan @dcherian This is ready for another round of review. Thanks for considering. |
I was hoping to gain something by merging the variable order code with The current behaviour, and the best I've come up so far in terms of performance:
|
Finally, this is as far I could get with it. I'll leave it as is now. Looking forward for reviews and suggestions. Thanks @Illviljan for the great support! |
…se_datasets, provide fast lane for variable order estimation
This also removes the need to handle fill_value
Co-authored-by: Deepak Cherian <[email protected]>
95c03e0
to
6cb163e
Compare
@dcherian rebased on latest main and fixed whats-new.rst. Should be good for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kmuehlbauer and @scottcha . This is great to see finished! Very clean implementation in the end.
@dcherian There slipped an old item from whats-new.rst back into. I've removed it. Should be OK now. Great to see this functionality coming to next xarray version. |
* upstream/main: RTD maintenance (pydata#7477) fix the RTD build skipping feature (pydata#7476) Add benchmarks for to_dataframe and to_dask_dataframe (pydata#7474) allow skipping RTD builds (pydata#7470) create separate environment files for `python=3.11` (pydata#7469) Bump mamba-org/provision-with-micromamba from 14 to 15 (pydata#7466) install `numbagg` from `conda-forge` (pydata#7415) Fill missing data_vars during concat by reindexing (pydata#7400) [skip-cii] Add pyodide update instructions to HOW_TO_RELEASE (pydata#7449) [skip-ci] whats-new for next release (pydata#7455) v2023.01.0 whats-new (pydata#7440)
whats-new.rst
api.rst
This is another attempt to solve #508. Took inspiration from #3545 by @scottcha.
This follows along @dcherian's comment in the same above PR (#3545 (comment)).
Update:
After review the variable order is estimated by order of appearance in the list of datasets. That keeps full backwards compatibility and is deterministic. Thanks @dcherian and @keewis for the suggestions.