dtype not consistent for concat() #215
Comments
Thanks for raising this, @mdsumner.
Currently it is your responsibility. VirtualiZarr is effectively complaining because you're trying to make something that represents one Zarr Array but with two different dtypes inside it, which is a violation of Zarr's data model, where one array has only one dtype. I think the simplest way would be for you to actually load the offending variables as " |
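To illustrate why a single array cannot hold two dtypes when its chunks are only references to raw bytes, here is a minimal standard-library sketch. The sample value and variable names are hypothetical and are not taken from the files in this issue. It shows that the same number is stored with a different width as a double and as a float, so bytes written with one dtype cannot be decoded correctly with the other.

```python
import struct

# Hypothetical sample value, chosen because it is exactly representable
# in both 32-bit and 64-bit floating point.
value = 18319.5

as_double = struct.pack("<d", value)  # 8 bytes, little-endian float64
as_float = struct.pack("<f", value)   # 4 bytes, little-endian float32

# Decoding with the dtype the bytes were written in round-trips cleanly.
roundtrip_double = struct.unpack("<d", as_double)[0]
roundtrip_float = struct.unpack("<f", as_float)[0]

# Decoding float32 bytes as if they were float64 does not: two adjacent
# float32 values get read as one unrelated float64.
misread = struct.unpack("<d", as_float + as_float)[0]

print(len(as_double), len(as_float))
print(roundtrip_double == value, roundtrip_float == value, misread == value)
```

A virtual dataset only records where the bytes live, so it has no opportunity to convert them; loading the small variables into memory instead lets xarray promote them to a common dtype.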
Ah ok, thanks! I'll report to the provider too. Degenerate rectilinear coords strike again. It makes sense to load the trivial coords inline. And, I realise, we could drop z, which is useless here, and possibly prevent the churn of investigating lon, lat from every file ... I'll explore 👌
Awesome, that works - thanks for the guidance! I'm becoming more comfy in the Python space and will be exploring the docs a lot more.
from virtualizarr import open_virtual_dataset
import xarray
base = "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr"
app = "#mode=bytes"
nc = [
    "202002/oisst-avhrr-v02r01.20200227.nc",
    "202002/oisst-avhrr-v02r01.20200228.nc",
]
## don't open as fs
u = [f"{base}/{file}{app}" for file in nc]
## fine
vd = [
    open_virtual_dataset(filepath, indexes={}, loadable_variables=["lon", "lat", "time", "zlev"])
    for filepath in u
]
## succeeds at concat because while dtype of time has changed, we loaded the coord vars that are (logically) constant
## across all files rather than store them as reference ranges
xarray.concat(vd, dim="time", coords="minimal", compat="override")
Great! This package is still very young, so let us know any way you think it can be improved.
Great. I'm going to close this issue, and add a comment to #5 to indicate that a solution to that issue would have also allowed you to concatenate two arrays with different dtypes in general. (The solution I suggested here would be a bad idea for actually-large arrays.)
With these two NetCDF files, the dtype of 'time' changes partway through the sequence, from double to float. I've chosen the two files that straddle that modification date.
(Also, the main variable names changed order, but that doesn't affect anything here.)
I just wonder if it's my responsibility to check those types, or if VirtualiZarr should have a virtualization of xarray's type-standardization here (eek). Frankly, I think the files should stay consistent and be updated, which I'll pursue, but I think this might be a worthwhile situation to catch.
(apologies if this is noise, still finding my way around)
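For anyone who wants to check the types by hand before concatenating, a rough sketch of such a check follows. The helper `find_dtype_mismatches` is hypothetical and not part of VirtualiZarr or xarray. It assumes only that each input behaves like a mapping from variable name to an object with a `.dtype` attribute, and the stand-in objects below are made up for the demonstration rather than read from the real files.

```python
from collections import defaultdict
from types import SimpleNamespace


def find_dtype_mismatches(datasets):
    """Return {name: set of dtype strings} for variables whose dtype differs.

    Hypothetical helper: each item in `datasets` must be a mapping of
    variable name -> object exposing a `.dtype` attribute.
    """
    seen = defaultdict(set)
    for ds in datasets:
        for name, var in ds.items():
            seen[name].add(str(var.dtype))
    # Keep only the variables that appear with more than one dtype.
    return {name: dtypes for name, dtypes in seen.items() if len(dtypes) > 1}


# Stand-ins for two files where 'time' changes from double to float.
first = {"time": SimpleNamespace(dtype="float64"), "sst": SimpleNamespace(dtype="int16")}
second = {"time": SimpleNamespace(dtype="float32"), "sst": SimpleNamespace(dtype="int16")}

mismatches = find_dtype_mismatches([first, second])
print(mismatches)
```

With real datasets, passing each dataset's `.variables` mapping (for example `[ds.variables for ds in vd]`) should work the same way, and any variable it reports is a candidate for `loadable_variables`.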