better handling of invalid files in open_mfdataset #6736

Open
vnoel opened this issue Jun 29, 2022 · 4 comments · May be fixed by #9955

Comments

@vnoel
Contributor

vnoel commented Jun 29, 2022

Is your feature request related to a problem?

Suppose I'm trying to read a large number of netCDF files with open_mfdataset.

Now suppose that one of those files is for some reason incorrect -- for instance there was a problem during the creation of that particular file, and its file size is zero, or it is not valid netCDF. The file exists, but it is invalid.

Currently open_mfdataset will raise an exception with the message
ValueError: did not find a match in any of xarray's currently installed IO backends

As far as I can tell, there is currently no way to identify which one(s) of the files being read are the source of the problem. If there are several hundred files, finding the problematic ones is a task in itself, even though xarray probably already knows which they are.
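For the record, the manual workaround amounts to something like the loop below (the file pattern is hypothetical; this is just a sketch of the hunt one has to do by hand):

import glob

import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # hypothetical input files
bad = []
for path in paths:
    try:
        # Open and immediately close each file just to see if it is readable
        xr.open_dataset(path).close()
    except (ValueError, OSError) as err:
        bad.append((path, err))
print("invalid files:", bad)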

Describe the solution you'd like

It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception.

Apart from better reporting, I would find it very useful if I could pass open_mfdataset some kind of argument that makes it ignore invalid files altogether (an ignore_invalid keyword, defaulting to False, comes to mind).
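Sketched as a call, the proposal would look something like this (ignore_invalid does not exist today; the name is only this proposal):

import xarray as xr

# Proposed, not an existing xarray keyword: skip unreadable inputs
# instead of raising, ideally warning with the name of each skipped file.
ds = xr.open_mfdataset("data/*.nc", ignore_invalid=True)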

Describe alternatives you've considered

No response

Additional context

No response

@dcherian
Contributor

It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception.

+1. You could make this change in open_dataset, and it would then surface through open_mfdataset too. Attempting to read a bad netCDF file is a common source of trouble, so an error saying something like

Reading file XXX failed. The file is possibly corrupted, or the file path is wrong.

would be quite helpful!
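A minimal sketch of such a change, wrapping xarray's internal engine guessing (plugins.guess_engine is internal API, so the details may differ):

from xarray.backends import plugins

def guess_engine_with_path(filename):
    # Sketch: chain the generic backend error with the offending path.
    try:
        return plugins.guess_engine(filename)
    except ValueError as err:
        raise ValueError(
            f"Reading file {filename!r} failed. The file is possibly "
            "corrupted, or the file path is wrong."
        ) from err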

I would find it very useful if I could pass to open_mfdataset some kind of argument that would make it ignore invalid files altogether (ignore_invalid=False comes to mind).

This I'm not sure about because a user wouldn't know if they were missing some data in the middle...

@yt87

yt87 commented Jul 9, 2023

My vote is to have both: a warning, and an option to fill missing data with NaNs. My use case:

I have an archive of 15 years of monthly forecasts. For one month, one of the ensemble members is missing. I am converting the binary format to zarr. The code is:

import numpy as np
import xarray as xr

# BinaryBackend is my custom backend class; paths and the *_ix
# dimension names are defined earlier in the conversion script.
ds = xr.open_mfdataset(
    paths,
    engine=BinaryBackend,
    dtype=np.float32,  # passed through to the custom backend
    combine="nested",
    concat_dim=(ensmem_ix, fcsttime_ix, reftime_ix),
    parallel=False,
).rename_vars(foo="sic")

Currently, my only option is to remove the remaining ensemble members' data files before processing. Since I have to use a custom backend (based on https://github.com/aurghs/xarray-backend-tutorial), I tried adding code that returns an array filled with NaNs when np.fromfile() fails. That, however, is not enough: the missing file is also accessed in _chunk_ds() in xarray/backends/api.py, to create a token for dask. That could easily be handled by adding a try ... except block.
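The read-side fallback I tried looks roughly like this (the shape and dtype arguments are hypothetical; the real backend follows the tutorial linked above):

import numpy as np

def read_raw(path, shape, dtype=np.float32):
    # If the binary read fails (missing or truncated file), return a
    # NaN-filled array of the expected shape instead of raising.
    try:
        return np.fromfile(path, dtype=dtype).reshape(shape)
    except (OSError, ValueError):
        return np.full(shape, np.nan, dtype=dtype)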


@max-sixty
Collaborator

Contributions welcome!

@pratiman-91 linked a pull request Jan 16, 2025 that will close this issue