Preprocess argument for open_mfdataset and threading lock #446

Merged: 2 commits merged on Jun 29, 2015

shoyer (Member) commented Jun 28, 2015

Fixes #443
Fixes #444

shoyer added this to the 0.5.2 milestone Jun 28, 2015

shoyer (Member, Author) commented Jun 29, 2015

Going to merge this shortly, unless anyone has a better name for preprocess.

shoyer added a commit that referenced this pull request Jun 29, 2015: Preprocess argument for open_mfdataset and threading lock
shoyer merged commit ffa8e69 into pydata:master Jun 29, 2015
shoyer deleted the open_mfdataset branch June 29, 2015 18:06
razcore-rad commented

I have a question about this preprocess thing. Does it mean that xray will now load all the data into memory because of the preprocessing step? Before, or at least that's what I understood from the documentation, xray would access the data only on an as-needed basis.

shoyer (Member, Author) commented Jul 2, 2015

Nope, each dataset is loaded lazily when using open_mfdataset (via dask). As long as you stick to xray operations and don't manually load data into memory (e.g., by calling .load()), the data is only accessed and transformed by preprocess on an as-needed basis.
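
To make the "as-needed" behaviour concrete, here is a minimal pure-Python sketch. It is not xray's or dask's actual implementation: `LazyVariable`, `read_chunk`, and the unit-shifting `preprocess` are hypothetical stand-ins, chosen only to show that a transformation can be recorded up front while the reads happen later, chunk by chunk.

```python
# Toy stand-in for a dask-backed variable; NOT xray's real implementation.
class LazyVariable:
    def __init__(self, read_chunk, n_chunks, transforms=()):
        self._read_chunk = read_chunk        # callable: chunk index -> list of values
        self.n_chunks = n_chunks
        self._transforms = tuple(transforms)

    def map(self, func):
        """Record a transformation without touching the data."""
        return LazyVariable(self._read_chunk, self.n_chunks,
                            self._transforms + (func,))

    def chunk(self, i):
        """Read and transform only chunk i."""
        values = self._read_chunk(i)
        for func in self._transforms:
            values = [func(v) for v in values]
        return values

    def load(self):
        """Read and transform every chunk (the analogue of calling .load())."""
        return [v for i in range(self.n_chunks) for v in self.chunk(i)]


reads = []  # records which chunks were actually read from "disk"


def read_chunk(i):
    reads.append(i)
    return [10 * i, 10 * i + 1]


def preprocess(var):
    # Hypothetical preprocess step: shift every value, recorded lazily.
    return var.map(lambda v: v + 0.5)


var = preprocess(LazyVariable(read_chunk, n_chunks=3))
reads_after_preprocess = list(reads)   # preprocess ran, but no chunk was read
second_chunk = var.chunk(1)            # reads and transforms chunk 1 only
reads_after_access = list(reads)
everything = var.load()                # explicit load: every chunk is read
```

In this sketch, calling `preprocess` costs nothing in I/O; only `chunk(1)` and `load()` trigger reads, which mirrors the distinction between sticking to lazy operations and explicitly loading.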

razcore-rad commented

I need to get my head around this. I know that a list comprehension isn't lazy, so it goes through the loop and evaluates each iteration. So I thought that:

    if preprocess is not None:
        datasets = [preprocess(ds) for ds in datasets]

forces the preprocess function to be applied to each dataset, effectively loading it into memory. Anyway, this is really cool, I'll definitely try it out 👍
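
The list comprehension above does call preprocess eagerly, once per dataset, but that is not the same as reading the data. A rough sketch of the distinction, assuming toy objects whose values are produced by a deferred function (`LazyDataset`, `open_lazy`, and the file names are made up for illustration and are not part of xray):

```python
calls = []  # log of what actually happens, in order


class LazyDataset:
    """Toy dataset whose values come from a zero-argument function."""

    def __init__(self, name, compute):
        self.name = name
        self._compute = compute

    def map(self, func):
        # Wrap the old computation in a new one; nothing runs yet.
        return LazyDataset(self.name,
                           lambda: [func(v) for v in self._compute()])

    def load(self):
        # Only here is the deferred computation actually executed.
        return self._compute()


def open_lazy(name, values):
    # Hypothetical opener: the "read" is deferred until load() is called.
    def compute():
        calls.append("read " + name)
        return list(values)
    return LazyDataset(name, compute)


def preprocess(ds):
    calls.append("preprocess " + ds.name)
    return ds.map(lambda v: v * 2)


datasets = [open_lazy("a.nc", [1, 2]), open_lazy("b.nc", [3, 4])]

# Same shape as the snippet in the PR: eager loop, lazy payload.
datasets = [preprocess(ds) for ds in datasets]
calls_after_comprehension = list(calls)

loaded = datasets[1].load()  # only now is b.nc "read"; a.nc is never read
```

After the comprehension, the log contains only the two preprocess calls; the read of `b.nc` appears only once `load()` is invoked, and `a.nc` is never read at all.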
