bug fix with negative test for dual DH operating on data handlers that did not load from cache #179

Merged
merged 19 commits into main from gb/ddh_stats_bug on Nov 28, 2023

Conversation

grantbuster (Member)

grantbuster requested a review from bnb32 on November 22, 2023 17:25
grantbuster marked this pull request as ready for review on November 22, 2023 20:51
@bnb32 (Collaborator) left a comment

I think instead of requiring the lr_handler cache to be loaded, we could move the stats-related methods from BaseDataHandler into the TrainingPrepMixIn class so they're inherited by the dual handler, and then have the methods operate directly on lr_data/lr_val_data (with .data/.val_data aliases).

For example, the DualDataHandler._get_stats method could be:

super()._get_stats()
self.hr_dh._get_stats()

where super()._get_stats() will define self._means and self._stds.

and the normalize method could be:

super().normalize(...)
self.hr_dh.normalize(...)

The .means property could be:

out = copy.deepcopy(self.hr_dh.means)
out.update(self._means)
return out

etc.

Maybe I'm missing it, but I don't see lr_data being normalized, just lr_handler.data, and lr_data is what gets sampled by the get_next method.
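
To make the shape of this suggestion concrete, here is a minimal standalone sketch of the proposed inheritance pattern. None of these classes are the real sup3r implementations; the class names, data shapes, and feature names are made up for illustration, and the actual TrainingPrepMixIn/DataHandler APIs will differ:

```python
import copy

import numpy as np


class TrainingPrepMixIn:
    """Hypothetical home for the shared stats logic."""

    def _get_stats(self):
        # Per-feature means/stds over this handler's own data array,
        # with the last axis indexing features.
        self._means = {f: float(self.data[..., i].mean())
                       for i, f in enumerate(self.features)}
        self._stds = {f: float(self.data[..., i].std())
                      for i, f in enumerate(self.features)}


class SingleHandlerStub(TrainingPrepMixIn):
    """Stand-in for a single-resolution DataHandler."""

    def __init__(self, data, features):
        self.data = data
        self.features = features


class DualHandlerStub(TrainingPrepMixIn):
    """Stand-in for DualDataHandler, with .data aliasing lr_data."""

    def __init__(self, lr_data, hr_dh, features):
        self.data = lr_data      # alias, so the mixin operates on lr_data
        self.features = features
        self.hr_dh = hr_dh

    def _get_stats(self):
        super()._get_stats()     # defines self._means / self._stds
        self.hr_dh._get_stats()  # high-res handler computes its own stats

    @property
    def means(self):
        # High-res stats first, overridden by low-res where features overlap.
        out = copy.deepcopy(self.hr_dh._means)
        out.update(self._means)
        return out


hr = SingleHandlerStub(np.random.rand(20, 20, 8, 2), ['u_100m', 'v_100m'])
dual = DualHandlerStub(np.random.rand(10, 10, 8, 2), hr, ['u_100m', 'v_100m'])
dual._get_stats()
print(dual.means)  # merged dict of per-feature means
```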

grantbuster (Member Author)

@bnb32 GO ENJOY YOUR DAY OFF

@@ -253,18 +251,15 @@ def normalize(self, means=None, stds=None, max_workers=None):
super().normalize(means=means, stds=stds,
features=self.lr_dh.features,
max_workers=self.lr_dh.norm_workers)
self.lr_dh.normalize(means=means, stds=stds,
Collaborator

Why do we need this? We should be able to compute stats without loading lr_dh.data.

Member Author


If the lr data handler data is loaded, I definitely want to normalize that data too. It's really confusing having copies/views of these data floating around with different norm factors.

Collaborator


Ok but you're forcing it to be loaded a few lines above.

Member Author


I just fixed that exception raise clause in the last commit (I think this is what you're talking about, and I agree it was leftover from before).

Collaborator


Ok nice. Works for me!
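
One way to get the behavior described in this thread, normalizing lr_dh.data when it happens to be loaded without requiring a cache load, is a guard like the one sketched below. The stub classes are not the real sup3r handlers and the merged code may structure the guard differently; this is just a sketch of the control flow:

```python
import numpy as np


class LowResHandlerStub:
    """Stand-in for the wrapped low-res DataHandler; .data stays None when
    nothing was loaded from cache."""

    def __init__(self, data=None):
        self.data = data

    def normalize(self, means, stds):
        self.data = (self.data - means) / stds


class DualNormalizeStub:
    """Stand-in for the dual handler's normalize() control flow."""

    def __init__(self, lr_data, lr_dh):
        self.lr_data = lr_data
        self.lr_dh = lr_dh

    def normalize(self, means, stds):
        self.lr_data = (self.lr_data - means) / stds
        if self.lr_dh.data is not None:   # don't force a cache load
            self.lr_dh.normalize(means, stds)


# Works whether or not the low-res handler has data loaded.
dual = DualNormalizeStub(np.random.rand(5, 5, 3), LowResHandlerStub(data=None))
dual.normalize(means=0.5, stds=0.1)
```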

'DataHandler.data=None! Try initializing DualDataHandler '
'with load_cached=True')
if self.hr_dh.data is None:
msg = ('High-res DataHandler object has DataHandler.data=None! '
Collaborator


Maybe this should say try initializing the high-res handler with load_cached=True.

Member Author


updated
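
For reference, the guard under discussion boils down to something like the sketch below. The helper name and the exact message wording are placeholders following the review suggestion; the merged commit may differ:

```python
def check_hr_data_loaded(hr_dh):
    """Hypothetical helper: fail fast with a hint about load_cached=True if
    the wrapped high-res DataHandler has no data loaded."""
    if hr_dh.data is None:
        msg = ('High-res DataHandler object has DataHandler.data=None! '
               'Try initializing the high-res DataHandler with '
               'load_cached=True.')
        raise RuntimeError(msg)
```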

@@ -113,7 +113,7 @@ def get_data(self):
"""Check hr and lr shapes and trim hr data if needed to match required
relationship to lr shape based on enhancement factors. Then regrid lr
data and split hr and lr data into training and validation sets."""
self._shape_check()
self._set_hr_data()
Collaborator


Much better name

Member Author


this confused me so much hahaha

Collaborator


Lol yeah it's a confusing method for sure

Member Author


Honestly, just the name was confusing. I kept trying to figure out where hr_data was being set, and it took me a while to realize this method did that.

std_arr = np.array([stds[fn] for fn in self.lr_dh.features])
self.lr_data = (self.lr_data - mean_arr) / std_arr
self.lr_data = self.lr_data.astype(np.float32)

if id(self.hr_data.base) != id(self.hr_dh.data):
Collaborator


Should we actually alias hr_dh.data with hr_data so the latter is updated with normalization?

Member Author


At this point I'd prefer not to... The two variables deviate if you set val_data, so it would get confusing with an alias (or it wouldn't work), and if you don't set val_data then they always have the same base array and this won't get run.

Collaborator


Well, I think it would just get updated after val_data is set, since the val_split method just updates the hr_dh.data attribute, but I agree it might be confusing. There's also the option of just using the _normalize method to directly normalize hr_data, but since you want everything normalized, maybe this current way makes the most sense.
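
For context on the id(self.hr_data.base) != id(self.hr_dh.data) check in the diff above: NumPy views keep a reference to the array they were sliced from in their .base attribute, so the identity comparison is a cheap way to tell whether hr_data is still a view of hr_dh.data (already normalized in place) or an independent copy that needs its own normalization. A standalone illustration in plain NumPy (not sup3r code; shapes are arbitrary):

```python
import numpy as np

hr_dh_data = np.random.rand(4, 4, 6, 2)      # stand-in for hr_dh.data

# Basic slicing returns a view whose .base is the original array, so the
# identity check passes and no second normalization would be needed.
hr_data_view = hr_dh_data[:, :, :5, :]
print(hr_data_view.base is hr_dh_data)            # True

# A copy (e.g. after a train/val split that copies) has no tie to the
# original, so the guard fires and hr_data must be normalized separately.
hr_data_copy = hr_dh_data[:, :, :5, :].copy()
print(id(hr_data_copy.base) != id(hr_dh_data))    # True -> normalize again
```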

grantbuster merged commit 1c6ad1e into main on Nov 28, 2023
8 checks passed
grantbuster deleted the gb/ddh_stats_bug branch on November 28, 2023 15:41
github-actions bot pushed a commit that referenced this pull request Nov 28, 2023
bug fix with negative test for dual DH operating on data handlers that did not load from cache