bug fix with negative test for dual DH operating on data handlers that did not load from cache #179
Conversation
I think instead of requiring the `lr_handler` cache to be loaded we could move the stats related methods from `BaseDataHandler` into the `TrainingPrepMixIn` class so they're inherited by the dual handler, and then have the methods operate directly on `lr_data`/`lr_val_data` (with `.data`/`.val_data` aliases).

For example, the `DualDataHandler._get_stats` method could be:

    super()._get_stats()
    self.hr_dh._get_stats()

where `super()._get_stats()` will define `self._means` and `self._stds`.

And the normalize method could be:

    super().normalize(...)
    self.hr_dh.normalize(...)

The `.means` property could be:

    out = copy.deepcopy(self.hr_dh.means)
    out.update(self._means)
    return out

etc.
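A rough, self-contained sketch of what that proposal could look like (the `TrainingPrepMixIn` below is a stand-in stub rather than the real sup3r class, and all names/signatures are assumptions):

```python
import copy

import numpy as np


class TrainingPrepMixIn:
    """Stand-in stub for the real mixin: computes stats for and normalizes
    whatever array ``self.data`` points at."""

    def _get_stats(self):
        # One mean/std per feature channel (features on the last axis)
        self._means = {f: float(np.nanmean(self.data[..., i]))
                       for i, f in enumerate(self.features)}
        self._stds = {f: float(np.nanstd(self.data[..., i]))
                      for i, f in enumerate(self.features)}

    def normalize(self, means, stds):
        for i, f in enumerate(self.features):
            self.data[..., i] = (self.data[..., i] - means[f]) / stds[f]


class DualDataHandler(TrainingPrepMixIn):
    """Sketch of the proposed delegation: ``.data`` aliases ``lr_data``."""

    def __init__(self, lr_data, features, hr_dh):
        self.data = lr_data      # alias so the inherited methods hit lr_data
        self.features = features
        self.hr_dh = hr_dh       # high-res DataHandler (assumed to exist)

    def _get_stats(self):
        super()._get_stats()     # defines self._means and self._stds
        self.hr_dh._get_stats()  # high-res handler keeps its own stats

    def normalize(self, means, stds):
        super().normalize(means, stds)     # normalize lr_data in place
        self.hr_dh.normalize(means, stds)  # normalize hr_dh.data in place

    @property
    def means(self):
        out = copy.deepcopy(self.hr_dh.means)  # start from high-res stats
        out.update(self._means)                # low-res values take precedence
        return out
```

The design point is that the mixin only ever touches `self.data`, so aliasing `.data` to `lr_data` makes the inherited stats/normalization apply to the array that actually gets sampled.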
Maybe I'm missing it but I don't see `lr_data` being normalized, just `lr_handler.data`, but `lr_data` is what is sampled by the `get_next` method.
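To illustrate the concern with plain numpy (hypothetical arrays, not the actual sup3r objects): normalizing one array in place does nothing to an independent copy of it.

```python
import numpy as np

handler_data = np.random.rand(4, 4, 3).astype(np.float32)  # stands in for lr_handler.data
lr_data = handler_data.copy()  # an independent copy, e.g. produced by regridding

# Normalize the handler's array in place
handler_data -= handler_data.mean()
handler_data /= handler_data.std()

# The copy that would actually be sampled is left un-normalized
print(np.allclose(handler_data, lr_data))  # False
```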
@bnb32 GO ENJOY YOUR DAY OFF
Bnb/ddh norm
@@ -253,18 +251,15 @@ def normalize(self, means=None, stds=None, max_workers=None):
        super().normalize(means=means, stds=stds,
                          features=self.lr_dh.features,
                          max_workers=self.lr_dh.norm_workers)
        self.lr_dh.normalize(means=means, stds=stds,
Why do we need this? We should be able to compute stats without loading lr_dh.data.
If the lr data handler data is loaded I definitely want to normalize that data too. It's really confusing having copies/views of these data floating around with different norm factors.
Ok but you're forcing it to be loaded a few lines above.
I just fixed that exception raise clause in the last commit (I think this is what you're talking about, and I agree it was leftover from before).
Ok nice. Works for me!
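A hypothetical standalone helper summarizing the behavior settled on in this thread (the function name, the `_normalize_lr_data` step, and the is-loaded check are assumptions, not the actual sup3r code):

```python
def normalize_low_res(dual_handler, means, stds):
    """Sketch: stats/normalization of the dual handler's lr_data should not
    require the low-res cache, but if the low-res DataHandler did load data,
    that copy gets normalized too so no array carries different norm factors."""
    # Always normalize the dual handler's own lr_data (what get_next samples)
    dual_handler._normalize_lr_data(means, stds)  # hypothetical internal step
    # Only touch lr_dh.data if the cache was actually loaded
    if dual_handler.lr_dh.data is not None:
        dual_handler.lr_dh.normalize(means=means, stds=stds)
```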
                   'DataHandler.data=None! Try initializing DualDataHandler '
                   'with load_cached=True')
        if self.hr_dh.data is None:
            msg = ('High-res DataHandler object has DataHandler.data=None! '
Maybe this should say try initializing the high-res handler with load_cached=True.
updated
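For reference, a self-contained sketch of the check with the reworded message (the log-and-raise pattern, exception type, and exact string are assumptions about the commit):

```python
import logging

logger = logging.getLogger(__name__)


def check_hr_data_loaded(hr_dh):
    """Raise if the high-res DataHandler was built without loading data."""
    if hr_dh.data is None:
        msg = ('High-res DataHandler object has DataHandler.data=None! '
               'Try initializing the high-res DataHandler with '
               'load_cached=True.')
        logger.error(msg)
        raise RuntimeError(msg)
```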
@@ -113,7 +113,7 @@ def get_data(self):
        """Check hr and lr shapes and trim hr data if needed to match required
        relationship to lr shape based on enhancement factors. Then regrid lr
        data and split hr and lr data into training and validation sets."""
-       self._shape_check()
+       self._set_hr_data()
Much better name
this confused me so much hahaha
Lol yeah it's a confusing method for sure
Honestly just the name was confusing; I kept trying to figure out where `hr_data` was being set, and it took me a while to realize this method did that.
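For anyone else reading along, a hypothetical standalone version of the trimming that `_set_hr_data` performs per the docstring above (the real method's details differ):

```python
import numpy as np


def set_hr_data(hr_data, lr_shape, s_enhance, t_enhance):
    """Trim high-res data so its spatial/temporal extent is an exact multiple
    of the low-res shape times the enhancement factors (sketch only)."""
    rows = lr_shape[0] * s_enhance
    cols = lr_shape[1] * s_enhance
    steps = lr_shape[2] * t_enhance
    return hr_data[:rows, :cols, :steps, ...]


# e.g. 4x spatial and 1x temporal enhancement against a (10, 10, 24) low-res grid
hr = np.random.rand(41, 42, 24, 2)
print(set_hr_data(hr, lr_shape=(10, 10, 24), s_enhance=4, t_enhance=1).shape)
# -> (40, 40, 24, 2)
```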
            std_arr = np.array([stds[fn] for fn in self.lr_dh.features])
            self.lr_data = (self.lr_data - mean_arr) / std_arr
            self.lr_data = self.lr_data.astype(np.float32)

        if id(self.hr_data.base) != id(self.hr_dh.data):
Should we actually alias hr_dh.data with hr_data so the latter is updated with normalization?
At this point I'd prefer not to... The two variables deviate if you set `val_data`, so it would get confusing with an alias (or it wouldn't work), and if you don't set val data then they always have the same base array and this won't get run.
Well I think it would just get updated after `val_data` is set, since the `val_split` method just updates the `hr_dh.data` attribute, but I agree it might be confusing. There's also the option of just using the `_normalize` method to directly normalize `hr_data`, but since you want everything normalized maybe this current way makes the most sense.
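The `.base` comparison above leans on standard numpy view semantics; a quick illustration with plain arrays:

```python
import numpy as np

data = np.arange(12.0).reshape(3, 4)  # stands in for hr_dh.data

view = data[:2]             # a slice is a view; its .base is the original array
arr_copy = data[:2].copy()  # an explicit copy owns its own memory

print(view.base is data)      # True  -> normalizing data also updates the view
print(arr_copy.base is data)  # False -> a copy must be normalized separately
```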