Gb/solar experiment #250
Conversation
…actuals, 24hrs for synthetic, to encourage realism in nighttime hours for synthetic
…p3rcc-solar dynamics are more dynamic
…ctually shouldn't even be used here
…->100kmhourly training experiments
…ot CST as was stated
…emoved stride var no longer used
```diff
@@ -402,6 +418,8 @@ def compute(cls, data):
     'cloud_mask': CloudMask,
     'clearsky_ratio': ClearSkyRatio,
     'sza': Sza,
+    'latitude_feature': LatitudeFeature,
```
You can just do `'latitude_feature': 'latitude'` here. The data handler can already get `'latitude'` through `dh.data`; it just isn't seen in `dh.features` because `dh.features` is defined as `list(dh.data.data_vars)` and `'latitude'` is in `dh.data.coords`.
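The `data_vars` vs. `coords` distinction above can be sketched with a toy xarray dataset (the variable names and shapes here are illustrative, not sup3r's actual schema):

```python
import numpy as np
import xarray as xr

# toy dataset: one data variable plus latitude/longitude coordinates
ds = xr.Dataset(
    data_vars={'sza': (('south_north', 'west_east'), np.zeros((3, 3)))},
    coords={
        'latitude': ('south_north', np.linspace(39.0, 39.2, 3)),
        'longitude': ('west_east', np.linspace(-105.2, -105.0, 3)),
    },
)

# a feature list built from data_vars misses the coordinates...
print(list(ds.data_vars))       # ['sza']
# ...but 'latitude' is still reachable through the dataset's coords
print('latitude' in ds.coords)  # True
```

So mapping a feature name straight to the `'latitude'` coordinate works without registering a separate derived-feature class.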
oh nice, done here: 9f5dfcd
```diff
         if hasattr(self.full_lat_lon, 'vindex'):
             return self.full_lat_lon.vindex[self.raster_index]
-        return self.full_lat_lon[self.raster_index.flatten]
+        return self.full_lat_lon[self.raster_index]
```
Good catch here. This getting through must mean there was no test coverage for the h5 rasterizers with data loaded in memory? What did you run to hit this? Would you mind adding something similar as a simple smoke test in `test_rasterizer_general.py`?
I think this got raised when running a multistep fwp pipeline on h5 files, but I honestly can't remember exactly. I'm trying to set up a test based on `test_rasterizer_general.py::test_topography_h5`, but even with preload via `chunks=None` I can't get `self.full_lat_lon` to be anything but a dask array 🤷‍♂️
numpy or dask? With `chunks=None` it should be a numpy array, and then it would run into the misplaced `flatten`.
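For reference, the misplaced `flatten` only bites on a numpy array: passing the unbound `.flatten` method (no parentheses) as an index raises, while calling it, or indexing with the 2-D index array directly, works. A minimal standalone sketch (the array shapes are made up for illustration):

```python
import numpy as np

# stand-ins for the rasterizer's arrays (shapes are illustrative)
full_lat_lon = np.arange(25.0).reshape(5, 5)
raster_index = np.array([[1, 2], [3, 4]])

# the bug: `.flatten` without parentheses is a method object, not an
# array, so numpy rejects it as an index
try:
    full_lat_lon[raster_index.flatten]
except IndexError:
    print('method object is not a valid index')

# calling flatten() or indexing with the 2-D array directly both work
flat = full_lat_lon[raster_index.flatten()]  # shape (4, 5)
direct = full_lat_lon[raster_index]          # shape (2, 2, 5)
```

A dask array never hits this branch because it goes through `vindex` instead, which is why the bug only surfaces with data preloaded into memory.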
Oops, sorry, edited the comment: lat/lon is always a dask array even with `chunks=None`. Here's the draft test (stupid simple):

```python
def test_preloaded_h5():
    """test preload of h5 file"""
    rasterizer = Rasterizer(
        file_paths=pytest.FP_WTK, target=(39.01, -105.15), shape=(20, 20),
        chunks=None
    )
```
It's possible that I ran into this error before #247 got merged and then rebased, and now this code is never used?
Yeah, this is weird actually because `Sup3rX.compute` only loads `data_vars` into memory (sup3r/sup3r/preprocessing/accessor.py, line 233 in 94eebae: `for f in self._ds.data_vars:`), and `chunks=None` forces a call to `compute` after rasterization is complete anyway (`if preload:`). It seems like it would take some funky manual method calls to get to `_get_flat_data_lat_lon` with a numpy array at the current version, but I don't remember changing anything in that PR that would have impacted this. Either way, I added the test and coordinate compute in PR #251.
Yeah, definitely an edge case, but I ran into it at some point and we should keep the fix for posterity.
```
      daily true high res sample.
    - Discriminator sees random n_days of 8-hour samples of the daily
      synthetic high res sample.
    - Pointwise content loss (MAE/MSE) is only on the center 2 daylight
```
Can you say a bit about the benefit / your thinking behind going from the center 8 hours to just 2, plus the mean, for the pointwise loss? Are you trying to weight the midday peak more?
Yeah, so the problem was that with a pointwise loss on the center 8 hours you end up with a regression to the mean: the model tries to be accurate during daytime with very little cloud movement across timesteps, and then you get big, weird changes in the cloud pattern as you deviate off of those pointwise-loss hours into sunset/sunrise.

The temporal mean loss ensures that all hours, including through sunrise/sunset, are reasonable, but still allows enough freedom for some cloud movement. The pointwise loss on two noonish hours ensures you have a realistic cloud field in the middle of the day, and then the discriminator makes sure the transitions to/from these hours are realistic, including realistic cloud movement.

The only other test I'd run is a temporal-mean-only content loss, removing the pointwise loss altogether... but I can just set this class attribute to zero. This is kind of interesting actually; I'm going to run a test overnight and maybe add one more commit tomorrow changing the default.
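A minimal sketch of this kind of composite content loss (the function name, hour indices, MAE choice, and weighting are all illustrative assumptions, not the actual SolarCC implementation):

```python
import numpy as np

def solar_content_loss(true, synth, midday_idx=(11, 12), pointwise_weight=1.0):
    """Hedged sketch: pointwise MAE on a couple of midday timesteps plus
    MAE on the daily temporal mean.

    `true`/`synth` are (time, lat, lon) arrays of 24 hourly steps; the
    midday indices and weighting are made up for illustration.
    """
    idx = list(midday_idx)
    # pointwise term: pins down a realistic cloud field around noon
    pointwise = np.abs(true[idx] - synth[idx]).mean()
    # temporal-mean term: keeps every hour (incl. sunrise/sunset)
    # reasonable while leaving freedom for per-timestep cloud movement
    mean_term = np.abs(true.mean(axis=0) - synth.mean(axis=0)).mean()
    return pointwise_weight * pointwise + mean_term
```

Setting `pointwise_weight=0` would give the temporal-mean-only variant mentioned above, leaving the adversarial loss to carry all of the per-timestep realism.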
Ok, this mostly makes sense. Wouldn't using just the temporal mean in the content loss encourage regression to the mean though?
It could, but the adversarial loss should protect against that. Really, the problem with too much pointwise loss is that you don't get cloud movement in the middle 8 hours of the day.
Okay, I finished a test with no pointwise loss and the model fell apart: it didn't have thick enough clouds in the day. Going to merge this PR with the default pointwise loss set to two hours.
Force-pushed from 9410dfd to ad62263: preload of coords added to compute and h5 preload test added.
Gb/solar experiment

Mostly changing the loss function for the SolarCC model so that all timesteps are encouraged to somewhat match the low-res inputs, but we get a pointwise loss on a few hours in the middle of the day, with continuity provided by the adversarial loss.

Added a few other edits that are not solar related:
- weird bug in the extended rasterizer: https://github.com/NREL/sup3r/pull/250/files#diff-67ba2e9078e12e55874258e42c6b99cf4f81647358b5eddc64c3dc8626dda45fR200
- added lat/lon as features for geospatial encoding: https://github.com/NREL/sup3r/pull/250/files#diff-027a96c03ce10528eb3c5bec99820feedb7150a5e28036caf7050c86a9d09621R394-R411