Bnb/dh refactor #220
Merged
bnb32 force-pushed the bnb/dh_refactor branch 10 times, most recently from ebb154c to bfe2f9f on June 27, 2024 17:34
bnb32 force-pushed the bnb/dh_refactor branch 4 times, most recently from 53d1c66 to bbc4af1 on July 1, 2024 15:58
bnb32 force-pushed the bnb/dh_refactor branch 4 times, most recently from 59b9817 to a546b27 on July 19, 2024 20:07
…added pytest.warns() catches for some intentional checks.
added simple test on cc batching for daily boundaries
…ing, etc would not be applied to data loaded from cache.
Bnb/caching fixes
…. added tests for chunks=None with height interp derivation
…el_check keys. the latter is default False, as this takes a long time since it has to load arrays into memory to compute min / max levels. ) Modified the linear interpolation method to use the 2 closest levels rather than the two closest levels which also happen to be above and below the requested level. This speeds up the interpolation by orders of magnitude.
…ape in some cases.
…nks = auto and then load only the rasterized data into memory.
Gb/bc kwargs
grantbuster approved these changes on Nov 5, 2024
Ok, here we go...

`sup3r/preprocessing` was previously just data handlers and batch handlers, essentially. Now we have `Loaders`, `Extracters`, `Derivers`, `Cachers` which are composed in `sup3r.preprocessing.data_handlers.factory` to build objects similar to the old `DataHandlers`. These do basically everything the old handlers used to do, except for training / batching related routines like sampling, normalization, etc. `Loaders` just load netcdf / h5 data into an `xr.Dataset`-like container. `Extracters` extract spatiotemporal regions of data. `Derivers` derive new features from raw feature data. `Cachers`, well, they cache data to either h5 or netcdf depending on the extension of the output file provided.

In `sup3r/preprocessing` we additionally have `Samplers` and `BatchQueues`. These are composed in `sup3r.preprocessing.batch_handlers.factory` to build objects similar to the old `BatchHandlers`. These do basically everything that the old batch handlers used to do, with some exceptions. The most notable exception is probably that they don't split data into training and validation sets. `BatchHandler` objects will take "collections" of data-handler-like objects (these can be `DataHandlers`, `Extracters`, `Derivers`, etc) for both training and validation, and separate batch queues will be used for each. `Samplers` simply contain an `xr.Dataset`-like object and sample that data as an iterator. `BatchQueue` objects interface with samplers to keep a queue full of batches / samples while models are training.

All these smaller objects like `loaders`, `extracters`, `derivers`, `samplers` are built on top of `xr.Dataset`-like objects (`sup3r.preprocessing.accessor.Sup3rX` and `sup3r.preprocessing.base.Sup3rDataset`) which serve as the familiar `.data` attribute for data and batch handlers. `Sup3rDataset` is wrapped around `Sup3rX` to provide an interface for "dual" dataset objects contained by dual handlers and acts exactly like `Sup3rX` when datasets are not dual. `Sup3rX` is an `xr.Dataset` "accessor" class, which is the recommended way to extend `xr.Datasets` (as opposed to subclassing). These `Sup3rX` / `Sup3rDataset` objects act similar to `xr.Datasets` but with extended functionality. The tests in `tests/data_wrappers/` show how to interact with these objects.

Since the fundamental dataset objects are now `xr.Dataset`-like, they can use dask arrays to store data. This means we don't need to load data into memory until we need the result of a computation. `ForwardPassStrategy` and `ForwardPass` have been updated accordingly, since we can lazy load the full input dataset and then index the data handler `.data` attribute to select generator input chunks, all before loading into memory. `BatchHandler` objects have a `mode` argument which can be set to either `lazy` (load batches into memory only when they are sent out for training) or `eager` (load `.data` into memory upon handler initialization).

Tests have been added for all new preprocessing modules and lots of documentation / notes have been added throughout. Tests should hopefully provide good examples of use patterns for these new objects.
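For readers who want a feel for the sampler / batch queue split described above, here is a minimal standard-library sketch. It is not the sup3r implementation: `ToySampler`, `ToyBatchQueue`, `collect_batches`, and all of their parameters are hypothetical names made up for illustration, and a plain Python list stands in for the `xr.Dataset`-like container.

```python
import queue
import random
import threading


class ToySampler:
    """Hypothetical sampler: an endless iterator of random fixed-length windows."""

    def __init__(self, data, sample_len, seed=0):
        self.data = data
        self.sample_len = sample_len
        self.rng = random.Random(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Pick a random start index so the window always fits inside the data.
        start = self.rng.randrange(len(self.data) - self.sample_len + 1)
        return self.data[start:start + self.sample_len]


class ToyBatchQueue:
    """Hypothetical batch queue: a background thread keeps a bounded queue full."""

    def __init__(self, sampler, batch_size, n_batches, max_queued=4):
        self.sampler = sampler
        self.batch_size = batch_size
        self.n_batches = n_batches
        self.max_queued = max_queued

    def _fill(self, q):
        for _ in range(self.n_batches):
            batch = [next(self.sampler) for _ in range(self.batch_size)]
            q.put(batch)  # blocks while the queue is full
        q.put(None)  # sentinel marking the end of the epoch

    def __iter__(self):
        # Each pass over the queue ("epoch") gets a fresh queue and worker thread.
        q = queue.Queue(maxsize=self.max_queued)
        threading.Thread(target=self._fill, args=(q,), daemon=True).start()
        while True:
            batch = q.get()
            if batch is None:
                return
            yield batch


def collect_batches(n_steps=100, sample_len=5, batch_size=3, n_batches=4):
    """Build a toy dataset, wrap it in a sampler and queue, and drain one epoch."""
    data = list(range(n_steps))
    sampler = ToySampler(data, sample_len)
    return list(ToyBatchQueue(sampler, batch_size, n_batches))
```

The point of the split is that the consumer (a training loop) only ever iterates over the queue, while the worker thread prepares the next batches in the background. Passing a separate sampler collection for validation, as the description notes, would just mean building a second queue.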