This is a POC, and I would like to get your feedback on the idea first, before finishing up the tests, docs and coverage. It comes from #43 and follows up on the comment in #103. #43 is much more elaborate because it strives for statically typed feedback, whereas this is much more stripped down.
I have tried to:
- avoid an extra wrapper around the xr Dataset (since we have discussed that we would like to avoid one)
- keep the API consistent while still being concise and relatively flexible
The core idea is:
- hold a "schema" in the `attrs` of the xr Dataset
- define specs for meaningful/useful arrays, and validate each spec against the variables in the Dataset when it is declared in the schema (via `SgkitSchema.spec`)
- a schema entry is then essentially a data/array spec plus a pointer to a variable in the Dataset
- a schema spec can point to many variables, and by default points to the default variable
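A minimal sketch of this core idea, assuming a spec is just a default variable name, the expected dimension names and a NumPy dtype kind, and that the schema is a plain dict stored under a hypothetical `"schema"` attrs key, mapping each spec's default name to the variable(s) it points to (by default just `[default_name]`). `ArraySpec`, `CALL_GENOTYPE`, `validate` and `names_for` are illustrative names, not the POC's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np
import xarray as xr


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical array spec: default variable name, dims and dtype kind."""

    default_name: str
    dims: Tuple[str, ...]
    kind: str  # NumPy dtype kind, e.g. "i" (signed int) or "f" (float)


# Hypothetical spec for genotype calls.
CALL_GENOTYPE = ArraySpec("call_genotype", ("variants", "samples", "ploidy"), "i")


def validate(ds: xr.Dataset, spec: ArraySpec, name: str) -> None:
    """Raise ValueError if variable `name` is missing or does not satisfy `spec`."""
    if name not in ds:
        raise ValueError(f"{name!r} is not in the dataset")
    var = ds[name]
    if var.dims != spec.dims or var.dtype.kind != spec.kind:
        raise ValueError(f"{name!r} does not match spec {spec.default_name!r}")


def names_for(ds: xr.Dataset, spec: ArraySpec) -> List[str]:
    """Return the variables that the schema in attrs maps `spec` to, validated."""
    names = ds.attrs["schema"][spec.default_name]
    for name in names:
        validate(ds, spec, name)
    return list(names)


# The schema is plain data in attrs: spec name -> one or more variable names.
gt = np.zeros((3, 2, 2), dtype=np.int8)
dims = ("variants", "samples", "ploidy")
ds = xr.Dataset(
    {"call_genotype": (dims, gt), "my_genotype": (dims, gt.copy())},
    attrs={"schema": {"call_genotype": ["call_genotype", "my_genotype"]}},
)
print(names_for(ds, CALL_GENOTYPE))  # ['call_genotype', 'my_genotype']
```

Keeping the schema as plain data in `attrs` lets it travel with the Dataset without a wrapper class; the trade-off is that some xarray operations (e.g. reductions with the default `keep_attrs`) drop `attrs`, and the schema with them.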
Benefits:
- a single place to define useful/reserved variables and their constraints
- a single way to declare that specific variable(s) have a specific meaning and spec
- as the pipeline flows and results are merged into a single dataset, that dataset can be used by different functions even if it contains custom variable names (which only need to be declared once)
As a user:
- if you don't change any of the precomputed variable names, you don't need to do anything
- if a function requires you to specify which variables to use for a computation, you must do so via `SgkitSchema.spec` before calling the function
- `SgkitSchema.spec` returns a new Dataset (shallow copied) with an updated schema/attrs
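A sketch of that user-facing step, assuming a hypothetical `with_spec` function as a stand-in for `SgkitSchema.spec` (its real signature may differ): it validates the chosen variables, then returns a shallow copy whose `attrs` carry the updated schema, leaving the input Dataset untouched. `ArraySpec`, `SAMPLE_PHENOTYPE` and the `"schema"` attrs key are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np
import xarray as xr


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical array spec: default variable name, dims and dtype kind."""

    default_name: str
    dims: Tuple[str, ...]
    kind: str


# Hypothetical spec for a per-sample phenotype used by a regression.
SAMPLE_PHENOTYPE = ArraySpec("sample_phenotype", ("samples",), "f")


def with_spec(ds: xr.Dataset, spec: ArraySpec, *names: str) -> xr.Dataset:
    """Hypothetical stand-in for SgkitSchema.spec.

    Validates `names` (default: the spec's default name) against `spec` and
    returns a shallow copy of `ds` whose schema maps `spec` to those names.
    """
    names = names or (spec.default_name,)
    for name in names:
        var = ds[name]  # raises KeyError if the variable does not exist
        if var.dims != spec.dims or var.dtype.kind != spec.kind:
            raise ValueError(f"{name!r} does not match spec {spec.default_name!r}")
    out = ds.copy(deep=False)  # new Dataset object; array data is shared
    schema = dict(ds.attrs.get("schema", {}))  # copy so the input is not mutated
    schema[spec.default_name] = list(names)
    out.attrs = {**ds.attrs, "schema": schema}
    return out


ds = xr.Dataset({"bmi": (("samples",), np.array([21.5, 30.1, 25.0]))})
ds2 = with_spec(ds, SAMPLE_PHENOTYPE, "bmi")
print(ds2.attrs["schema"])  # {'sample_phenotype': ['bmi']}
print("schema" in ds.attrs)  # False
```

Returning a shallow copy rather than mutating `attrs` in place keeps the call free of side effects without copying any array data.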
What is missing:
- your feedback and, depending on it:
  - polish the API (e.g. make it easier to fetch the variable name when a spec points to a single variable)
  - discuss and complete all the spec constraints
  - make it easier to merge Datasets together along with their schemas
  - more documentation
  - more tests
  - update `regenie` (essentially the same change as for regression)
This POC removes all the required/optional variable-name arguments from the functions; if those names need to be specified, or are custom, the user must specify them via `SgkitSchema.spec`. Alternatively, we could keep those arguments where necessary (example) and call `SgkitSchema.spec` inside the functions (triggering validation etc.).
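For comparison, a sketch of the alternative, assuming a hypothetical `alt_allele_count` function that keeps an optional variable-name argument and resolves and validates it internally (where the real code would call `SgkitSchema.spec`). When the argument is omitted it uses the first name the schema declares; `ArraySpec`, `resolve_name` and the `"schema"` attrs key are hypothetical, and the genotype encoding (0 = reference, negative = missing) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np
import xarray as xr


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical array spec: default variable name, dims and dtype kind."""

    default_name: str
    dims: Tuple[str, ...]
    kind: str


CALL_GENOTYPE = ArraySpec("call_genotype", ("variants", "samples", "ploidy"), "i")


def resolve_name(ds: xr.Dataset, spec: ArraySpec, name: Optional[str]) -> str:
    """Use the explicit argument if given, else the first name declared in the
    schema, and validate it, as calling SgkitSchema.spec internally would."""
    if name is None:
        name = ds.attrs["schema"][spec.default_name][0]
    var = ds[name]  # raises KeyError if the variable does not exist
    if var.dims != spec.dims or var.dtype.kind != spec.kind:
        raise ValueError(f"{name!r} does not match spec {spec.default_name!r}")
    return name


def alt_allele_count(ds: xr.Dataset, call_genotype: Optional[str] = None) -> xr.DataArray:
    """Hypothetical function that keeps an optional variable-name argument."""
    name = resolve_name(ds, CALL_GENOTYPE, call_genotype)
    # Assumes allele 0 is the reference and negative values are missing calls.
    return (ds[name] > 0).sum(dim=("samples", "ploidy"))


dims = ("variants", "samples", "ploidy")
my_gt = np.array([[[0, 1], [1, 1]], [[0, 0], [0, -1]]], dtype=np.int8)
ds = xr.Dataset(
    {"call_genotype": (dims, np.zeros_like(my_gt)), "my_gt": (dims, my_gt)},
    attrs={"schema": {"call_genotype": ["call_genotype"]}},
)
print(alt_allele_count(ds).values.tolist())           # [0, 0] (name from the schema)
print(alt_allele_count(ds, "my_gt").values.tolist())  # [3, 0] (explicit argument)
```

This keeps one-off custom names short at the call site, at the cost of two ways to say the same thing; in this sketch the explicit choice is also not recorded in the returned schema.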
One more point: we could make declaring default names in the schema unnecessary. If a spec is requested but missing from the schema, we would assume the default name, check that variable against the spec, and return the name.
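A sketch of that fallback, assuming the same hypothetical `"schema"` attrs layout (spec default name mapped to a list of variable names). The only change from an explicit-declaration lookup is `schema.get(..., [spec.default_name])`: an undeclared spec resolves to its default name, which is still validated. `ArraySpec`, `CALL_GENOTYPE` and `names_for` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np
import xarray as xr


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical array spec: default variable name, dims and dtype kind."""

    default_name: str
    dims: Tuple[str, ...]
    kind: str


CALL_GENOTYPE = ArraySpec("call_genotype", ("variants", "samples", "ploidy"), "i")


def names_for(ds: xr.Dataset, spec: ArraySpec) -> List[str]:
    """Return the validated variable names for `spec`.

    If the spec is not declared in the schema, assume its default name.
    """
    schema = ds.attrs.get("schema", {})
    names = schema.get(spec.default_name, [spec.default_name])
    for name in names:
        if name not in ds:
            raise ValueError(f"{name!r} is not in the dataset")
        var = ds[name]
        if var.dims != spec.dims or var.dtype.kind != spec.kind:
            raise ValueError(f"{name!r} does not match spec {spec.default_name!r}")
    return list(names)


dims = ("variants", "samples", "ploidy")
ds = xr.Dataset({"call_genotype": (dims, np.zeros((2, 3, 2), dtype=np.int8))})
print(names_for(ds, CALL_GENOTYPE))  # ['call_genotype'] (no schema declared)
```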