-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(SE) task_question.yaml
: a design issue
#314
Comments
task_question.yaml
: a design issue
I will add some thoughts/feedback here somewhat regarding this issue. In #377, I'm adding a new configuration key To be able to use the templated version I overall like the |
A major caveat to any significant changes here is that That said, I agree with @rtodling that I'm not a huge fan of the big I think a better approach is to stop storing internal data in YAML and instead store data in Python classes (especially, taking advantage of Python's (1) With a bit of effort, we can use Python's static type checking or the libraries themselves to enforce data validity, without having to write our own assertions. (2) We can implement more sophisticated logic into default values (e.g., default values can be set dynamically based on other values); (3) We can rely on inheritance to reduce the amount of information that each class has to store (and, more importantly, the number of places where a given piece of information would have to be updated). Right now, these are just musings. I'm going to try to do a personal deep dive into how swell uses this information and do some experimentation with alternative design patterns. As I have more concrete ideas and design sketches, I'll post them here (and I would encourage others to post their ideas as well!). |
Another quick thought (while it's fresh): The current design of the E.g., from cycle_times:
ask_question: True
default_value: defer_to_model
options: defer_to_model
models:
- all
suites:
- 3dfgat_atmos
- 3dvar_atmos
- 3dvar_cycle
- 3dvar
- convert_ncdiags
- forecast_geos
- hofx
- localensembleda
- ufo_testing
prompt: Enter the cycle times for this model.
type: string-check-list I wonder if a better general design pattern is to somehow organize this around suites or tasks, such that for a given suite, you had a complete list of the corresponding questions. Pseudocode: from dataclasses import dataclass
# General class for a SwellQuestion
@dataclass
class SwellQuestion():
dtype: str
default_value: str
options: str
prompt: str
# Add some common question methods here.
# Example of declaring a full SWELL question
cycle_times = SwellQuestion(
dtype = "string-check-list",
default_value = "defer_to_model",
options = "defer_to_model",
prompt = "Enter the cycle times for this model"
)
# Keep declaring other questions
start_cycle_point = SwellQuestion(...)
final_cycle_point = SwellQuestion(...)
runahead_limit = SwellQuestion(...)
model_components = SwellQuestion(...)
##############################
# Now, we move to defining suites, also as a class that defines, among other
# things, a list of questions associated with that suite.
@dataclass
class SwellSuite():
questions: list
# ...some other configs?
s3dfgat_atmos = SwellSuite(questions = [
cycle_times(), # Default behavior
start_cycle_point(...), # Modify the behavior somewhat
final_cycle_point(...), # Modify the behavior somewhat
runahead_limit, # Default behavior
model_components # Default behavior
]) This needs a lot more work, but the more I look at it and think about it, the more I like this overall design. I'll try to post a more complete example here as I come up with it; that will hopefully be easier to comment on, but maybe even this initial example could provide some inspiration. |
I agree with the notion that if we are to make a major design philosophy change it's better to do it sooner. This would impact our CI workflow strategy too. We should definitely have a broader discussion with more GMAO developers/group leads. That way, people would feel involved and invest from the (second) beginning which would inherently help during the transition/adoption phase.
These two sound very appealing to me. As @rtodling mentioned, static default has been a hassle. However, I would like to stick with a built-in functionality before we commit to another external library unless it benefit us quite a bit.
We were using inheritence earlier during the development stage but this kind of got out of control in terms of tracking what class is coming from what part of the code. Perhaps with a different design strategy that could be prevented? With the current design, |
Just noticing this thread. Note that for the GCM we expect (hope) to support a hierarchy of questions so that questions relevant to a component can be maintained in the repository for that component. E.g., Ocean questions would only "found" if the GCM question about ocean, select one. |
I am finding more and more that a single task_question file to handle all question that can ever be asked of swell for all possible tasks is a true bottle neck.
many questions for a particular task have nothing to do w/ another. Keep them in the same file causes confusion
the way things are also prevents generality; for example: the LETKF uses a cubed ensemble; the Variational analysis uses a lat-lon ensemble; the GetEnsemble task refers to a variable path_to_ensemble which means only one default can be set right - preventing us from being able to set defaults for two completely independent suites LETKF and Var. Sure there are ways around it be they become entwined w/ more variables needing to be defined w/o need.
default npc and npy can only be given as general settings - equal to all applications - but that will not be the case.
There're other examples of the inconvenience of this single all-purpose file. It would be nice to rethink this.
The text was updated successfully, but these errors were encountered: