-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A prototype for alternative task and suite questions #491
base: develop
Are you sure you want to change the base?
Conversation
@mranst Just a query, as I notice you are using a fork. Do you not have write access here? |
I believe I do, I'm just used to working on my own fork. I should probably branch from here directly in the future. |
"skip_ensemble_hofx", | ||
"window_type" | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: Any reason not to store the question objects themselves directly in these objects? I.e., Something like:
import swell.suites.suite_questions as sq
# [...]
all_models = QuestionList(
list_type="model_dependent",
name="all_models",
questions=[
sq.cycle_times,
sq.ensemble_hofx_packets,
sq.ensemble_hofx_strategy,
sq.skip_ensemble_hofx,
sq.window_type
]
)
That eliminates an additional layer of indirection and lets us rely that much more on Python's static checkers to catch issues.
Moreover, if each question is a class or a function rather than a static object (i.e., something that takes arguments), we can put initialization parameters directly into this question list (if that's a capability we need). One use case of this might be specifying different defaults for different models; for example, multiple suites might prompt the user about skipping ensemble HOFX, but some suites might default to skipping HOFX while others might default to not skipping HOFX.
# [...]
question_list = [
cycle_times(retry=False),
skip_ensemble_hofx(default=False),
...
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, much more powerful that way
@dataclass | ||
class SwellQuestion: | ||
"""Basic dataclass for defining Swell questions for suites and tasks""" | ||
name: str | ||
dtype: str | ||
question_type: str | ||
default_value: str | ||
prompt: str | ||
depends: dict = None | ||
models: list = None | ||
ask_question: bool = False | ||
options: str = None | ||
|
||
def get(self, attr, default=None): | ||
return getattr(self, attr, default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More brainstorming: We should look for opportunities to be a bit stricter with types here. In particular, some of these things can be specific enums with specific values (for example, suite_type
and list_type
both have a small number of fixed values that would be well supported by enums). Other things can be more precise types or even custom swell-specific classes; for example, depends
is not just any dict but really a list of Swell questions with specific configurations (right?).
No problem. Just making sure there wasn't a GitHub issue! Forks are nice in that you have control! |
This is a solid first pass, nice work! It's great to see a real version of this, even in draft form. I left a few isolated thoughts as comments, and I may add more as we go. Big picture: I would "checkpoint" the current commit and now start doing a bit more exploration of the practical value of this refactor to fully justify it. Some ideas:
Note: Probably not all of these ideas will work out in practice. Also, some of these ideas will point to tweaks to the implementation of the Questions/Tasks classes and lists. |
Thanks, these are good points! I'll see what I can do |
@rtodling Per request, here's a simple example of the old method vs. the new method of defining questions: The old yaml syntax:Suite questionsswell/src/swell/suites/suite_questions.yaml Lines 1 to 29 in 53e69e9
Task questionsswell/src/swell/tasks/task_questions.yaml Lines 15 to 41 in 53e69e9
Each question defines what suites and tasks they belong to. Every single question is read in at the start, then whittled down depending on what is needed. Let's say that we're running The suite questions are read at the beginning, then Like with suites, listed task questions are input into a dictionary at the start, then The new syntax:Each suite and task is defined as a corresponding list of questions Suite questionsswell/src/swell/suites/suite_question_list.py Lines 47 to 57 in 53e69e9
Task questionsswell/src/swell/tasks/task_question_list.py Lines 46 to 54 in 53e69e9
When building the question dictionary, swell now only seeks out the questions listed under the suite and tasks that are applicable, though some whittling down is still necessary depending on model, etc. The properties of the questions themselves are defined separately: swell/src/swell/tasks/task_questions.py Lines 33 to 43 in 53e69e9
As Alexey noted, in its current form this PR is more of a syntax change than a fix to the underlying issues with the current system of questions, though there is potential to leverage python to address these issues. Like he noted above, we could assign different default values per-task or suite in the question lists. If you wanted different options for In var = SuiteQuestionList(
questions = [
sq.path_to_ensemble(default_value = 'lat-lon')
]
)
letkf = SuiteQuestionList(
questions = [
sq.path_to_ensemble(default_value = 'cubed')
]
) We could also set lists of commonly associated questions (like workflows that use JEDI, for example) to reduce the huge number of questions in the lists, and better use type-checking to verify validity. Edited to fix permalinks |
@mranst Thank you so much the example above - and the comments that come w/ it. I understand things considerably better now. I like the idea, or the potential, of eventually separating things a little more and if I understand it well, this is what the changes open the door for happening. I wouldn't have a problem having these initial changes go in, provided you can get zero diff in the tests - just stating the obvious. Thank you for writing the example down. |
This is a prototype for an alternative method of handling task and suite questions. Rather than having a set of questions that tell Swell what tasks they belong to, tasks and suites can be defined as a list of questions, whose properties are defined separately. All Swell questions have been refactored as python dataclass objects, and have had their task/suite declarations transformed into a list of questions associated with each task and suite.
prepare_config_and_suite.py
has been modified to accept this new approach. Rather than reading the entire list of all potential suite and task questions for everything in swell, only the list of questions for the particular suite being run are read, as well as only questions for the tasks needed by the suite.From what I’ve tested, it behaves almost identically to the previous method as far as creating
experiment.yaml
is concerned, with the exception that some questions appear in a different order, which should probably be fixed for readability.code_tests
will need to be modified as well, but I wanted to get opinions on whether this is even a good method.I believe the major positive to this method are that it is more intuitive to look at things from a suite and task perspective, rather than having questions defining what task and suites they are a part of, like Alexey said in (#314). Having questions defined per suite and task also gives some flexibility in letting us spread out definitions a bit, rather than having to read it all in a centralized place, then throw out what we don’t need. The python dataclasses also will allow for some more powerful options for value assignments.
As for cons, in its current configuration I’m not sure this does much to fix the concern of having all swell-wide questions within two files. It adds separate files for declaring all questions for each task and suites
task_question_list.py
,suite_question_list.py
, which we could spread out into separate files (maybe even into the existing task python files). It also doesn’t yet address the need to potentially have different defaults depending on the context. I think we could probably implement a simple system wheredefault_value
is defined as a dictionary with some basic dependencies like task or suite being run, but that might get messy if you have multiple tasks using the same question during setup.Let me know if this looks like a good way to proceed, or any suggestions!