A prototype for alternative task and suite questions #491

Draft · wants to merge 4 commits into base: develop

Conversation

@mranst (Contributor) commented Jan 17, 2025

This is a prototype for an alternative method of handling task and suite questions. Rather than having a set of questions that each declare which tasks and suites they belong to, tasks and suites are defined as lists of questions, whose properties are defined separately. All Swell questions have been refactored as Python dataclass objects, and their task/suite declarations have been transformed into a list of questions associated with each task and suite.

prepare_config_and_suite.py has been modified to accept this new approach. Rather than reading the entire list of all potential suite and task questions for everything in Swell, only the questions for the particular suite being run are read, along with the questions for the tasks that suite needs.

From what I’ve tested, it behaves almost identically to the previous method as far as creating experiment.yaml is concerned, with the exception that some questions appear in a different order, which should probably be fixed for readability. code_tests will need to be modified as well, but I wanted to get opinions on whether this is even a good method.

I believe the major positive of this method is that it is more intuitive to look at things from a suite and task perspective, rather than having questions define which tasks and suites they are a part of, as Alexey said in #314. Having questions defined per suite and task also gives us some flexibility to spread out definitions, rather than having to read everything in one centralized place and then throw out what we don't need. The Python dataclasses will also allow for some more powerful options for value assignments.

As for cons, in its current configuration I'm not sure this does much to fix the concern of having all Swell-wide questions within two files. It adds separate files, task_question_list.py and suite_question_list.py, that declare the question lists for each task and suite, which we could spread out into more files (maybe even into the existing task Python files). It also doesn't yet address the need to potentially have different defaults depending on the context. I think we could probably implement a simple system where default_value is defined as a dictionary keyed on some basic dependencies, like the task or suite being run, but that might get messy if multiple tasks use the same question during setup.
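Just to illustrate that idea, a rough self-contained sketch of what a dictionary-based default might look like (the class name, resolve_default helper, and field values here are hypothetical, not part of this PR):

from dataclasses import dataclass, field

# Hypothetical sketch only: default_value as a mapping keyed by suite,
# with a "*" wildcard fallback. Not part of this PR.
@dataclass
class QuestionWithContextDefaults:
    name: str
    prompt: str
    default_value: dict = field(default_factory=dict)

    def resolve_default(self, suite):
        # Prefer a suite-specific default, fall back to the wildcard entry.
        return self.default_value.get(suite, self.default_value.get("*"))

q = QuestionWithContextDefaults(
    name="path_to_ensemble",
    prompt="What is the path to the ensemble?",
    default_value={"3dvar": "lat-lon", "*": "cubed"},
)
q.resolve_default("3dvar")            # -> "lat-lon"
q.resolve_default("localensembleda")  # -> "cubed"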

Let me know if this looks like a good way to proceed, or any suggestions!

@mranst requested review from ashiklom and Dooruk January 17, 2025 20:45
@mathomp4 (Member)

@mranst Just a query, as I notice you are using a fork. Do you not have write access here?

@mranst (Contributor, Author) commented Jan 21, 2025

I believe I do, I'm just used to working on my own fork. I should probably branch from here directly in the future.

"skip_ensemble_hofx",
"window_type"
]
)
Collaborator

Thinking out loud: Any reason not to store the question objects themselves directly in these objects? I.e., Something like:

import swell.suites.suite_questions as sq
# [...]
all_models = QuestionList(
    list_type="model_dependent",
    name="all_models",
    questions=[
        sq.cycle_times,
        sq.ensemble_hofx_packets,
        sq.ensemble_hofx_strategy,
        sq.skip_ensemble_hofx,
        sq.window_type
    ]
)

That eliminates an additional layer of indirection and lets us rely that much more on Python's static checkers to catch issues.

Moreover, if each question is a class or a function rather than a static object (i.e., something that takes arguments), we can put initialization parameters directly into this question list (if that's a capability we need). One use case of this might be specifying different defaults for different models; for example, multiple suites might prompt the user about skipping ensemble HOFX, but some suites might default to skipping HOFX while others might default to not skipping HOFX.

# [...]
question_list = [
  cycle_times(retry=False),
  skip_ensemble_hofx(default=False),
  ...
]
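For concreteness, the factory version could look roughly like this (a self-contained sketch; the cut-down SwellQuestion and the prompt wording here are made up, not the fields in this PR):

from dataclasses import dataclass

# Cut-down stand-in for the dataclass in this PR, just so the sketch runs.
@dataclass
class SwellQuestion:
    name: str
    prompt: str
    dtype: str
    default_value: bool
    ask_question: bool = True

def skip_ensemble_hofx(default=True, ask_question=True):
    # A factory instead of a shared static object, so each suite's question
    # list can pass its own default when it is built.
    return SwellQuestion(
        name="skip_ensemble_hofx",
        prompt="Skip the ensemble hofx tasks?",
        dtype="boolean",
        default_value=default,
        ask_question=ask_question,
    )

hofx_question = skip_ensemble_hofx(default=True)
letkf_question = skip_ensemble_hofx(default=False)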

Contributor Author

Great point, much more powerful that way.

Comment on lines 33 to 47
@dataclass
class SwellQuestion:
    """Basic dataclass for defining Swell questions for suites and tasks"""
    name: str
    dtype: str
    question_type: str
    default_value: str
    prompt: str
    depends: dict = None
    models: list = None
    ask_question: bool = False
    options: str = None

    def get(self, attr, default=None):
        return getattr(self, attr, default)
Collaborator

More brainstorming: We should look for opportunities to be a bit stricter with types here. In particular, some of these things can be specific enums with specific values (for example, suite_type and list_type both have a small number of fixed values that would be well supported by enums). Other things can be more precise types or even custom swell-specific classes; for example, depends is not just any dict but really a list of Swell questions with specific configurations (right?).
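For example, something along these lines (a rough self-contained sketch; the enum members and the cut-down dataclass are illustrative, not a full inventory of this PR's fields):

from dataclasses import dataclass
from enum import Enum

class QuestionType(Enum):
    SUITE = "suite"
    TASK = "task"

class ListType(Enum):
    MODEL_DEPENDENT = "model_dependent"
    MODEL_INDEPENDENT = "model_independent"

# Cut-down stand-in for the dataclass above, just to show the stricter field.
@dataclass
class SwellQuestion:
    name: str
    prompt: str
    question_type: QuestionType  # mypy/pyright can now flag bad values
    dtype: str = "string"

analysis_variables = SwellQuestion(
    name="analysis_variables",
    prompt="What are the analysis variables?",
    question_type=QuestionType.TASK,
)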

@mathomp4 (Member)

No problem. Just making sure there wasn't a GitHub issue! Forks are nice in that you have control!

@ashiklom (Collaborator)

This is a solid first pass, nice work! It's great to see a real version of this, even in draft form.

I left a few isolated thoughts as comments, and I may add more as we go. Big picture: I would "checkpoint" the current commit and now start doing a bit more exploration of the practical value of this refactor to fully justify it. Some ideas:

  1. Be stricter/more sophisticated with our types, to start taking advantage of Python's static analysis tools.
  2. Simplify our implementation of the relationship between questions and tasks. I.e., Right now, there's a lot of reasonably complex code for actually asking the questions, processing the answers, and then passing them into the tasks. I would love to simplify the design of that machinery through more elegant object-oriented composition/inheritance in Python.
  3. Shrink and/or reorganize the current huge task questions list? E.g., If there's a lot of redundancy in questions across tasks, maybe we can capture that redundancy in lists that multiple tasks can share. Or, rather than defining questions in a single big file, maybe we define questions related to specific Swell functionality alongside the implementation of that functionality (e.g., put JEDI-related questions next to Swell's JEDI-related functions).

Note: Probably not all of these ideas will work out in practice. Also, some of these ideas will point to tweaks to the implementation of the Questions/Tasks classes and lists.

@mranst (Contributor, Author) commented Jan 21, 2025

Thanks, these are good points! I'll see what I can do

@mranst (Contributor, Author) commented Jan 24, 2025

@rtodling Per request, here's a simple example of the old method vs. the new method of defining questions:

The old YAML syntax:

Suite questions

cycle_times:
  ask_question: True
  default_value: defer_to_model
  options: defer_to_model
  models:
    - all
  suites:
    - 3dfgat_atmos
    - 3dfgat_cycle
    - 3dvar_atmos
    - 3dvar_cycle
    - 3dvar
    - convert_ncdiags
    - forecast_geos
    - hofx
    - localensembleda
    - ufo_testing
  prompt: Enter the cycle times for this model.
  type: string-check-list
ensemble_hofx_strategy:
  ask_question: False
  default_value: 'defer_to_model'
  models:
    - all
  suites:
    - localensembleda
  prompt: Enter the ensemble hofx strategy.
  type: string

Task questions

analysis_variables:
  ask_question: false
  default_value: defer_to_model
  models:
    - all
  options: defer_to_model
  prompt: What are the analysis variables?
  tasks:
    - GenerateBClimatology
    - PrepareAnalysis
    - RunJediConvertStateSoca2ciceExecutable
    - RunJediEnsembleMeanVariance
    - RunJediFgatExecutable
    - RunJediVariationalExecutable
  type: string-check-list
background_error_model:
  ask_question: false
  default_value: defer_to_model
  models:
    - all
  options: defer_to_model
  prompt: Which background error model do you want to use?
  tasks:
    - GenerateBClimatology
    - GenerateBClimatologyByLinking
  type: string-drop-list

Each question defines which suites and tasks it belongs to. Every single question is read in at the start, then whittled down depending on what is needed. Let's say that we're running 3dvar, which only has PrepareAnalysis as a task, as defined in its flow.cylc (just a crude example).

The suite questions are read at the beginning, then ensemble_hofx_strategy is eliminated since 3dvar is not listed in its applicable suites.

As with suites, the listed task questions are read into a dictionary at the start; background_error_model is then deleted, while analysis_variables remains. This method requires every question for everything in Swell to be read at the start, then removed as appropriate. Being in YAML format, it also complicates dynamically setting default values based on suite or task.

The new syntax:

Each suite and task is defined as a corresponding list of questions

Suite questions

_3dvar = SuiteQuestionList(
    name="3dvar",
    questions=[
        sq.cycle_times,
        sq.final_cycle_point,
        sq.marine_models,
        sq.model_components,
        sq.runahead_limit,
        sq.start_cycle_point
    ]
)

Task questions

PrepareAnalysis = TaskQuestionList(
    name="PrepareAnalysis",
    questions=[
        tq.analysis_forecast_window_offset,
        tq.analysis_variables,
        tq.mom6_iau,
        tq.total_processors
    ]
)

When building the question dictionary, Swell now only seeks out the questions listed under the suite and tasks that are applicable, though some whittling down is still necessary depending on model, etc. The properties of the questions themselves are defined separately (a rough sketch of the lookup step follows the example below):

analysis_variables = SwellQuestion(
    name="analysis_variables",
    question_type="task",
    default_value="defer_to_model",
    options="defer_to_model",
    models=[
        "all_models"
    ],
    prompt="What are the analysis variables?",
    dtype="string-check-list"
)
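To make the lookup half of this concrete, it works out to roughly the following (a paraphrased, self-contained sketch; the real logic and names live in prepare_config_and_suite.py, and the stand-in QuestionList and helper here are placeholders):

from dataclasses import dataclass, field

# Minimal stand-in for the question-list classes, so the sketch runs on its own.
@dataclass
class QuestionList:
    name: str
    questions: list = field(default_factory=list)

def collect_questions(suite_list, task_lists, tasks_in_flow):
    # Only the questions attached to the suite being run, plus the questions
    # for the tasks that its flow.cylc actually uses.
    questions = list(suite_list.questions)
    for task_name in tasks_in_flow:
        questions.extend(task_lists[task_name].questions)
    return questions

# e.g. 3dvar, whose flow in this crude example only runs PrepareAnalysis:
_3dvar = QuestionList(name="3dvar", questions=["cycle_times", "start_cycle_point"])
PrepareAnalysis = QuestionList(name="PrepareAnalysis", questions=["analysis_variables"])
collect_questions(_3dvar, {"PrepareAnalysis": PrepareAnalysis}, ["PrepareAnalysis"])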

As Alexey noted, in its current form this PR is more of a syntax change than a fix to the underlying issues with the current system of questions, though there is potential to leverage Python to address those issues. As he noted above, we could assign different default values per task or suite in the question lists. If you wanted different defaults for path_to_ensemble depending on suite, for example:

In suite_question_list.py:

var = SuiteQuestionList(
    questions = [
        sq.path_to_ensemble(default_value = 'lat-lon')
    ]
)
letkf = SuiteQuestionList(
    questions = [
        sq.path_to_ensemble(default_value = 'cubed')
    ]
)

We could also define shared lists of commonly associated questions (for workflows that use JEDI, for example) to reduce the huge number of questions in the lists, and make better use of type-checking to verify validity.
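A shared list could be as simple as the sketch below (self-contained stand-ins; the question names are purely illustrative, not the actual Swell questions):

from dataclasses import dataclass, field

# Stand-in for TaskQuestionList so this snippet runs by itself.
@dataclass
class TaskQuestionList:
    name: str
    questions: list = field(default_factory=list)

# A reusable bundle of JEDI-related questions that several task lists can
# include, instead of repeating the same entries in every task definition.
jedi_common_questions = ["window_type", "window_length", "jedi_build_method"]

RunJediVariationalExecutable = TaskQuestionList(
    name="RunJediVariationalExecutable",
    questions=jedi_common_questions + ["analysis_variables"],
)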

Edited to fix permalinks

@rtodling (Contributor) commented Feb 1, 2025

@mranst Thank you so much for the example above - and the comments that come with it. I understand things considerably better now. I like the idea, or the potential, of eventually separating things out a little more, and if I understand it well, that is what these changes open the door to.

I wouldn't have a problem having these initial changes go in, provided you can get zero diff in the tests - just stating the obvious.

Thank you for writing the example down.
