Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cycling FGAT for geos_marine and towards erasing geos_ocean #419

Merged
merged 26 commits into from
Oct 16, 2024

Conversation

Dooruk
Copy link
Collaborator

@Dooruk Dooruk commented Sep 12, 2024

This is a step towards getting rid of geos_ocean model section and sticking with geos_marine but that will require further testing and verification beyond the scope of this PR.

For context, SOCA (marine component of JEDI) is designed to handle different model types such as ocean(mom6), sea-ice (cice6), waves(ww3), and BGC. This means, there will be optional tasks (such as SOCA2CICE) that not need triggered while CICE model is not active. Additionally, there are optional configuration entries for the input config YAML, for instance ice_filename and wav_filename below in the state section are optional entries. More detailed information here.

state:
  read_from_file: 1
  basename: ./INPUT/
  ocn_filename: MOM.res.nc
  ice_filename: cice.res.nc
  wav_filename: wav.res.nc
  date: 2018-04-15T00:00:00Z
  state variables: [cicen, hicen, socn, tocn, ssh, hocn, uocn, vocn, swh]

To handle this, this PR introduces a new key called marine_models to be used in flow.cylc and various tasks. Jinja is able to utilize this key using if statements, I tested it with both flow.cylc and EVA YAMLs.

Second, I added a dynamic_keys feature, (see utilities/render_jedi_interface_files.py). These are entries that are dynamic, for instance states in PseudoModel would be dependent on window_length and background_frequency as well as marine_models, see below for it's output, it has two entries because background_frequency is 3H and window_length is 6H:

name: PseudoModel
states:
- basename: ./
    date: '2021-07-02T06:00:00Z'
    ice_filename: ice.fc.2021-07-02T03:00:00Z.PT3H.nc
    ocn_filename: ocn.fc.2021-07-02T03:00:00Z.PT3H.nc
    read_from_file: 1
- basename: ./
    date: '2021-07-02T09:00:00Z'
    ice_filename: ice.fc.2021-07-02T03:00:00Z.PT6H.nc
    ocn_filename: ocn.fc.2021-07-02T03:00:00Z.PT6H.nc
    read_from_file: 1
tstep: PT3H

@Dooruk Dooruk changed the title Cycling FGAT for marine Cycling FGAT for geos_marine and towards erasing geos_ocean Sep 12, 2024
@Dooruk Dooruk changed the title Cycling FGAT for geos_marine and towards erasing geos_ocean Cycling FGAT for geos_marine and towards erasing geos_ocean Sep 12, 2024
@Dooruk
Copy link
Collaborator Author

Dooruk commented Sep 25, 2024

Thank you for the reviews! Before this one, the following needs to be merged:
#430

Then I will work on your suggestions.

@Dooruk
Copy link
Collaborator Author

Dooruk commented Oct 7, 2024

I'm still struggling with kwargs typehinting. I tried @mranst's suggestion Union[int, str, None] = None but that didn't work unfortunately. For now I commented it out but I will test the rest of this PR and come back to it.

Also, following @ashiklom's suggestion on using cwd for subprocess, I will not providing a default so it will be cwd: str. Hence, it will always throw an error if it's not defined, I think that makes sense?. This is the part I'm talking about:

def run_track_log_subprocess(
logger: Logger,
command: Union[list[str], str],
output_log: Optional[str] = None
) -> None:

@ashiklom
Copy link
Collaborator

ashiklom commented Oct 7, 2024

**kwargs don't need to be type hinted. The main situation in which you would type hint them is if all the arguments had the same type (e.g., every **kwargs was an integer). If that's not the case, then the type hint is confusing and, more importantly, defeats a lot of the flexibility of using kwargs in the first place.

Unfortunately, type hinters don't seem to be smart enough to propagate type checking through kwargs, but that's OK --- we shouldn't let the perfect be the enemy of the good!

If you use **kwargs, you don't need to propagate the cwd argument directly. However, if you want to propagate it anyway, I would just follow the semantics of the parent function subprocess.run and default to cwd: Optional(str) = None, where cwd = None means use the current working directory.

@Dooruk
Copy link
Collaborator Author

Dooruk commented Oct 10, 2024

So I have one issue (more of a design flaw 🥲 ) left related to low-level design of task/suite/experiment config creation but this is good to go. For now I will set a hard-coded value in the task and move on but I can also create an issue depending on the comments.

The issue arises a key in task_questions.yaml or suite_questions.yaml needs to be used by a conditional task in flow.cylc. Following is the entry in my task_questions.yaml. I don't see cice6_domains in my experiment.yaml after swell create step:

cice6_domains:
  ask_question: true
  default_value: defer_to_model
  models:
  - geos_marine
  options: defer_to_model
  prompt: Which CICE6 domains do you wish to run DA for?
  tasks:
  - RunJediConvertStateSoca2ciceExecutable
  type: string-check-list

Only the RunJediConvertStateSoca2ciceExecutable task requires this. In flow.cylc, this task is used only when cice6 is active:

    {% if 'cice6' in marine_models %}

    [[RunJediConvertStateSoca2ciceExecutable-{{model_component}}]]
        script = "swell task RunJediConvertStateSoca2ciceExecutable $config -d $datetime -m {{model_component}}"
        platform = {{platform}}
        execution time limit = {{scheduling["RunJediConvertStateSoca2ciceExecutable"]["execution_time_limit"]}}
        [[[directives]]]
        {%- for key, value in scheduling["RunJediConvertStateSoca2ciceExecutable"]["directives"][model_component].items() %}
            --{{key}} = {{value}}
        {%- endfor %}

    {% endif %}

While creating experiment.yaml, Swell scans flow.cylc [runtime] section to find unique tasks and hence to define associated questions (keys) in task_questions.yaml. marine_models was not defined during this non-exhaustive search step, hence this task inside the conditional is essentially not registered. I'm talking about at this stage of the code:

suite_str = template_string_jinja2(self.logger, self.suite_str, self.experiment_dict, True)

Here, self.experiment_dict contains model_components so tasks inside those conditionals work:

    {% for model_component in model_components %}

    [[LinkGeosOutput-{{model_component}}]]
        script = "swell task LinkGeosOutput $config -d $datetime -m {{model_component}}"
    {% endfor %}

Anyways, this was really long explanation but happy to explain even further, I had to do some digging to find this issue.

@Dooruk Dooruk marked this pull request as ready for review October 10, 2024 15:43
src/swell/tasks/link_geos_output.py Outdated Show resolved Hide resolved
src/swell/tasks/link_geos_output.py Outdated Show resolved Hide resolved
src/swell/tasks/link_geos_output.py Outdated Show resolved Hide resolved
src/swell/tasks/link_geos_output.py Outdated Show resolved Hide resolved
src/swell/tasks/link_geos_output.py Outdated Show resolved Hide resolved
src/swell/utilities/data_assimilation_window_params.py Outdated Show resolved Hide resolved
@Dooruk Dooruk requested a review from ashiklom October 11, 2024 19:01
Copy link
Collaborator

@ashiklom ashiklom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone else should review the DA aspects, but code looks good. Appreciate the documentation!

@Dooruk Dooruk merged commit 7d88fc6 into develop Oct 16, 2024
15 checks passed
@Dooruk Dooruk deleted the feature/fgat_cycle branch October 16, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants