Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(SE) Use swell_static_files_user as an alternate location for all tasks #360

Closed
Dooruk opened this issue Jun 20, 2024 · 6 comments · Fixed by #417
Closed

(SE) Use swell_static_files_user as an alternate location for all tasks #360

Dooruk opened this issue Jun 20, 2024 · 6 comments · Fixed by #417
Assignees
Labels
core development design related issues and improvements enhancement New feature or request

Comments

@Dooruk
Copy link
Collaborator

Dooruk commented Jun 20, 2024

Currently, swell_static_files_user is only used in generate_b_climatology_by_linking task. For others tasks and CI tests, the generic location for static files is /discover/nobackup/projects/gmao/advda/SwellStaticFiles. However, we could give users ability to use swell_static_files_user for their own static files as an alternative to the generic location.

@Dooruk Dooruk added enhancement New feature or request core development design related issues and improvements labels Jun 20, 2024
@ashiklom
Copy link
Collaborator

ashiklom commented Jul 3, 2024

I found references to swell_static_files in the following tasks:

  • GenerateBClimatology (but it seems to be unused?)
  • GenerateBClimatologyByLinking (already configured)
  • GetGeosRestart -- copy files from swell_static_files directory to target directory
  • PrepGeosRunDir -- copy files from swell_static_files directory to target directory
  • StageJedi -- Replace corresponding key in Jedi template

From what I can tell, in GenerateBClimatologyByLinking, the current behavior is that both swell_static_files_main and swell_static_files_user are searched, with preference given to swell_static_files_user and then, any additional files not present in the user directory are linked from swell_static_files_main.

Do we want similar behavior for the other tasks as well? I.e., We effectively pull in every file from swell_static_files_main and then treat any files present in swell_static_files_user effectively as optional overrides?

One key issue with this behavior is that there is no way to force swell to not copy over files from swell_static_files_main that it otherwise would. For example, if your swell_static_files_user contains everything in swell_static_files_main except a few files that you want excluded, those files will be included anyway --- there's no way to specify that certain files need to be skipped. In practice, that may not be a problem, but I wanted to flag it and give you a chance to weigh in.

Note that an insidious version of this is if you have the same files but the naming and/or directory structure of the files changes between swell_static_files and swell_static_files_user. Then, you may end up with multiple files/directories that, in practice, contain duplicates of the same data.

Personally, my vote would be to just define a task-specific static_files parameter with some default value that could be overridden and then force the user to provide a complete set of required static files even if it means copying. That's more work on the user in exchange for clarity of behavior --- you always know exactly where any given swell environment got its files. Whereas, under the current rules, a user may be surprised to see "extra" static files that were pulled in from the global location. I.e., This would mean changing the behavior of GenerateBClimatologyByLinking to something like:

        swell_static_files = self.config.swell_static_files()
        swell_static_files_user = self.config.swell_static_files_user(None)
        if swell_static_files_user is not None:
            swell_static_files = swell_static_files_user
# [...]
        source_paths = [os.path.join(swell_static_files, ...)]
        link_all_files...(self.logger, source_paths, target_path)

Alternatively, maybe the point is that, if you want to provide a complete replacement, you just set the swell_static_files parameter (as an override?) and swell_static_files_user is specifically for this "file merging" behavior.

Let me know what you think.

ashiklom added a commit that referenced this issue Jul 3, 2024
@Dooruk
Copy link
Collaborator Author

Dooruk commented Jul 5, 2024

Personally, my vote would be to just define a task-specific static_files parameter with some default value that could be overridden and then force the user to provide a complete set of required static files even if it means copying. That's more work on the user in exchange for clarity of behavior --- you always know exactly where any given swell environment got its files.

I thought about this and read your thoughts. I think users should choose only one of these options (user or global). As you mentioned, combining sources make it very confusing and hard to follow which files/configurations are being used. Also, in a way, users would get familiar with how to use SWELL and where files are coming from (given that we should provide documentation on that 😄). I personally learn the most when the code fails!

Alternatively, maybe the point is that, if you want to provide a complete replacement, you just set the swell_static_files parameter (as an override?) and swell_static_files_user is specifically for this "file merging" behavior.

I think user still should have full control of their local static folder and make their changes there. That's how Ricardo and I have been operating.

This is a separate concern but during the SWELL development meeting there was a brief discussion on version control with the swell_static_files. Sounds like there is no straightforward solution to that since it requires handling many different large file formats. My suggestion for now is that we can name the static files as the SWELL version and keep the previous static folders up to 4-5 most recent versions.

@ashiklom
Copy link
Collaborator

I thought about this and read your thoughts. I think users should choose only one of these options (user or global). As you mentioned, combining sources make it very confusing and hard to follow which files/configurations are being used. Also, in a way, users would get familiar with how to use SWELL and where files are coming from (given that we should provide documentation on that 😄). I personally learn the most when the code fails!

Got it. In that case, I think the correct implementation is not to have a separate swell_static_files_user, but rather to make swell_static_files an override-able configuration option (with an associated questionary question) that has the current swell_static_files value as the default. That will be a minor breaking change to GenerateBClimatologyByLinking, but shouldn't affect any of the other tasks.

This is a separate concern but during the SWELL development meeting there was a brief discussion on version control with the swell_static_files. Sounds like there is no straightforward solution to that since it requires handling many different large file formats. My suggestion for now is that we can name the static files as the SWELL version and keep the previous static folders up to 4-5 most recent versions.

Agreed that we should figure this out, and also that it should be a separate issue. I'll open that now.

@ashiklom
Copy link
Collaborator

ashiklom commented Jul 17, 2024

@Dooruk Sorry, one more question: Is there a use case where users would use swell_static_files_user for some tasks but swell_static_files for other tasks? Or should users really be using the same swell_static_files for all tasks within a single swell run?

If the latter, this becomes trivial because it already works --- we just drop swell_static_files_user altogether and just make people use the swell_static_files. If you need to change the static files, you just set the swell_static_files override.

If there are situations where, within a single swell run, you might want to use swell_static_files for some tasks but swell_static_files_user for other tasks, then we really do need a swell_static_files_user. That's also not hard to do.

@Dooruk
Copy link
Collaborator Author

Dooruk commented Jul 18, 2024

The more I think about it the more indecisive I get, but let me share my thoughts.

Short answer, yes an option to pick which tasks one would like to use their own folder would be useful, but I just worry this complicates usage even further. R2D2 does some of what you are proposing elegantly with the fetch order, but does not specify tasks.

In the case of GenerateBClimatology and StageJedi, one might be using SWELL in coupled mode and might want geos_atmosphere files from the shared static folder while using geos_marine files from their local folder.

In the case of GetGeosRestart and PrepGeosRunDir users should be able to point to their static folder with GEOS experiment settings and restarts. Actually, that should allow for external folders too not just the static folder, perhaps I can rewrite that task at some point.

There would be more tasks using this in the future.

@ashiklom
Copy link
Collaborator

Understood, thanks! For now, I'll implement swell_static_files_user for all tasks that currently support swell_static_files, with this logic (pseudocode):

swell_static_files = Config.swell_static_files()
swell_static_files_user = Config.swell_static_files_user()
if swell_static_files_user:
  swell_static_files = swell_static_files_user

do_stuff(swell_static_files)

I'm not thrilled about this design for multiple reasons, but pending large scale overhaul of how we do configs (per #314), this is probably the easiest thing to do.

@Dooruk Dooruk linked a pull request Sep 13, 2024 that will close this issue
ashiklom added a commit that referenced this issue Sep 17, 2024
* sles15 is now default platform, added to docs reflecting this and other additions (#407)

* swell_static_files_user in config now used in appropriate tasks (#360)

* addressed potential error on abort caused by undefined logger call

* renamed datetime utility to avoid hinting confusion with base package, added type hinting (#349)

* type hinting for all python methods (#349)

* removed unnecessary pass of logger to functions copy_platform_files and copy_eva_files

* type hinting in run_geos_executable uses Optional

* Hinting changes, Logger properly type hinted, pd.DataFrame now used, typing.Optional used where appropriate

* Update docs/configuring_cylc.md

Co-authored-by: Alexey Shiklomanov <[email protected]>

* added proper hinting for SilentUndefined instance in jinja2.py

* proper syntax for linking docs

* direct type hinting for FileHandler objects

* added Optional typing where appropriate

* added missing type hint in run_jedi_hofx_executable.py, fixed missing typing functions

* more types in hint

* more precise type hinting

* Resolved code test error

* more specific type hinting per mypy

* corrected logger instance in type hint

* Revised type hinting

* Revised datetime typing, other refinements

* Further revised type hinting

* Update src/swell/utilities/render_jedi_interface_files.py

Updated type hint to optional for consistency

Co-authored-by: Alexey Shiklomanov <[email protected]>

---------

Co-authored-by: Alexey Shiklomanov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core development design related issues and improvements enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants