Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Build configuration incrementally by allowing to combine multiple configuration files #2557

Merged
merged 61 commits into from
Nov 21, 2022

Conversation

ekouts
Copy link
Contributor

@ekouts ekouts commented Jul 4, 2022

This PR introduces some changes for the configuration:

  • ReFrame will always start from a basic minimal configuration with a generic system (local scheduler and launcher) and the builtin environment. It also includes the basic logging handler and handlers_perflog
  • The user can pass multiple configuration files and they will be added one after the other. If systems/env etc have the same name the last one will be kept.

EDIT @vkarak: This PR also adds and changes the following:

  • We warn on system, partition and environment redefinitions

  • All sections with unnamed objects are merged and resolved based on the
    selected system. As a result, this sections can be combined arbitrarily through
    as the chain of configuration files is loaded and the resulting internal config
    will always has a single such object, so that it is guaranteed that calls, such
    as get('general/0/<option>') will always yield the correct value.

  • Add a special section named handlers$ that contains the basic stream handler
    that handles the framework's output. This allows users to override the
    standard handlers without having to override the basic output handler.

  • Only a single stdout and stderr stream handler is considered and if a second
    one is defined this is silently ignored. This was done primarily for
    backward compatibility with the current user configurations, so that their
    framework stdout handler in handlers will be simply ignored in favor of the
    one in the special section handlers$.

  • Unit tests were updated and cleaned up.

Will close #1725 .

Todos

  • Update the docs with the latest updates related to this PR.

@vkarak vkarak requested review from vkarak, teojgo and victorusu July 12, 2022 10:32
Copy link
Contributor

@vkarak vkarak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need a unit test to explicitly test the update_config() (your add_config() currently) as well as to update the docs. Also regarding the order that the various sections are combined we should be careful, since in several places in the code we assume the place 0 as the "default", since we don't expect to have another entry (see for example general/0/option-x). I've seen that you had tweak the unit tests to pass specifically for this and this can be dangerous. Perhaps we need place 0 to be the latest, but we must have a look at it more thoroughly. Playing with the --show-config option should help with this.

@ekouts ekouts added the 4.x label Aug 8, 2022
Copy link
Contributor

@vkarak vkarak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to focus a bit more on how we intend to use this feature.

@vkarak
Copy link
Contributor

vkarak commented Aug 9, 2022

Regarding the usage of this feature, I was thinking the following:

  1. We should not change-C to accept multiple configuration files.
  2. Have a RFM_CONFIG_PATH type of variable that reframe will look for a settings.py or settings.json file in each path and load it in order.
  3. The -C config_file will replace completely the configuration chain as of now unless it is written as -C +config_file in which case it will be applied incrementally on top of the configurations from the RFM_CONFIG_PATH.
  4. We could also cover the case of different configuration files per system by interpreting the --system option and then looking for a specific configuration file (we lose the auto-selection in this case, though).

@ekouts @victorusu What do you think?

@ekouts
Copy link
Contributor Author

ekouts commented Aug 9, 2022

Regarding the usage of this feature, I was thinking the following:

1. We should not change`-C` to accept multiple configuration files.

2. Have a `RFM_CONFIG_PATH` type of variable that reframe will look for a `settings.py` or `settings.json` file in each path and load it in order.

3. The `-C config_file` will replace completely the configuration chain as of now unless it is written as `-C +config_file` in which case it will be applied incrementally on top of the configurations from the `RFM_CONFIG_PATH`.

4. We could also cover the case of different configuration files per system by interpreting the `--system` option and then looking for a specific configuration file (we lose the auto-selection in this case, though).

@ekouts @victorusu What do you think?

  1. I am fine with that, the original suggestion from the issue was -C file1:file2:file3 so this is why I started with multiple files.
  2. I guess we also accept full path of configurations? Or do we want only files named settings.py?
  3. 👍
  4. I don't understand what you mean here. How we would look for the right config file?
  5. What happens now if the user doesn't give a config at all? We used to look by default in ${HOME}/.reframe/settings.{py,json}, ${RFM_INSTALL_PREFIX}/settings.{py,json}, /etc/reframe.d/settings.{py,json}. I guess we keep that the same? But then the user still needs to write all parts of his configurations, they cannot avoid the logging unless they know the path to the reframe/core/settings and use RFM_CONFIG_PATH.

@victorusu
Copy link
Contributor

Regarding the usage of this feature, I was thinking the following:

  1. We should not change-C to accept multiple configuration files.
  2. Have a RFM_CONFIG_PATH type of variable that reframe will look for a settings.py or settings.json file in each path and load it in order.
  3. The -C config_file will replace completely the configuration chain as of now unless it is written as -C +config_file in which case it will be applied incrementally on top of the configurations from the RFM_CONFIG_PATH.
  4. We could also cover the case of different configuration files per system by interpreting the --system option and then looking for a specific configuration file (we lose the auto-selection in this case, though).

@ekouts @victorusu What do you think?

  1. My issue with multiple files is the consistency... We have a plethora of different ways to pass various options. For instance, the -J option can be passed multiple times to aggregate different job options. If we use the -C with the proposed syntax -C file1:file2:file3, then it becomes inconsistent. I prefer to avoid accepting multiple files in a single option. We need to review this anyway for all the other options to make everything consistent. Probably, we should move to action-based commands, as the "new" tools use in the wild, as proposed by @teojgo.
  2. If I understand correctly, the RFM_CONFIG_PATH environment will accept the syntax path1:path2:path3, as is the normality for BASH. Then the settings.py and settings.json can be provided in those paths. I vote for that!
  3. I am not so sure if we should do the + syntax here... I would vote for consistency and use the same construction that we use for -J, based on the arguments I gave on 1.
  4. I agree with this. I am not sure how this will be used in practice, but this is in principle the right way of doing it.

@ekouts
Copy link
Contributor Author

ekouts commented Aug 9, 2022

@victorusu about (1) I think @vkarak suggests to keep -C as it was before, it will accept only one file and the user cannot pass multiple config files from the command line, so no inconsistency there. I only mentioned the user's suggestion since the intention was to give multiple config files from the cli. I think the env var would be enough.
About (2) I assume it would make sense to have config files with the machine name instead of different directories, like config/dom.py, config/daint.py etc instead of config/dom/settings.py, config/daint/settings.py, etc.. But if you both think it's better I am fine with it.

@vkarak
Copy link
Contributor

vkarak commented Aug 9, 2022

I guess we also accept full path of configurations? Or do we want only files named settings.py?

I would say no and just use directories with settings.py inside them, so as to be consistent with other PATH variables in Linux, as @victorusu explained.

I don't understand what you mean here. How we would look for the right config file?

That's easy; we simply look for a specific config file pattern that includes the system name, e.g., settings-daint.py instead of settings.py. But as @victorusu pointed out that it's not clear how we use that in practice, we can wait for this behaviour until we have a concrete example.

What happens now if the user doesn't give a config at all? We used to look by default in ${HOME}/.reframe/settings.{py,json}, ${RFM_INSTALL_PREFIX}/settings.{py,json}, /etc/reframe.d/settings.{py,json}. I guess we keep that the same? But then the user still needs to write all parts of his configurations, they cannot avoid the logging unless they know the path to the reframe/core/settings and use RFM_CONFIG_PATH.

For your first part of the question, the answer is yes. We will put those in the path. For the second, the user only needs to do -C +myconfig. But, frankly, it might be better to always include the reframe/core/settings.py regardless of whether the path is followed or not.

I am not so sure if we should do the + syntax here... I would vote for consistency and use the same construction that we use for -J, based on the arguments I gave on 1.

I cannot understand this very well.

@vkarak
Copy link
Contributor

vkarak commented Aug 9, 2022

I am a bit puzzled of what is the best for the default behaviour of the -C option, because eventually, I believe that we should always build incrementally on top of reframe/core/settings.py. So the most consistent behaviour would be for -C to add to the configuration to those found in the path, including the builtin. We could have a syntax such as -C :myconfig.py (or pick one) that would allow users to tell reframe to use their config file exclusively. What do you think?

@victorusu
Copy link
Contributor

I am a bit puzzled of what is the best for the default behaviour of the -C option, because eventually, I believe that we should always build incrementally on top of reframe/core/settings.py. So the most consistent behaviour would be for -C to add to the configuration to those found in the path, including the builtin. We could have a syntax such as -C :myconfig.py (or pick one) that would allow users to tell reframe to use their config file exclusively. What do you think?

I agree that -C should always build incrementally. Than we should allow to pass multiple -Cs... Which means that this cmdline would be acceptable -C config1.py -C config2.json -C config3.py

Having said that, I am not sure if the -C :myconfig.py is the right syntax. Probably yes, but I am not so sure...
When passing multiple options one would do the following...

reframe -C config1.py -C config2.json -C :config3.py

Is it ok?

@victorusu
Copy link
Contributor

I am a bit puzzled of what is the best for the default behaviour of the -C option, because eventually, I believe that we should always build incrementally on top of reframe/core/settings.py. So the most consistent behaviour would be for -C to add to the configuration to those found in the path, including the builtin. We could have a syntax such as -C :myconfig.py (or pick one) that would allow users to tell reframe to use their config file exclusively. What do you think?

I agree that -C should always build incrementally. Than we should allow to pass multiple -Cs... Which means that this cmdline would be acceptable -C config1.py -C config2.json -C config3.py

Having said that, I am not sure if the -C :myconfig.py is the right syntax. Probably yes, but I am not so sure... When passing multiple options one would do the following...

reframe -C config1.py -C config2.json -C :config3.py

Is it ok?

The person who wrote this (👆) doesn't know how to write in English... Sorry for that... The correct spelling is the following...

I am a bit puzzled of what is the best for the default behaviour of the -C option, because eventually, I believe that we should always build incrementally on top of reframe/core/settings.py. So the most consistent behaviour would be for -C to add to the configuration to those found in the path, including the builtin. We could have a syntax such as -C :myconfig.py (or pick one) that would allow users to tell reframe to use their config file exclusively. What do you think?

I agree that -C should always build incrementally. Then we should allow passing multiple -Cs... This means that this command line would be acceptable -C config1.py -C config2.json -C config3.py

Having said that, I am not sure if the -C :myconfig.py is the right syntax. Probably yes, but I am not so sure... When passing multiple options one would do the following...

reframe -C config1.py -C config2.json -C :config3.py

Is it ok? What do you think?

@vkarak
Copy link
Contributor

vkarak commented Aug 10, 2022

If we say that the -C option can be specified multiple times, then we don't need the RFM_CONFIG_PATH. Regarding the -C :myconfig.py option, we need a way to break the chain, that's why I propose it. If a :config entry appears, then this will supersede all configs previously loaded. Let me think a bit about it, cos this is another breaking change.

- Update the docs based on the new configuration capabilities
- Minimize the use of rfmdocstart/end annotations
@vkarak
Copy link
Contributor

vkarak commented Nov 12, 2022

@ekouts could you rerun the doc listings on Daint? I have changed substantially the way the configuration is presented and we'd need to rerun.

@ekouts
Copy link
Contributor Author

ekouts commented Nov 18, 2022

@jenkins-cscs retry all

@ekouts
Copy link
Contributor Author

ekouts commented Nov 18, 2022

I cannot approve because I am the original author, but ltgm

vkarak and others added 4 commits November 18, 2022 22:50
The configs that showcase the `resources` and the `container_platforms` are made
minimal redefining only Daint.
With the new configuration mechanism, these configs are still valid if combined
with the baseline `daint.py`.
@ekouts
Copy link
Contributor Author

ekouts commented Nov 21, 2022

@vkarak looks good to me. If As soon as the CI passes let's merge :)

@vkarak vkarak changed the title [feat] Support multiple configuration files [feat] Build configuration incrementally by allowing to combine multiple configuration files Nov 21, 2022
@vkarak vkarak merged commit 37646f7 into reframe-hpc:master Nov 21, 2022
@ekouts ekouts deleted the feat/configuration_scopes branch March 13, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support configuration scopes
5 participants