Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map rules to components #10609

Merged
merged 6 commits into from
May 30, 2023
Merged

Map rules to components #10609

merged 6 commits into from
May 30, 2023

Conversation

jan-cerny
Copy link
Collaborator

@jan-cerny jan-cerny commented May 23, 2023

Description:

This PR introduces a mapping from operating system components to rules. This mapping is stored in YAML files in components directory. The "component" is a package or a set of related packages.

All rules from the linux_os benchmark directory have been assigned to a component. Most of the rules are mapped to at least 1 component. The remaining rules are put into a generic "operating system" component. This mapping is currently very crude. It will have to be improved manually. Some rules will have to be add to more components or moved to a different component.

The PR also adds a new script utils/assign_components.py which can be used for semi-automated component assignment for rules. The components are assigned to rules in "obvious" cases, such as:

  • the name of the component is a part of the rule ID
  • the rule is in a Group that is dedicated to a specific component
  • the rule is package_installed, package_removed, etc.
  • the rule uses templates where we can easily infer the component based
    on the template or a template parameter

The script uses data in CSV files that have been manually improved and they capture current state of the linux_os benchmark. The script generates a YAML otuput to a given directory where each YAML file will represent 1 component.

The utils/assign_components.py script and CSV data is immediately removed by a subsequent commit in this PR.

More items can be found in the follow up PR #10638.

For more details, please read the commit messages of every commit.

Rationale:

When a rule's behavior needs to be changed / updated, it is likely that other related rules need to be changed / updated as well.

One of the goals of this mapping is to enable content maintainers to see what other rules may require attention.
Another goal is better interface to communicate with component maintainers.

Also, this change is a prerequisite for the "minor release agnostic content".

Review Hints:

Review the contents of YAML files in the components directory.

Summary:

This PR will add:

  • component-rule mapping by YAML files
  • documentation describing format of these "component" files

The one-off script that was used to generate the YAML files is added and the removed by another commit, so it isn't visible in the "Files changed" tab.

There are no changes in the build system and no changes in tests.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Used by openshift-ci bot. label May 23, 2023
@openshift-ci
Copy link

openshift-ci bot commented May 23, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@github-actions
Copy link

Start a new ephemeral environment with changes proposed in this pull request:

Fedora Environment
Open in Gitpod

Oracle Linux 8 Environment
Open in Gitpod

@jan-cerny jan-cerny force-pushed the components branch 3 times, most recently from 2428fc5 to 97a5ccf Compare May 24, 2023 11:31
@matejak matejak self-assigned this May 25, 2023
rule_to_components = ssg.components.rule_components_mapping(components)
linux_os_guide_dir = os.path.join(args.source_dir, "linux_os", "guide")
rules_with_component = set(rule_to_components.keys())
rules_in_benchmark = set(ssg.rules.find_all_rules(linux_os_guide_dir))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over built rules is probably better, because it opens doors to examining remediations and checks as well, without having to deal with macros. In-source rule contents are unreliable, as there can be important unexpanded macros.
Moreover, you already use the product context in the code, so it is just one step from using the compiled contents.

return components


template_to_component = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of moving this directly to templates (in the future)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to do that

@jan-cerny jan-cerny added Infrastructure Our content build system enhancement General enhancements to the project. Highlight This PR/Issue should make it to the featured changelog. labels May 25, 2023
@jan-cerny jan-cerny force-pushed the components branch 2 times, most recently from 8ffdf76 to 2acf002 Compare May 26, 2023 12:49
jan-cerny added 3 commits May 26, 2023 17:26
This commit adds a new script `utils/assign_components.py` which
can be used for semi-automated component assignment for rules.
The components are assigned to rules in  "obvious" cases, such as:
- the name of the component is a part of the rule ID
- the rules is in a Group that is dedicated to a specific component
- the rule is package_installed, package_removed, etc.
- the rule uses templates where we can easily infer the component based
  on the template or a template parameter
The script uses data in CSV files that have been manually improved
and they capture current state of the `linux_os` benchmark.
The script generates a YAML otuput to a given directory where each
YAML file will represent 1 component. We expect an immediate removal
of this script and CSV data.
This mapping has been generated by script `utils/assign_components.py`.
All rules from the `linux_os` benchmark directory have been assigned to
a component. Most of the rules are mapped to at least 1 component.  The
remaining rules are put into "unknown" component, but there is only 234
rules that are in the "unknown" component.  This mapping is wild. It
will have to be improved manually. Some rules will have to be add to
more components or moved to a different component.
@jan-cerny
Copy link
Collaborator Author

I have split the PR to #10638 and I have updated the description.

@jan-cerny jan-cerny marked this pull request as ready for review May 26, 2023 16:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Used by openshift-ci bot. label May 26, 2023
@vojtapolasek
Copy link
Collaborator

Hello @jan-cerny,
this looks impressive, nice job.
I found just a small problem; the "packages" key is not documented among the keys which can be in the component definition.

@jan-cerny
Copy link
Collaborator Author

@vojtapolasek Thank you!

I have added the packages key to the doc.

@codeclimate
Copy link

codeclimate bot commented May 29, 2023

Code Climate has analyzed commit 0fb59ff and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 52.5% (0.0% change).

View more on Code Climate.

@matejak
Copy link
Member

matejak commented May 29, 2023

I like this concept, the idea basically is that the component mapping is bootstrapped (this is what this PR does), and then it is expanded incrementally as time goes by, based on changes to the project (rule additions). As stated in the description, this mapping connects rules that are likely to change for the same reason. Information about components is located in component files, not in rules, templates, and so on - content authors that want to assure conformance of the content with new product releases are a minority, so data that only they need shouldn't distract others by default.
The follow-up PR contains a test that can assist with keeping this mapping up-to-date, and that is functionally similar to the script that was used as a helper to bootstrap the mapping. I suppose that more work that makes that test smart and powerful can follow, and it should be designed with extensibility in mind.
I think that the schema of a Component can be treated as flexible at the moment, and I would even consider dropping the packages key, as package organization can vary across distributions, but it is the upstream that changes and that forces everybody equally to react to that change sooner or later, regardless how they package that project.

Copy link
Member

@matejak matejak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the PR with the knowledge that the referenced follow-up PR provides some level of support for keeping the contents up-to-date.
I will not merge it for another day to see whether there will be more comments.

@Mab879 Mab879 added this to the 0.1.69 milestone May 30, 2023
@matejak
Copy link
Member

matejak commented May 30, 2023

No additional comments, so here we go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement General enhancements to the project. Highlight This PR/Issue should make it to the featured changelog. Infrastructure Our content build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants