feat(baremetal_validator): Add Validator for Process, Container Metrics #1878

KaiyiLiu1234 · 2024-12-03T00:45:51Z

This commit includes validation setups for Process and Container RAPL related Baremetal power metrics (node, process, container).

…etal Metrics This commit includes validation setups for Process and Container RAPL related power metrics (node, process, container). Signed-off-by: Kaiyi <[email protected]>

github-actions · 2024-12-03T00:46:30Z

🤖 SeineSailor

Here's a concise summary of the pull request changes:

Summary: This pull request introduces a baremetal validator for process and container RAPL-related power metrics, adding a new Bash script and several Python classes for stress testing. Key modifications include:

New baremetal_validator script and Python classes (ProcessOutput, ContainerOutput, StresserError, Local, Process, and Container) for process and container stress testing.
Introduction of new named tuples (BMValidator, Local, LocalProcess, LocalContainer, LocalPrometheus) for configuration data.
Changes to the validate_acpi command to accept a duration argument and addition of a new regression command to the bm_validator group.
Updates to the bload function to parse and initialize new fields from the YAML configuration file, with an updated configuration file format for baremetal validations.

Impact: These changes may affect the external interface and behavior of the code, particularly in the areas of stress testing and configuration data handling.

Observations/Suggestions:

It would be beneficial to include unit tests for the new Python classes and Bash script to ensure their correctness and robustness.
Consider adding documentation for the new regression command and its usage.
The updated configuration file format may require changes to existing integrations or scripts that rely on the old format; ensure that these changes are properly communicated and addressed.

Overall, this pull request introduces significant new functionality for baremetal validations, but it's essential to thoroughly review and test these changes to ensure they do not introduce unintended consequences or break existing functionality.

Added bload to load configuration classes for process, container, and node level baremetal validation. Signed-off-by: Kaiyi <[email protected]>

vprashar2929 · 2024-12-05T04:50:48Z

e2e/tools/validator/scripts/targeted_stresser.sh

Couldn't we modify the existing stressor script instead of adding new ones?

I did not want to modify the other scripts because it might break the metal ci. I think this script is the best to replace the other stressor scripts (as it enables a custom load curve). A future PR should be made for modifying validator to have just one goto stressor script.

vprashar2929 · 2024-12-05T05:12:27Z

e2e/tools/validator/src/validator/config/__init__.py

+    node = Local(
+        load_curve=node_config.get("load_curve", default_config["load_curve"]),
+        iterations=node_config.get("iterations", default_config["iterations"]),
+        mount_dir=os.path.expanduser(node_config.get("mount_dir", default_config["mount_dir"]))


Why do we need to mount the directory? Also how are you planning to deploy Kepler on BM?

Currently, kepler is deployed with docker compose for simplicity. however, this should still work the same even in a kubernetes environment. The mount directory is specifically to get the start and end times of the stressor script as a log file from the container. In a non container, the log file will be saved at the mount directory.

Shouldn't we target a single environment for now and focus on expanding it further? Currently, we use docker compose for our development and validations. Let's stick with that for now and afterwards add support for k8s
Wdyt?

vprashar2929 · 2024-12-05T05:13:21Z

e2e/tools/validator/src/validator/config/__init__.py

+        process_config = {}
+    process = LocalProcess(
+        isolated_cpu=process_config.get("isolated_cpu", default_config["isolated_cpu"]),
+        load_curve=process_config.get("load_curve", default_config["load_curve"]),


What is the difference in the load curve between the process and the container?

No difference. the logic is the same

Then why we require separate configs for both?

vprashar2929 · 2024-12-05T05:20:10Z

e2e/tools/validator/scripts/targeted_stresser.sh

+    > "$TIME_INTERVAL_LOG"
+
+    start_time=$(date +%s)
+    echo "Stress Start Time: $start_time" >> "$TIME_INTERVAL_LOG"


AFAIK we do something similar in our existing validator source by checking the start and end time of the stressor.sh script. I think we can do similar in this case as well instead of computing start and end time in the bash script itself

The reason this is necessary is primarily for container validation and in future if we attempt pods (which should fall under container still anyway). Container setup requires an image installation (in this case, fedora and stressng need to be installed). If we get start and end time outside we will include the time for installing which will skew the results. The easiest solution I can think of was to actually compute it in the bash script so we do not include the installation time.

Added process, container, node/local validation to the cli. Signed-off-by: Kaiyi <[email protected]>

Added relevant config files for baremetal validation including formatted prom validation metrics and baremetal configuration. Signed-off-by: Kaiyi <[email protected]>

Added most important component rapl metrics to validate on bm Signed-off-by: Kaiyi Liu <[email protected]>

vprashar2929 · 2024-12-16T17:09:17Z

e2e/tools/validator/validator.bm.yaml

+
+config: # default settings
+  isolated_cpu: "15" # Logical processor that is fully isolated from scheduler (ex. isolcpus)
+  load_curve: "0:15,10:20,25:20,50:20,75:20,100:30,75:20,50:20,25:20,10:20,0:15"


Why can't we add a new load curve in the existing stressor and not make it part of the config as it overcomplicated the config part How's this different from what default load curve gives us?

feat(baremetal_validator): Add Validator for Process, Container Barem…

7c06cbc

…etal Metrics This commit includes validation setups for Process and Container RAPL related power metrics (node, process, container). Signed-off-by: Kaiyi <[email protected]>

KaiyiLiu1234 marked this pull request as draft December 3, 2024 00:46

feat(bm-validation): Add config setup for Baremetal Validator

ad0e7f8

Added bload to load configuration classes for process, container, and node level baremetal validation. Signed-off-by: Kaiyi <[email protected]>

KaiyiLiu1234 requested a review from vprashar2929 December 4, 2024 18:19

vprashar2929 reviewed Dec 5, 2024

View reviewed changes

KaiyiLiu1234 and others added 3 commits December 9, 2024 22:11

feat(baremetal-validator): Add CLI for running baremetal validations

15106d7

Added process, container, node/local validation to the cli. Signed-off-by: Kaiyi <[email protected]>

feat(baremetal-validation): Add relevant config files

0e2bf49

Added relevant config files for baremetal validation including formatted prom validation metrics and baremetal configuration. Signed-off-by: Kaiyi <[email protected]>

feat(bm-validation): Add bm_validations.yaml configuration

3ecbcf5

Added most important component rapl metrics to validate on bm Signed-off-by: Kaiyi Liu <[email protected]>

KaiyiLiu1234 marked this pull request as ready for review December 16, 2024 04:04

vprashar2929 reviewed Dec 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(baremetal_validator): Add Validator for Process, Container Metrics #1878

feat(baremetal_validator): Add Validator for Process, Container Metrics #1878

KaiyiLiu1234 commented Dec 3, 2024

github-actions bot commented Dec 3, 2024 •

edited

Loading

vprashar2929 Dec 5, 2024

KaiyiLiu1234 Dec 10, 2024

vprashar2929 Dec 5, 2024 •

edited

Loading

KaiyiLiu1234 Dec 11, 2024

vprashar2929 Dec 16, 2024

vprashar2929 Dec 5, 2024

KaiyiLiu1234 Dec 10, 2024

vprashar2929 Dec 16, 2024

vprashar2929 Dec 5, 2024

KaiyiLiu1234 Dec 10, 2024

vprashar2929 Dec 16, 2024

feat(baremetal_validator): Add Validator for Process, Container Metrics #1878

Are you sure you want to change the base?

feat(baremetal_validator): Add Validator for Process, Container Metrics #1878

Conversation

KaiyiLiu1234 commented Dec 3, 2024

github-actions bot commented Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vprashar2929 Dec 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Dec 3, 2024 •

edited

Loading

vprashar2929 Dec 5, 2024 •

edited

Loading