[native] Add LinuxMemoryChecker check/warning to ensure system-mem-limit-gb is reasonably set #24149

Open, wants to merge 1 commit into base: master

minhancao
Contributor

@minhancao minhancao commented Nov 26, 2024

Description

Add LinuxMemoryChecker check and warning to ensure system-memory-gb < system-mem-limit-gb < memory limit for process.

For cgroup v1:
Set the memory limit for the process to the smaller of MemTotal in /proc/meminfo and the value in memory.limit_in_bytes.

For cgroup v2:
Set the memory limit for the process to the smaller of MemTotal in /proc/meminfo and the value in memory.max.
If memory.max contains the string "max" (no limit is set), use MemTotal from /proc/meminfo; otherwise, use the value in memory.max in that comparison.
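A minimal C++ sketch of the limit calculation described above, assuming the caller has already read the file contents. The function names (parseMemTotalBytes, parseCgroupLimitBytes, processMemoryLimitBytes) are hypothetical and not taken from the PR; MemTotal is converted from kB to bytes, and the string "max" is treated as "no cgroup limit".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Returns MemTotal in bytes from /proc/meminfo-formatted input.
// /proc/meminfo reports MemTotal in kB, e.g. "MemTotal:  16318480 kB".
std::optional<uint64_t> parseMemTotalBytes(std::istream& meminfo) {
  std::string line;
  while (std::getline(meminfo, line)) {
    std::istringstream fields(line);
    std::string key;
    uint64_t valueKb = 0;
    if ((fields >> key >> valueKb) && key == "MemTotal:") {
      return valueKb * 1024;
    }
  }
  return std::nullopt;
}

// Parses a cgroup limit file (v1 memory.limit_in_bytes or v2 memory.max).
// Returns std::nullopt for "max" (no limit) or unparsable content.
std::optional<uint64_t> parseCgroupLimitBytes(const std::string& content) {
  std::istringstream in(content);
  std::string token;
  if (!(in >> token) || token == "max") {
    return std::nullopt;
  }
  try {
    std::size_t pos = 0;
    const uint64_t value = std::stoull(token, &pos);
    if (pos != token.size()) {
      return std::nullopt;  // trailing garbage, e.g. "123abc"
    }
    return value;
  } catch (const std::exception&) {  // invalid_argument or out_of_range
    return std::nullopt;
  }
}

// Process memory limit = min(MemTotal, cgroup limit). Falls back to MemTotal
// when no cgroup file was found (nullopt) or the file says "max" or cannot
// be parsed.
std::optional<uint64_t> processMemoryLimitBytes(
    std::istream& meminfo,
    const std::optional<std::string>& cgroupLimitContent) {
  const std::optional<uint64_t> memTotal = parseMemTotalBytes(meminfo);
  if (!memTotal) {
    return std::nullopt;
  }
  if (cgroupLimitContent) {
    if (const auto limit = parseCgroupLimitBytes(*cgroupLimitContent)) {
      return std::min(*memTotal, *limit);
    }
  }
  return memTotal;
}
```

For a cgroup v1 group with no limit, memory.limit_in_bytes usually holds a very large number, so taking the minimum falls back to MemTotal without a special case.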

Check enforced with VELOX_CHECK_LT(system-mem-limit-gb, memory limit for process), with the error message:

system-mem-limit-gb is higher than the memory limit for process. Expected: system-mem-limit-gb < memory limit for process.

Warning written to the worker's log:

system-mem-limit-gb is smaller than system-memory-gb. Expected: system-mem-limit-gb >= system-memory-gb.
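The check and the warning could be sketched as follows. This is an illustration only: std::runtime_error and a returned std::optional<std::string> stand in for VELOX_CHECK_LT and the worker log, the function name validateMemoryConfig is hypothetical, and both limits are compared in bytes, which is an assumption about how the PR converts units.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

constexpr uint64_t kGiB = 1ULL << 30;

// Throws if system-mem-limit-gb is not strictly below the process memory
// limit (stand-in for VELOX_CHECK_LT). Otherwise returns a warning message
// when system-mem-limit-gb < system-memory-gb, or std::nullopt if the
// configuration looks reasonable. Assumes GB values small enough that
// multiplying by kGiB does not overflow uint64_t.
std::optional<std::string> validateMemoryConfig(
    uint64_t systemMemoryGb,
    uint64_t systemMemLimitGb,
    uint64_t processLimitBytes) {
  if (systemMemLimitGb * kGiB >= processLimitBytes) {
    throw std::runtime_error(
        "system-mem-limit-gb is higher than the memory limit for process. "
        "Expected: system-mem-limit-gb < memory limit for process.");
  }
  if (systemMemLimitGb < systemMemoryGb) {
    return std::string(
        "system-mem-limit-gb is smaller than system-memory-gb. "
        "Expected: system-mem-limit-gb >= system-memory-gb.");
  }
  return std::nullopt;
}
```

Converting both sides to bytes avoids rounding the process limit down to whole gigabytes before comparing; whether the real check does this is not stated in the PR.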

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow the release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@minhancao minhancao self-assigned this Nov 26, 2024
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 26, 2024
@prestodb-ci prestodb-ci requested review from a team, psnv03 and pramodsatya and removed request for a team November 26, 2024 02:04
@minhancao minhancao marked this pull request as ready for review November 26, 2024 02:07
@minhancao minhancao requested a review from a team as a code owner November 26, 2024 02:07
@minhancao minhancao changed the title [native] Add LinuxMemoryChecker warnings to ensure system-memory-gb < system-mem-limit-gb < actual total memory capacity [native] Add LinuxMemoryChecker warnings to ensure system-mem-limit-gb is reasonably set Nov 26, 2024
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 4478ae1 to 15f55bb November 26, 2024 02:29
@minhancao minhancao changed the title [native] Add LinuxMemoryChecker warnings to ensure system-mem-limit-gb is reasonably set [native] Add LinuxMemoryChecker check/warning to ensure system-mem-limit-gb is reasonably set Nov 26, 2024
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 15f55bb to 7646600 November 26, 2024 06:06
Contributor

@czentgr czentgr left a comment

Can we add a test with fake files again just like we did with the original tests for this class?
That way we can test the "max" value for cgroup v2, gigantic values, and reasonable values, covering the various situations we saw when investigating this.
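One way to organize the cases mentioned above is a small table. The struct, the helper, and the values below are hypothetical; a real test would feed each row to LinuxMemoryChecker through fake files instead of the stand-in expectedProcessLimit helper. The gigantic value is the number an unlimited cgroup v1 group typically reports on 64-bit systems with 4 KiB pages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row per scenario: fake cgroup file content, fake MemTotal, and the
// process memory limit the checker is expected to compute.
struct LimitCase {
  std::string name;
  std::string cgroupContent;    // content of memory.max or limit_in_bytes
  uint64_t memTotalBytes;       // MemTotal, already converted to bytes
  uint64_t expectedLimitBytes;  // min(MemTotal, cgroup limit)
};

// Stand-in for the checker under test: "max" means no cgroup limit,
// otherwise take the smaller of the two values.
uint64_t expectedProcessLimit(const std::string& cgroupContent,
                              uint64_t memTotalBytes) {
  if (cgroupContent.rfind("max", 0) == 0) {  // content starts with "max"
    return memTotalBytes;
  }
  const uint64_t cgroupLimit = std::stoull(cgroupContent);
  return cgroupLimit < memTotalBytes ? cgroupLimit : memTotalBytes;
}

std::vector<LimitCase> limitCases() {
  constexpr uint64_t kMemTotal = 16ULL << 30;  // 16 GiB
  return {
      {"cgroupV2Max", "max\n", kMemTotal, kMemTotal},
      {"cgroupV1Unlimited", "9223372036854771712\n", kMemTotal, kMemTotal},
      {"reasonableLimit", "8589934592\n", kMemTotal, 8589934592ULL},
  };
}
```

Keeping the scenarios in data makes it cheap to add another row when a new edge case turns up on a real host.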

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 2 times, most recently from 8da401b to 4ae2cee December 3, 2024 20:08
Contributor

@pramodsatya pramodsatya left a comment

Thanks @minhancao, could you please squash the commits?

std::string statFile_;
std::string memInfoFile_ = "/proc/meminfo";
Contributor

Nit: const std::string kMemInfoFile_

Contributor Author

This has to stay as a non-const variable since I need to change its path to point to the meminfo test file when I am running the LinuxMemoryCheckerTests.
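A common alternative to a mutable path is to inject the path through the constructor, which lets tests point at a fake file while the member stays const, as the nit suggests. The class below is a hypothetical illustration, not LinuxMemoryChecker's actual interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical reader whose meminfo path is fixed at construction time.
// Production code uses the default; tests pass a path to a fake file.
class MemInfoReader {
 public:
  explicit MemInfoReader(std::string memInfoFile = "/proc/meminfo")
      : memInfoFile_(std::move(memInfoFile)) {}

  const std::string& memInfoFile() const {
    return memInfoFile_;
  }

 private:
  const std::string memInfoFile_;  // const: never reassigned after construction
};
```

One cost of this approach is that a class with a const data member is not copy-assignable, which may not matter for a checker created once at startup.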

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 85b3b9d to dab2335 December 13, 2024 00:00
Collaborator

@majetideepak majetideepak left a comment

Ideally, we should avoid checking in data files for testing.
We only need a few fields from the file for testing.
Can we write these required fields to a temporary file as part of the testing?
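A minimal sketch of that suggestion using std::filesystem (C++17): the test writes only the MemTotal line to a temporary file and reads it back. The helper names and file names are hypothetical; concurrent test runs would need unique file names to avoid overwriting each other's files.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Writes `content` to `name` under the system temp directory (truncating
// any existing file) and returns its path. The stream is closed on return.
fs::path writeTempFile(const std::string& name, const std::string& content) {
  const fs::path path = fs::temp_directory_path() / name;
  std::ofstream out(path);
  out << content;
  return path;
}

// Reads MemTotal (reported in kB) from a meminfo-formatted file, in bytes.
std::optional<uint64_t> readMemTotalBytes(const fs::path& path) {
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    uint64_t kb = 0;
    if ((fields >> key >> kb) && key == "MemTotal:") {
      return kb * 1024;
    }
  }
  return std::nullopt;
}
```

Only the fields the checker actually reads need to appear in the fake file, so each test case stays a one-line string.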

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 2 times, most recently from 71c9fa9 to 89a50a8 January 16, 2025 23:53
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 89a50a8 to 163880b January 22, 2025 18:16
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 163880b to 29dc3b5 January 28, 2025 00:40
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 2 times, most recently from 9a7cbf7 to 80ca9b5 February 6, 2025 22:25
@minhancao
Contributor Author

@czentgr @majetideepak @pramodsatya
I have addressed all the PR comments, please review this PR when you can, thank you!

if ((stat(kCgroupV1MaxMemFilePath, &buffer) == 0)) {
  memMaxFile_ = kCgroupV1MaxMemFilePath;
  PRESTO_STARTUP_LOG(INFO)
      << fmt::format("Using cgroup v1 memory max file {}", memMaxFile_);
Collaborator

Is it possible to have a cgroup v2 memory stat file along with a cgroup v1 memory max file?

Contributor Author

No, that combination is not possible.

Collaborator

Then I think we could combine both v1 parts into one block and both v2 parts into another block.

Contributor Author

Setting memMaxFile_ and memStatFile_ should each have its own if block, because on a Linux machine it is possible for one of the files to exist while the other does not.
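The point about independent lookups can be sketched like this: each file is resolved on its own, so a host that exposes only one of them still gets that one. The candidate paths are the conventional cgroup v1 and v2 mount locations and are assumptions here, not the values of kCgroupV1MaxMemFilePath or the PR's other constants; std::filesystem::exists stands in for the stat() call used in the PR.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical result: each member is empty if no candidate was found.
struct CgroupFiles {
  std::string memMaxFile;
  std::string memStatFile;
};

// Returns the first candidate path that exists, or "" if none do.
// The error_code overload of exists() does not throw on permission errors.
std::string firstExisting(const std::vector<std::string>& candidates) {
  for (const auto& path : candidates) {
    std::error_code ec;
    if (std::filesystem::exists(path, ec)) {
      return path;
    }
  }
  return "";
}

CgroupFiles detectCgroupFiles() {
  CgroupFiles files;
  // Separate lookups: the limit file and the stat file may not both exist.
  files.memMaxFile = firstExisting({
      "/sys/fs/cgroup/memory/memory.limit_in_bytes",  // cgroup v1 (assumed)
      "/sys/fs/cgroup/memory.max",                    // cgroup v2 (assumed)
  });
  files.memStatFile = firstExisting({
      "/sys/fs/cgroup/memory/memory.stat",  // cgroup v1 (assumed)
      "/sys/fs/cgroup/memory.stat",         // cgroup v2 (assumed)
  });
  return files;
}
```

Because the two lookups share no state, a missing stat file never prevents the limit file from being used, and vice versa.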

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 3 times, most recently from bb3ce29 to f25e5b1 February 8, 2025 00:22
…mit-gb is reasonably set

Add additional checks and warnings to ensure
system-memory-gb < system-mem-limit-gb < memory limit for process.

For cgroup v1:
Set memory limit for process to be the smaller number
between /proc/meminfo and memory.limit_in_bytes

For cgroup v2:
Set memory limit for process to be the smaller number
between /proc/meminfo and memory.max
If memory.max contains "max" string, then look at
/proc/meminfo for the MemTotal, otherwise use the
value in memory.max.
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from f25e5b1 to b52384a February 8, 2025 00:23
@minhancao
Contributor Author

@majetideepak I have updated the PR with some new changes, please review when you can, thank you!

Labels
from:IBM PR from IBM

5 participants