Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#2176918) core: backport OOMPolicy fixes #149

Merged
merged 7 commits into from
Mar 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions man/org.freedesktop.systemd1.xml
Original file line number Diff line number Diff line change
Expand Up @@ -10150,6 +10150,8 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
readonly t RuntimeMaxUSec = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly t RuntimeRandomizedExtraUSec = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s OOMPolicy = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
readonly s Slice = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
Expand Down Expand Up @@ -10324,6 +10326,8 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {

<!--property RuntimeRandomizedExtraUSec is not documented!-->

<!--property OOMPolicy is not documented!-->

<!--property Slice is not documented!-->

<!--property ControlGroupId is not documented!-->
Expand Down Expand Up @@ -10506,6 +10510,8 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {

<variablelist class="dbus-property" generated="True" extra-ref="RuntimeRandomizedExtraUSec"/>

<variablelist class="dbus-property" generated="True" extra-ref="OOMPolicy"/>

<variablelist class="dbus-property" generated="True" extra-ref="Slice"/>

<variablelist class="dbus-property" generated="True" extra-ref="ControlGroup"/>
Expand Down
2 changes: 2 additions & 0 deletions man/systemd.scope.xml
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,8 @@
of scope units are the following:</para>

<variablelist class='unit-directives'>
<xi:include href="systemd.service.xml" xpointer="oom-policy" />

<varlistentry>
<term><varname>RuntimeMaxSec=</varname></term>

Expand Down
39 changes: 19 additions & 20 deletions man/systemd.service.xml
Original file line number Diff line number Diff line change
Expand Up @@ -1120,30 +1120,30 @@
above.</para></listitem>
</varlistentry>

<varlistentry>
<varlistentry id='oom-policy'>
<term><varname>OOMPolicy=</varname></term>

<listitem><para>Configure the out-of-memory (OOM) kernel killer policy. Note that the userspace OOM
<listitem><para>Configure the out-of-memory (OOM) killing policy for the kernel and the userspace OOM
killer
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
is a more flexible solution that aims to prevent out-of-memory situations for the userspace, not just
the kernel.</para>

<para>On Linux, when memory becomes scarce to the point that the kernel has trouble allocating memory
for itself, it might decide to kill a running process in order to free up memory and reduce memory
pressure. This setting takes one of <constant>continue</constant>, <constant>stop</constant> or
<constant>kill</constant>. If set to <constant>continue</constant> and a process of the service is
killed by the kernel's OOM killer this is logged but the service continues running. If set to
<constant>stop</constant> the event is logged but the service is terminated cleanly by the service
manager. If set to <constant>kill</constant> and one of the service's processes is killed by the OOM
killer the kernel is instructed to kill all remaining processes of the service too, by setting the
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>.
On Linux, when memory becomes scarce to the point that the kernel has trouble allocating memory for
itself, it might decide to kill a running process in order to free up memory and reduce memory
pressure. Note that <filename>systemd-oomd.service</filename> is a more flexible solution that aims
to prevent out-of-memory situations for the userspace too, not just the kernel, by attempting to
terminate services earlier, before the kernel would have to act.</para>

<para>This setting takes one of <constant>continue</constant>, <constant>stop</constant> or
<constant>kill</constant>. If set to <constant>continue</constant> and a process in the unit is
killed by the OOM killer, this is logged but the unit continues running. If set to
<constant>stop</constant> the event is logged but the unit is terminated cleanly by the service
manager. If set to <constant>kill</constant> and one of the unit's processes is killed by the OOM
killer the kernel is instructed to kill all remaining processes of the unit too, by setting the
<filename>memory.oom.group</filename> attribute to <constant>1</constant>; also see <ulink
url="https://docs.kernel.org/admin-guide/cgroup-v2.html">kernel documentation</ulink>.
</para>
url="https://docs.kernel.org/admin-guide/cgroup-v2.html">kernel documentation</ulink>.</para>

<para>Defaults to the setting <varname>DefaultOOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
is set to, except for services where <varname>Delegate=</varname> is turned on, where it defaults to
is set to, except for units where <varname>Delegate=</varname> is turned on, where it defaults to
<constant>continue</constant>.</para>

<para>Use the <varname>OOMScoreAdjust=</varname> setting to configure whether processes of the unit
Expand All @@ -1153,10 +1153,9 @@
details.</para>

<para>This setting also applies to <command>systemd-oomd</command>. Similarly to the kernel OOM
kills, this setting determines the state of the service after <command>systemd-oomd</command> kills a
cgroup associated with the service.</para></listitem>
kills, this setting determines the state of the unit after <command>systemd-oomd</command> kills a
cgroup associated with it.</para></listitem>
</varlistentry>

</variablelist>

<para id='shared-unit-options'>Check
Expand Down
90 changes: 37 additions & 53 deletions meson.build
Original file line number Diff line number Diff line change
Expand Up @@ -3997,6 +3997,7 @@ foreach tuple : fuzzers
else
sources += 'src/fuzz/fuzz-main.c'
endif
sources += fuzz_generated_directives

# FIXME: Use fs.stem() with meson >= 0.54.0
name = '@0@'.format(sources[0]).split('/')[-1].split('.')[0]
Expand All @@ -4013,19 +4014,17 @@ foreach tuple : fuzzers
build_by_default : fuzzer_build)
fuzzer_exes += exe

if want_tests != 'false'
if want_tests != 'false' and name in fuzz_regression_tests
# Run the fuzz regression tests without any sanitizers enabled.
# Additional invocations with sanitizers may be added below.
foreach p : fuzz_regression_tests
b = p.split('/')[-2]
c = p.split('/')[-1]

if b == name
test('@0@_@1@'.format(b, c),
exe,
suite : 'fuzzers',
args : [project_source_root / p])
endif
foreach tuple : fuzz_regression_tests[name]
fuzz_dir = tuple[0]
fuzz_in = tuple[1]
test('@0@_@1@'.format(name, fuzz_in),
exe,
suite : 'fuzzers',
args : [fuzz_dir != '' ? project_source_root / fuzz_dir / name / fuzz_in
: fuzz_generated_in_dir / '@0@_@1@'.format(name, fuzz_in)])
endforeach
endif
endforeach
Expand Down Expand Up @@ -4114,59 +4113,44 @@ foreach exec : public_programs
endif
endforeach

############################################################

check_directives_sh = find_program('tools/check-directives.sh')

if want_tests != 'false'
test('check-directives',
check_directives_sh,
suite : 'dist-check',
args : [project_source_root, project_build_root])
endif

############################################################

# Enable tests for all supported sanitizers
foreach tuple : sanitizers
foreach tuple : fuzz_sanitizers
sanitizer = tuple[0]
build = tuple[1]

if cc.has_link_argument('-fsanitize=@0@'.format(sanitizer))
prev = ''
foreach p : fuzz_regression_tests
b = p.split('/')[-2]
c = p.split('/')[-1]

name = '@0@:@1@'.format(b, sanitizer)

if name != prev
if want_tests == 'false'
message('Not compiling @0@ because tests is set to false'.format(name))
elif fuzz_tests
exe = custom_target(
name,
output : name,
depends : build,
command : [ln, '-fs',
build.full_path() / b,
'@OUTPUT@'],
build_by_default : true)
else
message('Not compiling @0@ because fuzz-tests is set to false'.format(name))
endif
foreach fuzzer, fuzz_ins : fuzz_regression_tests
name = '@0@:@1@'.format(fuzzer, sanitizer)
if want_tests == 'false'
message('Not compiling @0@ because tests is set to false'.format(name))
continue
endif
prev = name

if fuzz_tests
test('@0@_@1@_@2@'.format(b, c, sanitizer),
if not fuzz_tests
message('Not compiling @0@ because fuzz-tests is set to false'.format(name))
continue
endif
exe = custom_target(
name,
output : name,
depends : [build] + fuzz_generated_directives,
command : [ln, '-fs',
build.full_path() / fuzzer,
'@OUTPUT@'],
build_by_default : true)

foreach tuple : fuzz_ins
fuzz_dir = tuple[0]
fuzz_in = tuple[1]

test('@0@_@1@_@2@'.format(fuzzer, fuzz_in, sanitizer),
env,
suite : 'fuzz+san',
env : ['UBSAN_OPTIONS=print_stacktrace=1:print_summary=1:halt_on_error=1'],
timeout : 60,
args : [exe.full_path(),
project_source_root / p])
endif
fuzz_dir != '' ? project_source_root / fuzz_dir / fuzzer / fuzz_in
: fuzz_generated_in_dir / '@0@_@1@'.format(fuzzer, fuzz_in)])
endforeach
endforeach
endif
endforeach
Expand Down
6 changes: 6 additions & 0 deletions src/core/dbus-scope.c
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include "bus-get-properties.h"
#include "dbus-cgroup.h"
#include "dbus-kill.h"
#include "dbus-manager.h"
#include "dbus-scope.h"
#include "dbus-unit.h"
#include "dbus-util.h"
Expand Down Expand Up @@ -39,6 +40,7 @@ int bus_scope_method_abandon(sd_bus_message *message, void *userdata, sd_bus_err
}

static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_result, scope_result, ScopeResult);
static BUS_DEFINE_SET_TRANSIENT_PARSE(oom_policy, OOMPolicy, oom_policy_from_string);

const sd_bus_vtable bus_scope_vtable[] = {
SD_BUS_VTABLE_START(0),
Expand All @@ -47,6 +49,7 @@ const sd_bus_vtable bus_scope_vtable[] = {
SD_BUS_PROPERTY("Result", "s", property_get_result, offsetof(Scope, result), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("RuntimeMaxUSec", "t", bus_property_get_usec, offsetof(Scope, runtime_max_usec), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("RuntimeRandomizedExtraUSec", "t", bus_property_get_usec, offsetof(Scope, runtime_rand_extra_usec), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("OOMPolicy", "s", bus_property_get_oom_policy, offsetof(Scope, oom_policy), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_SIGNAL("RequestStop", NULL, 0),
SD_BUS_METHOD("Abandon", NULL, NULL, bus_scope_method_abandon, SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_VTABLE_END
Expand Down Expand Up @@ -77,6 +80,9 @@ static int bus_scope_set_transient_property(
if (streq(name, "RuntimeRandomizedExtraUSec"))
return bus_set_transient_usec(u, name, &s->runtime_rand_extra_usec, message, flags, error);

if (streq(name, "OOMPolicy"))
return bus_set_transient_oom_policy(u, name, &s->oom_policy, message, flags, error);

if (streq(name, "PIDs")) {
_cleanup_(sd_bus_creds_unrefp) sd_bus_creds *creds = NULL;
unsigned n = 0;
Expand Down
1 change: 1 addition & 0 deletions src/core/load-fragment-gperf.gperf.in
Original file line number Diff line number Diff line change
Expand Up @@ -555,6 +555,7 @@ Path.TriggerLimitBurst, config_parse_unsigned,
Scope.RuntimeMaxSec, config_parse_sec, 0, offsetof(Scope, runtime_max_usec)
Scope.RuntimeRandomizedExtraSec, config_parse_sec, 0, offsetof(Scope, runtime_rand_extra_usec)
Scope.TimeoutStopSec, config_parse_sec, 0, offsetof(Scope, timeout_stop_usec)
Scope.OOMPolicy, config_parse_oom_policy, 0, offsetof(Scope, oom_policy)
{# The [Install] section is ignored here #}
Install.Alias, NULL, 0, 0
Install.WantedBy, NULL, 0, 0
Expand Down
19 changes: 16 additions & 3 deletions src/core/scope.c
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ static void scope_init(Unit *u) {
s->timeout_stop_usec = u->manager->default_timeout_stop_usec;
u->ignore_on_isolate = true;
s->user = s->group = NULL;
s->oom_policy = _OOM_POLICY_INVALID;
}

static void scope_done(Unit *u) {
Expand Down Expand Up @@ -194,6 +195,11 @@ static int scope_add_extras(Scope *s) {
if (r < 0)
return r;

if (s->oom_policy < 0)
s->oom_policy = s->cgroup_context.delegate ? OOM_CONTINUE : UNIT(s)->manager->default_oom_policy;

s->cgroup_context.memory_oom_group = s->oom_policy == OOM_KILL;

return scope_add_default_dependencies(s);
}

Expand Down Expand Up @@ -286,11 +292,13 @@ static void scope_dump(Unit *u, FILE *f, const char *prefix) {
"%sScope State: %s\n"
"%sResult: %s\n"
"%sRuntimeMaxSec: %s\n"
"%sRuntimeRandomizedExtraSec: %s\n",
"%sRuntimeRandomizedExtraSec: %s\n"
"%sOOMPolicy: %s\n",
prefix, scope_state_to_string(s->state),
prefix, scope_result_to_string(s->result),
prefix, FORMAT_TIMESPAN(s->runtime_max_usec, USEC_PER_SEC),
prefix, FORMAT_TIMESPAN(s->runtime_rand_extra_usec, USEC_PER_SEC));
prefix, FORMAT_TIMESPAN(s->runtime_rand_extra_usec, USEC_PER_SEC),
prefix, oom_policy_to_string(s->oom_policy));

cgroup_context_dump(UNIT(s), f, prefix);
kill_context_dump(&s->kill_context, f, prefix);
Expand Down Expand Up @@ -635,11 +643,16 @@ static void scope_notify_cgroup_oom_event(Unit *u, bool managed_oom) {
else
log_unit_debug(u, "Process of control group was killed by the OOM killer.");

/* This will probably need to be modified when scope units get an oom-policy */
if (s->oom_policy == OOM_CONTINUE)
return;

switch (s->state) {

case SCOPE_START_CHOWN:
case SCOPE_RUNNING:
scope_enter_signal(s, SCOPE_STOP_SIGTERM, SCOPE_FAILURE_OOM_KILL);
break;

case SCOPE_STOP_SIGTERM:
scope_enter_signal(s, SCOPE_STOP_SIGKILL, SCOPE_FAILURE_OOM_KILL);
break;
Expand Down
2 changes: 2 additions & 0 deletions src/core/scope.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ struct Scope {

char *user;
char *group;

OOMPolicy oom_policy;
};

extern const UnitVTable scope_vtable;
Expand Down
6 changes: 6 additions & 0 deletions src/login/logind-dbus.c
Original file line number Diff line number Diff line change
Expand Up @@ -3970,6 +3970,12 @@ int manager_start_scope(
if (r < 0)
return r;

/* For login session scopes, if a process is OOM killed by the kernel, *don't* terminate the rest of
the scope */
r = sd_bus_message_append(m, "(sv)", "OOMPolicy", "s", "continue");
if (r < 0)
return r;

/* disable TasksMax= for the session scope, rely on the slice setting for it */
r = sd_bus_message_append(m, "(sv)", "TasksMax", "t", UINT64_MAX);
if (r < 0)
Expand Down
3 changes: 3 additions & 0 deletions src/shared/bus-unit-util.c
Original file line number Diff line number Diff line change
Expand Up @@ -2142,6 +2142,9 @@ static int bus_append_scope_property(sd_bus_message *m, const char *field, const
if (STR_IN_SET(field, "User", "Group"))
return bus_append_string(m, field, eq);

if (streq(field, "OOMPolicy"))
return bus_append_string(m, field, eq);

return 0;
}

Expand Down
Loading