Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-123516: Improve JIT memory consumption by invalidating cold executors #123402

Closed
Closed
Show file tree
Hide file tree
Changes from 48 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
0eac77b
resolve conflict
savannahostrowski Jul 26, 2024
d576296
tests pass except ssl
savannahostrowski Jun 25, 2024
68e95d6
remove file
savannahostrowski Jul 26, 2024
c903af4
this is broken
savannahostrowski Aug 12, 2024
5ca6b7f
gc approach
savannahostrowski Aug 12, 2024
beb4f65
rebase
savannahostrowski Aug 12, 2024
427dbf5
Update has_run to run_count
savannahostrowski Aug 13, 2024
92d5590
update initialized run_count and move invalidate old
savannahostrowski Aug 13, 2024
0cdf638
set threshold to 1`
savannahostrowski Aug 13, 2024
58e7447
move incrementing run count into a new op
savannahostrowski Aug 14, 2024
6c047e4
add invalidation threshold in gc of 10
savannahostrowski Aug 20, 2024
2645023
move back to incremenet
savannahostrowski Aug 20, 2024
7c7ae98
remove print
savannahostrowski Aug 20, 2024
6d6d306
move invalidation to executor creation
savannahostrowski Aug 27, 2024
4d086fe
change threshold
savannahostrowski Aug 27, 2024
d08e45a
new line
savannahostrowski Aug 27, 2024
6315877
update constant
savannahostrowski Aug 27, 2024
622c266
📜🤖 Added by blurb_it.
blurb-it[bot] Aug 27, 2024
e5117b2
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Aug 29, 2024
7c6704c
resolve conflict
savannahostrowski Jul 26, 2024
1d72fdd
tests pass except ssl
savannahostrowski Jun 25, 2024
1778185
remove file
savannahostrowski Jul 26, 2024
2506821
this is broken
savannahostrowski Aug 12, 2024
fca6dec
gc approach
savannahostrowski Aug 12, 2024
29436fd
rebase
savannahostrowski Aug 12, 2024
b969b11
Update has_run to run_count
savannahostrowski Aug 13, 2024
a669e0f
update initialized run_count and move invalidate old
savannahostrowski Aug 13, 2024
deb73ec
set threshold to 1`
savannahostrowski Aug 13, 2024
9fa55e8
move incrementing run count into a new op
savannahostrowski Aug 14, 2024
e4a461a
add invalidation threshold in gc of 10
savannahostrowski Aug 20, 2024
d5a2bed
move back to incremenet
savannahostrowski Aug 20, 2024
d232e63
remove print
savannahostrowski Aug 20, 2024
e4a456a
move invalidation to executor creation
savannahostrowski Aug 27, 2024
b7d2d5a
change threshold
savannahostrowski Aug 27, 2024
6dcd2dc
new line
savannahostrowski Aug 27, 2024
219f890
update constant
savannahostrowski Aug 27, 2024
fb7b04d
📜🤖 Added by blurb_it.
blurb-it[bot] Aug 27, 2024
ea397a3
Merge branch 'jit-mem-invalidate-10' of https://github.com/savannahos…
savannahostrowski Aug 29, 2024
310d20c
address pr comments
savannahostrowski Aug 30, 2024
d2f9dc4
refactor invalidatecold
savannahostrowski Aug 30, 2024
d755d56
refactor to was_run bool
savannahostrowski Aug 30, 2024
c9534c0
update blurb
savannahostrowski Aug 30, 2024
61cd3c5
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Aug 30, 2024
e5065ad
Update op name to be more reflective of was_run
savannahostrowski Aug 30, 2024
2d09259
fix typo
savannahostrowski Aug 30, 2024
d2e8e29
rename uop
savannahostrowski Aug 30, 2024
a894598
dedent and initialize executors_created
savannahostrowski Aug 31, 2024
758ee03
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Sep 1, 2024
1927bfe
address some PR comments
savannahostrowski Sep 1, 2024
8ee0d7f
Update Python/optimizer.c
savannahostrowski Sep 1, 2024
cedd65d
Update Python/optimizer.c
savannahostrowski Sep 1, 2024
0a9b5b6
make was_run uint`
savannahostrowski Sep 1, 2024
180a68e
add comment for JIT_CLEANUP_THRESHOLD
savannahostrowski Sep 1, 2024
7cb9cba
Remove extraneous reset of executors_created
savannahostrowski Sep 1, 2024
8939ecf
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Sep 3, 2024
00f03d2
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Sep 3, 2024
4981ab2
Merge branch 'jit-mem-invalidate-10' of https://github.com/savannahos…
savannahostrowski Sep 3, 2024
3c59316
condense conditional statements
savannahostrowski Sep 3, 2024
306c3c3
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Sep 10, 2024
fe50615
Address PR comments from Brandt and Mark
savannahostrowski Sep 13, 2024
d51817b
Merge branch 'main' into jit-mem-invalidate-10
savannahostrowski Sep 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Include/internal/pycore_interp.h
Original file line number Diff line number Diff line change
Expand Up @@ -259,7 +259,7 @@ struct _is {
struct callable_cache callable_cache;
_PyOptimizerObject *optimizer;
_PyExecutorObject *executor_list_head;

size_t executors_created;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
_rare_events rare_events;
PyDict_WatchCallback builtins_dict_watcher;

Expand Down
8 changes: 7 additions & 1 deletion Include/internal/pycore_optimizer.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,8 @@ typedef struct {
uint8_t oparg;
uint16_t valid:1;
uint16_t linked:1;
uint16_t chain_depth:14; // Must be big engough for MAX_CHAIN_DEPTH - 1.
bool was_run:1;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
uint16_t chain_depth:13; // Must be big enough for MAX_CHAIN_DEPTH - 1.
int index; // Index of ENTER_EXECUTOR (if code isn't NULL, below).
_PyBloomFilter bloom;
_PyExecutorLinkListNode links;
Expand Down Expand Up @@ -123,11 +124,16 @@ PyAPI_FUNC(PyObject *) _PyOptimizer_NewUOpOptimizer(void);
#ifdef _Py_TIER2
PyAPI_FUNC(void) _Py_Executors_InvalidateDependency(PyInterpreterState *interp, void *obj, int is_invalidation);
PyAPI_FUNC(void) _Py_Executors_InvalidateAll(PyInterpreterState *interp, int is_invalidation);
PyAPI_FUNC(void) _Py_Executors_InvalidateCold(PyInterpreterState *interp);

#else
# define _Py_Executors_InvalidateDependency(A, B, C) ((void)0)
# define _Py_Executors_InvalidateAll(A, B) ((void)0)
# define _Py_Executors_InvalidateCold(A) ((void)0)

#endif

#define JIT_CLEANUP_THRESHOLD 10
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved

// This is the length of the trace we project initially.
#define UOP_MAX_TRACE_LENGTH 800
Expand Down
41 changes: 21 additions & 20 deletions Include/internal/pycore_uop_ids.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Include/internal/pycore_uop_metadata.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Improved JIT memory consumption by periodically freeing memory used by infrequently-executed code.
This change is especially likely to improve the memory footprint of long-running programs.
4 changes: 4 additions & 0 deletions Python/bytecodes.c
Original file line number Diff line number Diff line change
Expand Up @@ -4844,6 +4844,10 @@ dummy_func(
assert(((_PyExecutorObject *)executor)->vm_data.valid);
}

tier2 op(_SET_EXECUTOR_RUN_STATE, (--)) {
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
current_executor->vm_data.was_run = true;
}

tier2 op(_FATAL_ERROR, (--)) {
assert(0);
Py_FatalError("Fatal error uop executed.");
Expand Down
5 changes: 5 additions & 0 deletions Python/executor_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

53 changes: 53 additions & 0 deletions Python/optimizer.c
Original file line number Diff line number Diff line change
Expand Up @@ -182,6 +182,12 @@ _PyOptimizer_Optimize(
if (err <= 0) {
return err;
}

if (++interp->executors_created >= JIT_CLEANUP_THRESHOLD) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could behave quite badly with many executors.

Suppose we have a 1000 executors:
Even in the ideal case, we need to scan all 1000, every time we create 10 new ones.

But what worries me is that we could clear all executors far too easily.
Suppose the VM starts compiling some new region of code, resulting in 20 new executors.
After the first 10, we will mark all executors as cold. After the second 10 we will clear all executors.

I'm not sure what the solution is, but I think a ticker that is independent of the number of executors created would be better. Maybe the GC trigger, or something similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In earlier runs of the benchmarks, there was no impact to performance, even in instances where I tried pretty aggressive thresholds. In discussing this with Brandt, it seemed cheap to re-create in cases where we might be overzealous in invalidation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also worth mentioning that I did experiment with triggering invalidation on gc earlier but in chatting with Brandt, it seemed like better design to have this tied to executor creation (i.e. since the invalidation isn't really at all coupled to gc today).

I'm sure that @pablogsal probably has thoughts here too?

Copy link
Member

@pablogsal pablogsal Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that @pablogsal probably has thoughts here too?

I think it may be sensible to do it in the eval breaker or at any other similar point in the VM execution where we can know it's safe to do. This is equivalent to the GC (as both are triggered there) and still allows us to add custom triggering logic and decuples the run from the GC.

For context: I would still prefer not to tie this with the gc as in general we have been holding off coupling the gc with things for a long time because any bug in the GC or on GC objects is already a very non trivial and hard to debug spooky-action-at-a-distance and we have learn that (1) controlling any side-effects that are not purely derived from the generic algorithm and (2) keeping the GC from having to know what's operating on has been key to keep complexity at bay.

Copy link
Member

@brandtbucher brandtbucher Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the solution is, but I think a ticker that is independent of the number of executors created would be better. Maybe the GC trigger, or something similar?

I think the current approach is probably good enough for now, and we can iterate on the heuristics. The potential quadratic behavior (and wipe-out-everything-ness) of the current approach is a bit scary, but is unlikely to seriously impact most programs. It can also be mitigated by a follow-up PR to change "every ten executors" to "10% more executors" or "20% more JIT memory" or some other ratio-based approach, like the GC uses.

I also think the GC is probably the wrong place for this. We could add a new dedicated eval-breaker bit, but that sort of feels like overkill. The creation of new executors is a known safe point where (a) we expect to spend a bit of time managing JIT stuff and (b) we know JIT memory pressure will increase by some fixed amount. I don't think it would be any better to just set an eval breaker bit here and leave our new JIT code almost immediately to perform this same cleanup.

interp->executors_created = 0;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
_Py_Executors_InvalidateCold(interp);
}

assert(*executor_ptr != NULL);
if (progress_needed) {
int index = get_index_for_executor(code, start);
Expand Down Expand Up @@ -565,6 +571,7 @@ translate_bytecode_to_trace(
code->co_firstlineno,
2 * INSTR_IP(initial_instr, code));
ADD_TO_TRACE(_START_EXECUTOR, 0, (uintptr_t)instr, INSTR_IP(instr, code));
ADD_TO_TRACE(_SET_EXECUTOR_RUN_STATE, 0, 0, 0);
uint32_t target = 0;

for (;;) {
Expand Down Expand Up @@ -1194,6 +1201,9 @@ make_executor_from_uops(_PyUOpInstruction *buffer, int length, const _PyBloomFil
executor->jit_code = NULL;
executor->jit_side_entry = NULL;
executor->jit_size = 0;
// This is initialized to true so we can prevent the executor
// from being immediately detected as cold and invalidated.
executor->vm_data.was_run = true;
if (_PyJIT_Compile(executor, executor->trace, length)) {
Py_DECREF(executor);
return NULL;
Expand Down Expand Up @@ -1657,6 +1667,49 @@ _Py_Executors_InvalidateAll(PyInterpreterState *interp, int is_invalidation)
OPT_STAT_INC(executors_invalidated);
}
}
interp->executors_created=0;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
}

void
_Py_Executors_InvalidateCold(PyInterpreterState *interp)
{
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
/* Walk the list of executors */
/* TO DO -- Use a tree to avoid traversing as many objects */
PyObject *invalidate = PyList_New(0);
if (invalidate == NULL) {
goto error;
}

/* Clearing an executor can deallocate others, so we need to make a list of
* executors to invalidate first */
for (_PyExecutorObject *exec = interp->executor_list_head; exec != NULL;) {
assert(exec->vm_data.valid);
_PyExecutorObject *next = exec->vm_data.links.next;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved

if (!exec->vm_data.was_run) {
unlink_executor(exec);
if (PyList_Append(invalidate, (PyObject *)exec) < 0)
{
goto error;
}
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved

} else {
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
exec->vm_data.was_run = false;
}

exec = next;
}
for (Py_ssize_t i = 0; i < PyList_GET_SIZE(invalidate); i++) {
_PyExecutorObject *exec = (_PyExecutorObject *)PyList_GET_ITEM(invalidate, i);
executor_clear(exec);
}
Py_DECREF(invalidate);
return;
savannahostrowski marked this conversation as resolved.
Show resolved Hide resolved
error:
PyErr_Clear();
Py_XDECREF(invalidate);
// If we're truly out of memory, wiping out everything is a fine fallback:
_Py_Executors_InvalidateAll(interp, 0);
}

#endif /* _Py_TIER2 */
4 changes: 4 additions & 0 deletions Python/optimizer_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Python/pystate.c
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,7 @@ init_interpreter(PyInterpreterState *interp,
#ifdef _Py_TIER2
(void)_Py_SetOptimizer(interp, NULL);
interp->executor_list_head = NULL;
interp->executors_created = 0;
#endif
if (interp != &runtime->_main_interpreter) {
/* Fix the self-referential, statically initialized fields. */
Expand Down
Loading