--experimental_output_paths=strip is not effective when actions are scheduled in parallel #21043
@coeuvre @tjgq If I understand the logic around this correctly. CC @gregestren
To be more concrete, this is the test case that we would want to pass:

```bash
function test_path_stripping_cache_hit_for_parallel_action() {
  mkdir rules
  cat > rules/defs.bzl <<'EOF'
def _platforms_split_transition_impl(settings, attr):
    return [
        {"//command_line_option:platforms": "//rules:foo"},
        {"//command_line_option:platforms": "//rules:bar"},
    ]

_platforms_split_transition = transition(
    implementation = _platforms_split_transition_impl,
    inputs = [],
    outputs = ["//command_line_option:platforms"],
)

def _echo_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name)
    args = ctx.actions.args()
    args.add(out)
    ctx.actions.run_shell(
        outputs = [out],
        arguments = [args],
        command = "sleep 2 && echo '{}' > $1".format(ctx.attr.msg),
        execution_requirements = {"supports-path-mapping": "1"},
    )
    return [
        DefaultInfo(files = depset([out])),
    ]

echo = rule(
    _echo_impl,
    attrs = {
        "msg": attr.string(),
    },
)

def _cat_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name)
    inputs = [file for src in ctx.attr.src for file in src[DefaultInfo].files.to_list()]
    ctx.actions.run_shell(
        inputs = inputs,
        outputs = [out],
        command = "cat {} > {}".format(" ".join([f.path for f in inputs]), out.path),
    )
    return [
        DefaultInfo(files = depset([out])),
    ]

cat = rule(
    _cat_impl,
    attrs = {
        "src": attr.label(cfg = _platforms_split_transition),
    },
)
EOF
  cat > rules/BUILD <<'EOF'
platform(name = "foo", parents = ["@local_config_platform//:host"])
platform(name = "bar", parents = ["@local_config_platform//:host"])
EOF
  mkdir -p pkg
  cat > pkg/BUILD <<'EOF'
load("//rules:defs.bzl", "cat", "echo")

echo(
    name = "gen_src",
    msg = "Hello, Cache!",
)

cat(
    name = "cat",
    src = ":gen_src",
)
EOF

  bazel build \
    --experimental_output_paths=strip \
    --remote_executor=grpc://localhost:${worker_port} \
    //pkg:cat &> $TEST_log || fail "build failed unexpectedly"
  expect_log '2 remote[^ ]'
  expect_log '1 remote cache hit'
}
```
I think this makes sense, yes.
Thinking some more about it: is this really worth doing if we assume that the remote executor is able to deduplicate identical concurrent requests on its own? The spec doesn't mandate it, but it does hint at the possibility, and I'd expect a well-optimized implementation to do so. (It's at least true of Google's internal equivalent of remote execution.) On the Bazel side, we can't really save any pre-execution work (we need to calculate the action digest anyway to detect duplicates) nor post-execution work (we don't have the ability to deduplicate downloads of the same blob into many output locations, and even if we did, it would only matter for actions "built with the bytes"). So I think the gains would be solely in reduced (non-blob) network traffic, which I'm not sure is worth the extra complexity.
That's a good point! Since deduplication within a single build is really just a special case of general deduplication across concurrent builds for a remote executor, it's better positioned to do this than Bazel. However, this still leaves the case of local execution with a remote (or disk) cache. In this situation, I don't see how it would be possible for the cache to transparently deduplicate requests, given that it doesn't have control over whether and when the results of an action that incurred a cache miss will be uploaded. @tjgq Do you maybe see a way to push this kind of deduplication into the cache? If not, would you consider that to be a sensible addition? @DavidANeil I just noticed the "Local execution" category you chose for the issue. Are you using a remote cache together with local execution?
For local execution, I think the rough outline of "keep a map of futures for in-flight executions, resolve the future on action upload, block on the future when downloading an identical action" might be doable. But it's the sort of thing that I'd have to write to convince myself (for example, it's a little unclear to me what the failure path would look like).
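The outline above can be sketched as follows. This is a hypothetical illustration of the "map of in-flight futures" idea, not Bazel's actual implementation; `SpawnDeduplicator`, `execute`, and `run_spawn` are made-up names, and the failure path shown (waiters fall back to executing themselves) is just one plausible answer to the open question.

```python
import concurrent.futures
import threading

class SpawnDeduplicator:
    """Tracks the first in-flight spawn per action digest; later identical
    spawns wait for its result instead of executing again.

    Hypothetical sketch of the outline above, not Bazel's actual code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # action digest -> Future holding the spawn result

    def execute(self, digest, run_spawn):
        with self._lock:
            future = self._in_flight.get(digest)
            if future is None:
                # First spawn for this digest: it actually executes.
                future = concurrent.futures.Future()
                self._in_flight[digest] = future
                owner = True
            else:
                owner = False
        if not owner:
            try:
                # Block until the first spawn has finished, then reuse
                # (in Bazel's case: copy over) its outputs.
                return future.result()
            except Exception:
                # Failure path: the first spawn failed, so fall back to
                # executing this spawn ourselves.
                return run_spawn()
        try:
            result = run_spawn()
            future.set_result(result)
            return result
        except Exception as e:
            future.set_exception(e)
            raise
        finally:
            with self._lock:
                del self._in_flight[digest]
```

With this shape, N concurrently scheduled spawns that share a digest cost one execution plus N-1 waits, which matches the behavior the test case above asserts (one execution, one cache hit).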
We have some developers using local execution with a remote cache. So, ideally, the solution would work for all three of those modes.
When path mapping is enabled, different `Spawn`s in the same build can have identical `RemoteAction.ActionKey`s and can thus provide remote cache hits for each other. However, cache hits are only possible after the first local execution has concluded and uploaded its result to the cache. To avoid unnecessary duplication of local work, the first `Spawn` for each `RemoteAction.ActionKey` is tracked until its results have been uploaded, and all other concurrently scheduled `Spawn`s wait for it and then copy over its local outputs.

Fixes #21043
Closes #22556
PiperOrigin-RevId: 655097996
Change-Id: I4368f9210c67a306775164d252aae122d8b46f9b
Closes #23060
A fix for this issue has been included in Bazel 7.3.0 RC1. Please test out the release candidate and report any issues as soon as possible.
Description of the feature request:

`--experimental_output_paths=strip` can allow an `exec` and a `target` spawn to have the same input digest, and therefore to be cache hits for each other. Unfortunately, because these actions frequently have identical or exactly duplicated input dependency graphs, it isn't uncommon for the two actions to be scheduled at the same time, meaning the work is still duplicated.

I'm not totally certain about the solution, but my first rough idea is that the spawn strategy should, if this flag is enabled, check whether any sibling spawns match the input digest of the spawn that is about to start executing, and if so somehow "join" the spawn results.
Which category does this issue belong to?
Local Execution
What underlying problem are you trying to solve with this feature?

Our host and target platforms are identical, so all duplicated `[for tool]` actions are pure overhead. This flag seems promising at reducing that overhead, but we still frequently see duplicated actions, since the "exec" and "target" dependency chains tend to stay in sync: one branch falling behind by chance of the scheduler means it gets cache hits until it catches back up and starts duplicating execution again.
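To make the collision concrete: once the configuration segment of output paths is stripped or mapped away, an exec-configured and a target-configured run of the same tool can produce byte-identical command lines, and therefore identical digests. A toy illustration follows; the paths, the `cfg` marker, and hashing the command line as a stand-in for the action digest are all simplifications for the example, not Bazel's actual scheme.

```python
import hashlib

def digest(command_line):
    # Toy stand-in for an action's input digest: hash of the command line.
    return hashlib.sha256("\0".join(command_line).encode()).hexdigest()

def strip_config(path):
    # Path mapping replaces the configuration segment of bazel-out paths
    # (e.g. "k8-fastbuild" vs. an exec-config segment) with a fixed marker.
    parts = path.split("/")
    if parts[0] == "bazel-out":
        parts[1] = "cfg"
    return "/".join(parts)

# Illustrative command lines for the same tool in the two configurations.
target_cmd = ["tool", "-o", "bazel-out/k8-fastbuild/bin/pkg/out", "pkg/in"]
exec_cmd   = ["tool", "-o", "bazel-out/k8-opt-exec-abc123/bin/pkg/out", "pkg/in"]

# Without stripping, the configuration segment keeps the digests apart.
assert digest(target_cmd) != digest(exec_cmd)

# With path mapping, the digests coincide, so one action can be a
# cache hit for the other -- if it has finished and uploaded in time.
stripped = lambda cmd: [strip_config(arg) for arg in cmd]
assert digest(stripped(target_cmd)) == digest(stripped(exec_cmd))
```

This is exactly why scheduling matters: the digests are equal either way, but the cache hit only materializes when the two identical actions are not running concurrently.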