-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Also path map transitive header jar paths with direct classpath optimization #19872
Conversation
Note that there is still one problematic case that this PR isn't meant to address: Since we do not unmap the paths embedded in the To fix that problem, we really need to use a mapping scheme that can't produce collisions, e.g. sorting collidings paths by their hashes and then adding a disambiguating count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In b/303689530 (Google discussion of this issue) @cushon was also concerned about a path-mapped reference being stored in the hjar output. It's stored in a repackaged transitive class which includes a TurbineTransitiveJar class attribute to its original .jar path.
I think that's the source for the jdeps content? And while your PR fixes the .jdeps rewriting, it doesn't change the original reference, which is still embedded in the header jar contents?
I'm curious how this compares to @aranguyen 's idea to make getReduced ClassPath
more resilient. And my idea on not disabling path mapping when the same input appears multiple times. And your related idea of not disabling path mapping by remapping away collisions.
@cushon had an interesting idea that if we make getReducedClassPath
more resilient, maybe we don't even need createFullOutputDeps
. But I think that assumes we can make getReducedClassPath
sufficiently resilient. And it assumes we don't have other consumers of jdeps (which I'm not sure is true).
I do like that you're trying to principally make the jdeps content correct from the source.
I'm adding a lot of complicating thoughts here but I'd like to clearly understand how all these different approaches fit together in the bigger picture.
One other critique is the less ad hoc extra logic we need, the better.
@@ -499,7 +522,8 @@ public void build(JavaToolchainProvider javaToolchain) throws RuleErrorException | |||
// If classPathMode == BAZEL, also make sure to inject the dependencies to be | |||
// available to downstream actions. Else just do enough work to locally create the | |||
// full .jdeps from the .stripped .jdeps produced on the executor. | |||
/* insertDependencies= */ classpathMode == JavaClasspathMode.BAZEL)); | |||
/* insertDependencies= */ classpathMode == JavaClasspathMode.BAZEL, | |||
additionalArtifactsForPathMapping)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this adds a new reference to the transitive class path to JavaheaderCompileAction
. Does that have any memory consequences in that the reference could live longer? I realize there's already a reference passed to JavaCompileAction
(not sure it's the same reference, since JavaHeaderCompileActionBuilder
only constructs a JavaCompileAction
under certain circumstances). Maybe that reference eliminates this concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified with JOL that the instance size of JavaHeaderCompileAction
is still 72 after this change (it had 7 bytes of padding previously), so the field itself shouldn't result in higher memory usage. @justinhorvitz Is "HotSpot >= 15, 64-bit, COOPS, CCPS" the correct JOL setting to analyze retained size for Bazel?
The nested set it creates an additional reference to is either empty or comes from JavaTargetAttributes#getCompileTimeClassPath()
, which I think is always retained by the corresponding full compilation action here:
bazel/src/main/java/com/google/devtools/build/lib/rules/java/JavaCompilationHelper.java
Lines 331 to 346 in 379ee5f
NestedSet<Artifact> classpath = | |
NestedSetBuilder.<Artifact>naiveLinkOrder() | |
.addTransitive(bootClassPathInfo.auxiliary()) | |
.addTransitive(attributes.getCompileTimeClassPath()) | |
.build(); | |
if (!bootClassPathInfo.auxiliary().isEmpty()) { | |
builder.setClasspathEntries(classpath); | |
builder.setDirectJars( | |
NestedSetBuilder.<Artifact>naiveLinkOrder() | |
.addTransitive(bootClassPathInfo.auxiliary()) | |
.addTransitive(attributes.getDirectJars()) | |
.build()); | |
} else { | |
builder.setClasspathEntries(attributes.getCompileTimeClassPath()); | |
builder.setDirectJars(attributes.getDirectJars()); | |
} |
src/main/java/com/google/devtools/build/lib/rules/java/JavaCompileAction.java
Show resolved
Hide resolved
Thanks for tackling this! Overall this seems like an encouraging approach to me.
(+1, the implementation comment here mentions google/turbine@f9f2dec, which is what I was talking about. I think we're on the same page about the turbine behaviour.)
If I'm following I think either of the 'not disabling path mapping' options are complementary with this PR, and they would help with the "problematic case that this PR isn't meant to address" in #19872 (comment).
That's a good point: there are other consumers of jdeps, so I think we would want |
ca601b8
to
40710ef
Compare
I can add a test for the situation in which this change is relevant after the merge of the general test setup in #18155. |
cf54d1d
to
69b71c9
Compare
@aranguyen did you say that this PR doesn't address the failure you found in Google? |
69b71c9
to
b35c3aa
Compare
It's this commit that I tested a4e1d7b#diff-738efdd95b5eedf99d16e391201cb00e284b0ccf3542506f6c88cb01ffb96830 and saw mixed paths still. I think it's different from this one. @fmeum any idea why the presubmits are failing. I'll work on testing this one soon |
Not yet, I will look into it. |
b35c3aa
to
5601212
Compare
Actually, 7.1.0 may be better since this hasn't been reviewed yet, @fmeum. |
`JavaHeaderCompileAction` can apply an optimization where it compiles the source files only against direct dependencies, making use of the fact that Turbine includes sufficient information about transitive dependencies into the direct header jars in a special directory. In order to ensure that downstream consumers of header compilation action can use its `.jdeps` file regardless of which of these actions actually uses path mapping, all such paths to transitive jars have to be unmapped. With this commit, actions can declare additional artifacts whose paths they want to apply path mapping to. By always passing all transitive jars into the path mapper, even when only the direct jars are inputs to the action.
9b5f71d
to
0c97e29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ization `JavaHeaderCompileAction` can apply an optimization where it compiles the source files only against direct dependencies, making use of the fact that Turbine includes sufficient information about transitive dependencies into the direct header jars in a special directory. In order to ensure that downstream consumers of header compilation action can use its `.jdeps` file regardless of which of these actions actually uses path mapping, all such paths to transitive jars have to be unmapped. With this commit, actions can declare additional artifacts whose paths they want to apply path mapping to. By always passing all transitive jars into the path mapper, even when only the direct jars are inputs to the action, the transitive header jar paths can be unmapped and stripped path collisions between them correctly result in a noop `PathMapper` being used for the current action. Fixes bazelbuild#21091 Closes bazelbuild#19872. PiperOrigin-RevId: 604772306 Change-Id: I21d25785b2e909aace8554251f41b1012185138a
…th optimization (#21227) `JavaHeaderCompileAction` can apply an optimization where it compiles the source files only against direct dependencies, making use of the fact that Turbine includes sufficient information about transitive dependencies into the direct header jars in a special directory. In order to ensure that downstream consumers of header compilation action can use its `.jdeps` file regardless of which of these actions actually uses path mapping, all such paths to transitive jars have to be unmapped. With this commit, actions can declare additional artifacts whose paths they want to apply path mapping to. By always passing all transitive jars into the path mapper, even when only the direct jars are inputs to the action, the transitive header jar paths can be unmapped and stripped path collisions between them correctly result in a noop `PathMapper` being used for the current action. Fixes #21091 Closes #19872. Commit fd196bf PiperOrigin-RevId: 604772306 Change-Id: I21d25785b2e909aace8554251f41b1012185138a Co-authored-by: Fabian Meumertzheim <[email protected]>
This was rolled back due to a significant memory regression (f5d76b1). |
Is there any additional information in the internal version of the rollback commit? It looks like it has some redacted information. I will try to reproduce this on Bazel itself. |
Not much. It's a 0.6% memory regression on a large Java project, which crosses some performance check threshold (I can't remember the threshold). It also found a 3.2% reduction in system time and 60% reduction in @oquenchil might be able to offer more benchmark insight. I'm about to go OOO for a week, but @aranguyen is aware. Conceptually, were we expecting this to have any resource impact if path mapping isn't enabled? The affected build isn't using path mapping at all. |
Thanks for the additional input. Just so that I understand it correctly: The change increased memory by 0.6% while reducing system time by 3.2%? That would certainly be interesting (and unexpected). |
I don't think the benchmark is always reliable, even with "significant" findings. It's a basic story of you have to repeat a huge amount of times to get consistently reliable results. But @oquenchil did repeat the memory profiling a few times, so I believe that result is valid. |
Just from reading the code:
|
Thanks for taking a look!
When I finalized this PR, the instance size was 72 bytes with 4 bytes padding, so this change wouldn't have affected the size. However, in the meantime the I submitted a separate PR (#21290) that removes the
If I traced the flow of this nested set correctly, while it's newly retained by this action, it should already be retained by the sibling
This looks easy to achieve and I will implement it when submitting the rollforward. |
I confirmed that storing the |
Thanks for the additional information! Does the benchmark build involve "full" (non-header) Java compilation actions? If it's mostly just header compilation actions and that's actually a common use case, then I don't see a way to make the approach in this PR work at all. My assumption so far has been that in "real-world builds" there would always be a sibling non-header compilation action that retains this nested set (which ultimately comes from https://cs.opensource.google/bazel/bazel/+/af28cae84e93b5d706c9cd374d1b8e324a3e8ca0:src/main/java/com/google/devtools/build/lib/rules/java/JavaTargetAttributes.java;l=42;bpv=1;bpt=1) in https://cs.opensource.google/bazel/bazel/+/af28cae84e93b5d706c9cd374d1b8e324a3e8ca0:src/main/java/com/google/devtools/build/lib/rules/java/JavaCompilationHelper.java;l=328-343. If the benchmark analyzes the non-header compilation actions and still ends up with additional nested sets, then I'm missing something crucial here. I will benchmark this locally to get a better understanding of what's retained where. |
The benchmark build has 194,776 |
@fmeum I was also able to confirm the regression myself. Since you mentioned the possibility of this approach not working as the regression is definitely a blocker, WDYT about making
|
@aranguyen Yes, that's a great alternative. I would say go ahead with it! I will spend more time understanding why this regression exists in the first place (I can reproduce it locally on Bazel). Worst case I'll learn something, best case we can switch to this approach later. |
@fmeum benchmark came back with no memory regression with the approach i mentioned but there's a bit of a cost in wall time. I'll try and sort that out. Please do let me know how your investigation goes as well. |
I now understand why the actions weren't sharing the |
Submitted a rollforward change: #21401 |
JavaHeaderCompileAction
can apply an optimization where it compiles the source files only against direct dependencies, making use of the fact that Turbine includes sufficient information about transitive dependencies into the direct header jars in a special directory.In order to ensure that downstream consumers of header compilation action can use its
.jdeps
file regardless of which of these actions actually uses path mapping, all such paths to transitive jars have to be unmapped.With this commit, actions can declare additional artifacts whose paths they want to apply path mapping to. By always passing all transitive jars into the path mapper, even when only the direct jars are inputs to the action, the transitive header jar paths can be unmapped and stripped path collisions between them correctly result in a noop
PathMapper
being used for the current action.Fixes #21091