
GlobalOpt: don't speculatively execute initializers of global variables #30445

Merged
5 commits merged into swiftlang:master on Mar 24, 2020

Conversation

eeckstein
Contributor

For example, hoisting an initializer call out of a loop whose trip count could be 0 executes it speculatively.
We did this on purpose, but even if it's not strictly wrong, it's at least very confusing when the initializer has observable side effects.
Instead, let CSE and LICM do the job and handle initializer side effects correctly.

rdar://problem/60292679
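A minimal sketch of the kind of code this affects (hypothetical names, not taken from the radar):

// A global whose lazy initializer has an observable side effect (the print).
// Globals in a library are initialized lazily, on first access.
let lookupTable: [Int] = {
    print("initializing lookupTable")   // observable side effect
    return Array(0..<1024)
}()

func sum(_ indices: [Int]) -> Int {
    var total = 0
    for i in indices {
        total += lookupTable[i]         // the only access to the global
    }
    return total
}

// If `indices` is empty, the loop body never runs and the initializer should
// never fire. Hoisting the access above the loop executes the initializer
// speculatively, making the print observable even for sum([]).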

@eeckstein
Contributor Author

@swift-ci test

@eeckstein
Contributor Author

@swift-ci benchmark

@swift-ci
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
Dictionary4 154 198 +28.6% 0.78x
EqualSubstringSubstring 22 28 +27.3% 0.79x
LessSubstringSubstring 22 28 +27.3% 0.79x
EqualSubstringSubstringGenericEquatable 22 28 +27.3% 0.79x
EqualSubstringString 22 28 +27.3% 0.79x
LessSubstringSubstringGenericComparable 22 28 +27.3% 0.79x
EqualStringSubstring 23 29 +26.1% 0.79x (?)
CharacterPropertiesStashedMemo 650 770 +18.5% 0.84x (?)
Dictionary4OfObjects 190 221 +16.3% 0.86x (?)
Data.append.Sequence.64kB.Count.I 26 30 +15.4% 0.87x (?)
Data.append.Sequence.64kB.Count 26 30 +15.4% 0.87x (?)
CharIteration_punctuated_unicodeScalars_Backwards 600 680 +13.3% 0.88x (?)
StringComparison_longSharedPrefix 313 350 +11.8% 0.89x
 
Improvement OLD NEW DELTA RATIO
SortIntPyramid 450 360 -20.0% 1.25x (?)
RangeOverlapsClosedRange 100 86 -14.0% 1.16x (?)
ObjectiveCBridgeStringHash 80 74 -7.5% 1.08x

Code size: -O

Regression OLD NEW DELTA RATIO
IterateData.o 2032 2064 +1.6% 0.98x
RC4.o 3927 3975 +1.2% 0.99x
RangeOverlaps.o 6072 6136 +1.1% 0.99x
LazyFilter.o 8389 8477 +1.0% 0.99x
 
Improvement OLD NEW DELTA RATIO
Phonebook.o 9963 9191 -7.7% 1.08x
CharacterProperties.o 22897 22465 -1.9% 1.02x
ChaCha.o 14361 14137 -1.6% 1.02x

Performance: -Osize

Regression OLD NEW DELTA RATIO
DataCreateMedium 4100 6300 +53.7% 0.65x
LessSubstringSubstring 22 28 +27.3% 0.79x
EqualSubstringString 22 28 +27.3% 0.79x (?)
LessSubstringSubstringGenericComparable 22 28 +27.3% 0.79x
EqualSubstringSubstring 23 29 +26.1% 0.79x (?)
EqualStringSubstring 23 29 +26.1% 0.79x
EqualSubstringSubstringGenericEquatable 23 29 +26.1% 0.79x
StringComparison_longSharedPrefix 314 350 +11.5% 0.90x (?)
Calculator 136 149 +9.6% 0.91x (?)
CharIteration_ascii_unicodeScalars 1720 1880 +9.3% 0.91x (?)
CharIteration_tweet_unicodeScalars 3440 3720 +8.1% 0.92x (?)
Chars2 3100 3350 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ChaCha 49 40 -18.4% 1.22x
ArraySetElement 325 283 -12.9% 1.15x (?)
ArrayPlusEqualSingleElementCollection 470 423 -10.0% 1.11x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
LazyFilter.o 8019 8107 +1.1% 0.99x
 
Improvement OLD NEW DELTA RATIO
Phonebook.o 9478 9062 -4.4% 1.05x

Performance: -Onone

Regression OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 26 32 +23.1% 0.81x (?)
LessSubstringSubstringGenericComparable 26 32 +23.1% 0.81x (?)
EqualSubstringSubstring 27 33 +22.2% 0.82x (?)
LessSubstringSubstring 27 33 +22.2% 0.82x (?)
EqualStringSubstring 28 33 +17.9% 0.85x
EqualSubstringString 28 33 +17.9% 0.85x

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftStdlibUnittest.dylib 327680 331776 +1.2% 0.99x
How to read the data
The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@atrick
Contributor

atrick commented Mar 17, 2020

@eeckstein Sorry if this is confusing, but it's important that programmers don't make global initializers dependent on any program side effects. The language semantics we decided on were pretty clear: global initializers can run any time between the start of the program and the first point of use. If that decision needs to be revisited, it should go through the core team.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 7a60bdc4eaf35daa0ed2a5b6270781b4044d912c

@atrick
Contributor

atrick commented Mar 17, 2020

@eeckstein I can elaborate on the forums, but these are my main points of concern in somewhat arbitrary order:

  1. Is there a legitimate need for programmers to rely on the order of global initializers? Is there any case where this is useful and isn't better expressed with a different strategy?

  2. Is there, or will there ever be, a performance need to hoist global initializers that have side effects (write to memory or release objects)? The concern is not the cost of accessing the global; it is the cost of arbitrary side effects in critical code paths creating a hard optimization barrier. We know from experience that Java's lazy initialization semantics have been a major barrier to optimization unless you have a JIT that performs the optimization after initialization.

  3. If we stop aggressively optimizing global initializers, and instead respect potential side effects, then we are allowing programs to rely on this behavior. Realistically, I'm afraid there is no going back from this decision later. That's why I think this PR is a language decision.

  4. If we declare that relying on global initializer order is unspecified, then any change in behavior caused by the optimizer should be possible to detect as a program error with diagnostics or should fall within unspecified behavior (e.g. side effects can be observed in a different order). We should never need to resort to an undiagnosed runtime crash or undefined behavior.

  5. Originally, when this was openly discussed, we wanted to provide diagnostics to raise a program error whenever global initializer order could result in a cyclic dependence. Is it possible for such a diagnostic to be complete? If not, what would we plan to do for undiagnosed cases?

  6. When the current optimizer hoists an unreachable global initializer call within a function that may be called by the same initializer, it may introduce a reentrant call to swift_once, which crashes. This needs to be fixed, but that can be done in various ways: (a) diagnose the programmer error statically, (b) avoid introducing the reentrant call, or (c) when introducing a potentially reentrant call, guard it with a reentrance check and, if the check fails, emit a runtime diagnostic.
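A minimal sketch of the cyclic dependence that points 5 and 6 worry about (hypothetical names; point 6 is about the optimizer introducing a similar reentrance on its own):

// Computing `first` reads `second`, and computing `second` reads `first`.
// Whichever global is touched first re-enters its own still-running
// swift_once, which today surfaces only at run time as a hang or crash,
// not as a compile-time diagnostic.
let first: Int = makeFirst()
let second: Int = first + 1

func makeFirst() -> Int {
    return second * 2   // touches `second`, whose initializer touches `first`
}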

@eeckstein
Contributor Author

@swift-ci test linux

@eeckstein
Contributor Author

@atrick I see this more from a pragmatic view. It's always a win to make Swift more predictable, easy to use, and safe. When people develop and test in debug mode, for example, and then see a mysterious crash or just different behavior in the archived product, it does not make their lives easier.

Regarding performance, we have several language features where we trade performance against safety and predictability: starting from ARC up to runtime exclusivity checking. I doubt that hoisting global inits speculatively will ever be such a significant performance win, compared to what we lose with all this other overhead.
And even if it is, we can introduce such an optimization later. This change, which just makes an optimization more conservative, is clearly not a language decision.

Besides all this, the PR is also a code cleanup, because optimizing global inits is now done by CSE and LICM (where it belongs). A lot of code is removed from GlobalOpt.

@jckarter
Contributor

jckarter commented Mar 19, 2020

Regarding performance, we have several language features where we trade performance against safety and predictability: starting from ARC up to runtime exclusivity checking. I doubt that hoisting global inits speculatively will ever be such a significant performance win, compared to what we lose with all this other overhead.
And even if it is, we can introduce such an optimization later. This change, which just makes an optimization more conservative, is clearly not a language decision.

Well, we chose ARC and exclusivity checking knowing that we could optimize them away in important cases, just like we did with the rule about global initializers.

I remember hoisting of global initializers being a significant performance win in the Swift 0.x days, but that might've been before we inlined the fast path check on Apple platforms, or had GlobalOpt do optimization of constants into static initializers. If possible, you might want to also run the benchmarks on Linux or Windows, where (unless something's changed) I don't think we assume dispatch_once ABI for global initializers, so global accessors would still pay the full cost of a call on every access.

@jckarter
Contributor

jckarter commented Mar 19, 2020

Looking at the example from rdar://problem/60292679, it looks like the situation is like this:

for _ in [] {
  _ = globalVariable
}

and the problem is that we're hoisting the global access out of a loop that never runs, triggering an initialization that would not have happened otherwise. If the loop ran even once, then their code would've crashed with or without optimization. We could argue that hoisting the global access out of the dead loop violates the "initializers run some time before the access" rule, because the access never actually happens in that case. OTOH, if we interpret the rule that way, that makes hoisting much less powerful if we say we need a trip count > 0 to do the hoist.

@atrick
Contributor

atrick commented Mar 19, 2020

@jckarter I am not at all concerned about the overhead of checking for initialization. That falls within the ARC/exclusivity range of safety checks. The issue is optimizing other code within the same loop as a global access. This PR places the requirement on the compiler to analyze all the side effects of the global initializer before performing any optimization that we may want to apply to that loop. The existing model only requires the optimizer to analyze access to an already initialized global.

I think we can all strongly agree that the current model has unresolved problems. We can talk about two separate issues:

  1. Unreachable code inside an initializer. For early initialization to be practical, we really need a rule that all code in an initializer is potentially reachable. This is a pretty strange and nonintuitive rule. But if we rely on the compiler to prove reachability (trip count > 0), that makes the compiler's ability to do the optimization unpredictable, so we end up with a weird performance cliff.

One other thing to note: even with this PR, the crash-on-reentrant-initializer problem still exists! So we'll need to introduce a diagnostic for that one way or another. The only question is whether we consider initialization in unreachable code to be reentrant.

  2. General global initialization order. This PR implements as-if fully ordered lazy initialization. Do we actually want programmers relying on that? If not, is it possible to diagnose programs that depend on initializer order? The current situation is bad because we resort to unspecified behavior that changes from -Onone to -O.

@jckarter
Contributor

I don't think we want to guarantee initializer ordering, beyond the formal dependencies among the initializer expressions themselves.
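A hedged illustration of that distinction (hypothetical names): ordering is only implied where one initializer expression actually uses the other global.

// `b` formally depends on `a`, so `a`'s initializer must run before `b`'s
// value can be computed.
let a: Int = logged("a", 1)
let b: Int = a + logged("b", 1)

// `x` and `y` have no formal dependency on each other; the relative order in
// which their side effects (the prints) become observable is not guaranteed.
let x: Int = logged("x", 10)
let y: Int = logged("y", 20)

func logged(_ name: String, _ value: Int) -> Int {
    print("initializing \(name)")
    return value
}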

@jckarter
Contributor

@atrick With loops and side effects, we might still be able to take advantage of the fact that the side effects of the initializer happen at most once by doing some loop rotation and/or partial unrolling, if we were able to turn:

while (condition) {
  auto &global = accessor();
  body(&global);
}

into:

if (condition) {
  // Hoist the accessor into the first trip
  auto &global = accessor();

  do {
    body(&global);
  } while (condition);
}

That gets more complicated if the access within the loop is also conditional, though.

@eeckstein
Contributor Author

eeckstein commented Mar 20, 2020

@atrick You are right, there are two separate issues:

Point 1 is what I called "execute an initializer speculatively". Preventing this is not so much of a performance problem for 0-trip-count loops, because the compiler already does the loop rotation (what @jckarter mentioned) today. So in most loops we can hoist the initializer call even if the original loop can have a 0 trip count (except when the global access is inside a condition in the loop).

It's also worth mentioning that all this only affects globals with non-trivial types (well, and also enums, but that's another story), because globals with trivial types, e.g. Int, are allocated in the data segment and don't have an initializer anyway. We didn't do that yet in the Swift 0.x days, so I think this is the reason why aggressively hoisting global initializers was very important at that time.
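A hedged sketch of that distinction (hypothetical names):

// A global of trivial type with a constant value can be emitted as a
// statically initialized word in the data segment: there is no initializer
// call to hoist at all.
let answer: Int = 42

// A global of non-trivial (reference-counted) type needs a runtime
// initializer, so accesses go through an addressor guarded by swift_once.
final class Logger {
    init() { print("logger created") }
}
let sharedLogger = Logger()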

@atrick
Contributor

atrick commented Mar 20, 2020

@eeckstein I agree on all those points, but I'm not sure if your comments still apply to globals defined in another module. We should be able to continue optimizing them when library evolution is disabled, or when they are declared @_fixed_layout (not sure if we support @frozen on globals yet). I'm not sure if that just continues to work for constant initializers after this PR; it definitely will need some work for non-constant initializers.

@eeckstein
Contributor Author

I chatted with @atrick about this: it should be no problem. With cross-module optimization initializers can be serialized and therefore analyzed in the calling module.

…rites

Detect stores to globals which get the address from an addressor call (and not from global_addr).
This kind of SIL is currently not generated, so most likely this bug does not show up. But as soon as we CSE addressor calls, it can be a real problem.
For example, hoisting an initializer call out of a loop whose trip count could be 0 executes it speculatively.
We did this on purpose, but even if it's not strictly wrong, it's at least very confusing when the initializer has observable side effects.
Instead, let CSE and LICM do the job and handle initializer side effects correctly.

rdar://problem/60292679
Global initializers are executed only once.
Therefore it's possible to hoist such an initializer call to a loop pre-header, provided there are no conflicting side effects in the loop before the call.
Also, the call must post-dominate the loop pre-header; otherwise it would be executed speculatively.
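A hedged sketch of when that condition holds (hypothetical names):

func loadCache() -> [String: Int] {
    print("cache loaded")              // observable side effect
    return ["a": 1, "b": 2]
}
let sharedCache = loadCache()          // global with a runtime initializer

// The access is unconditional in the loop body; after loop rotation the body
// is only entered when there is at least one key, so the initializer call
// post-dominates the (rotated) pre-header and may be hoisted.
func lookupAll(_ keys: [String]) -> [Int] {
    var results: [Int] = []
    for key in keys {
        results.append(sharedCache[key] ?? 0)
    }
    return results
}

// Here the access is guarded by an extra condition, so it does not
// post-dominate the pre-header; hoisting it would execute the initializer
// speculatively.
func lookupSome(_ keys: [String]) -> [Int] {
    var results: [Int] = []
    for key in keys where key.hasPrefix("a") {
        results.append(sharedCache[key] ?? 0)
    }
    return results
}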
@eeckstein
Contributor Author

@swift-ci smoke test

1 similar comment
@eeckstein
Contributor Author

@swift-ci smoke test

@eeckstein merged commit 0873622 into swiftlang:master on Mar 24, 2020
@eeckstein deleted the globalopt branch on March 24, 2020, 14:09
@groue

groue commented Apr 13, 2020

Thank you @eeckstein, this change solves actual issues: https://forums.swift.org/t/is-it-possible-to-prevent-inlining-and-tail-call-optimization/21680/11

groue added a commit to groue/GRDB.swift that referenced this pull request Dec 12, 2021
…ized release builds"

This reverts commit c62ccf1 (aimed at fixing #755).

We can do this because swiftlang/swift#30445 is present in the Swift 5.3 release: https://github.com/apple/swift/releases/tag/swift-5.3-RELEASE