
GlobalOpt: don't speculatively execute initializers of global variables #30445

Merged
5 commits merged into swiftlang:master on Mar 24, 2020

Conversation

eeckstein
Contributor

For example, hoisting an initializer call out of a loop whose trip count could be 0 executes it speculatively.
We did this on purpose, but even if it's not strictly wrong, it's at least very confusing when the initializer has observable side effects.
Instead, let CSE and LICM do the job and handle initializer side effects correctly.

rdar://problem/60292679
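A minimal sketch of the kind of code this affects (hypothetical names, not taken from the radar):

// A global whose lazy initializer has an observable side effect (the print).
// Globals in a library are initialized lazily, on first access.
let lookupTable: [Int] = {
    print("initializing lookupTable")   // observable side effect
    return Array(0..<1024)
}()

func sum(_ indices: [Int]) -> Int {
    var total = 0
    for i in indices {
        total += lookupTable[i]         // the only access to the global
    }
    return total
}

// If `indices` is empty, the loop body never runs and the initializer should
// never fire. Hoisting the access above the loop executes the initializer
// speculatively, making the print observable even for sum([]).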

@eeckstein
Contributor Author

@swift-ci test

@eeckstein
Contributor Author

@swift-ci benchmark

@swift-ci
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
Dictionary4 154 198 +28.6% 0.78x
EqualSubstringSubstring 22 28 +27.3% 0.79x
LessSubstringSubstring 22 28 +27.3% 0.79x
EqualSubstringSubstringGenericEquatable 22 28 +27.3% 0.79x
EqualSubstringString 22 28 +27.3% 0.79x
LessSubstringSubstringGenericComparable 22 28 +27.3% 0.79x
EqualStringSubstring 23 29 +26.1% 0.79x (?)
CharacterPropertiesStashedMemo 650 770 +18.5% 0.84x (?)
Dictionary4OfObjects 190 221 +16.3% 0.86x (?)
Data.append.Sequence.64kB.Count.I 26 30 +15.4% 0.87x (?)
Data.append.Sequence.64kB.Count 26 30 +15.4% 0.87x (?)
CharIteration_punctuated_unicodeScalars_Backwards 600 680 +13.3% 0.88x (?)
StringComparison_longSharedPrefix 313 350 +11.8% 0.89x
 
Improvement OLD NEW DELTA RATIO
SortIntPyramid 450 360 -20.0% 1.25x (?)
RangeOverlapsClosedRange 100 86 -14.0% 1.16x (?)
ObjectiveCBridgeStringHash 80 74 -7.5% 1.08x

Code size: -O

Regression OLD NEW DELTA RATIO
IterateData.o 2032 2064 +1.6% 0.98x
RC4.o 3927 3975 +1.2% 0.99x
RangeOverlaps.o 6072 6136 +1.1% 0.99x
LazyFilter.o 8389 8477 +1.0% 0.99x
 
Improvement OLD NEW DELTA RATIO
Phonebook.o 9963 9191 -7.7% 1.08x
CharacterProperties.o 22897 22465 -1.9% 1.02x
ChaCha.o 14361 14137 -1.6% 1.02x

Performance: -Osize

Regression OLD NEW DELTA RATIO
DataCreateMedium 4100 6300 +53.7% 0.65x
LessSubstringSubstring 22 28 +27.3% 0.79x
EqualSubstringString 22 28 +27.3% 0.79x (?)
LessSubstringSubstringGenericComparable 22 28 +27.3% 0.79x
EqualSubstringSubstring 23 29 +26.1% 0.79x (?)
EqualStringSubstring 23 29 +26.1% 0.79x
EqualSubstringSubstringGenericEquatable 23 29 +26.1% 0.79x
StringComparison_longSharedPrefix 314 350 +11.5% 0.90x (?)
Calculator 136 149 +9.6% 0.91x (?)
CharIteration_ascii_unicodeScalars 1720 1880 +9.3% 0.91x (?)
CharIteration_tweet_unicodeScalars 3440 3720 +8.1% 0.92x (?)
Chars2 3100 3350 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ChaCha 49 40 -18.4% 1.22x
ArraySetElement 325 283 -12.9% 1.15x (?)
ArrayPlusEqualSingleElementCollection 470 423 -10.0% 1.11x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
LazyFilter.o 8019 8107 +1.1% 0.99x
 
Improvement OLD NEW DELTA RATIO
Phonebook.o 9478 9062 -4.4% 1.05x

Performance: -Onone

Regression OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 26 32 +23.1% 0.81x (?)
LessSubstringSubstringGenericComparable 26 32 +23.1% 0.81x (?)
EqualSubstringSubstring 27 33 +22.2% 0.82x (?)
LessSubstringSubstring 27 33 +22.2% 0.82x (?)
EqualStringSubstring 28 33 +17.9% 0.85x
EqualSubstringString 28 33 +17.9% 0.85x

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftStdlibUnittest.dylib 327680 331776 +1.2% 0.99x
How to read the data
The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@atrick
Contributor

atrick commented Mar 17, 2020

@eeckstein Sorry if this is confusing, but it's important that programmers don't make global initializers dependent on any program side effects. The language semantics we decided on were pretty clear: global initializers can run any time between the start of the program and the first point of use. If that decision needs to be revisited, it should go through the core team.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 7a60bdc4eaf35daa0ed2a5b6270781b4044d912c

@atrick
Contributor

atrick commented Mar 17, 2020

@eeckstein I can elaborate on the forums, but these are my main points of concern in somewhat arbitrary order:

  1. Is there a legitimate need for programmers to rely on the order of global initializers? Is there any case where this is useful and isn't better expressed with a different strategy?

  2. Is there, or will there ever be, a performance need to hoist global initializers that have side effects (write to memory or release objects)? The concern is not the cost of accessing the global; it is the cost of arbitrary side effects in critical code paths creating a hard optimization barrier. We know from experience that Java's lazy initialization semantics have been a major barrier to optimization unless you have a JIT that performs the optimization after initialization.

  3. If we stop aggressively optimizing global initializers, and instead respect potential side effects, then we are allowing programs to rely on this behavior. Realistically, I'm afraid there is no going back from this decision later. That's why I think this PR is a language decision.

  4. If we declare that relying on global initializer order is unspecified, then any change in behavior caused by the optimizer should be possible to detect as a program error with diagnostics or should fall within unspecified behavior (e.g. side effects can be observed in a different order). We should never need to resort to an undiagnosed runtime crash or undefined behavior.

  5. Originally, when this was openly discussed, we wanted to provide diagnostics to raise a program error whenever global initializer order could result in a cyclic dependence. Is it possible for such a diagnostic to be complete? If not, what would we plan to do for undiagnosed cases?

  6. When the current optimizer hoists an unreachable global initializer call within a function that may be called by the same initializer, it may introduce a reentrant call to swift_once, which crashes. This needs to be fixed, but that can be done in various ways: (a) diagnose the programmer error statically, (b) avoid introducing the reentrant call, or (c) when introducing a potentially reentrant call, guard it with a reentrance check and, if the check fails, emit a runtime diagnostic.
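A minimal sketch of the cyclic dependence that points 5 and 6 worry about (hypothetical names; point 6 is about the optimizer introducing a similar reentrance on its own):

// Computing `first` reads `second`, and computing `second` reads `first`.
// Whichever global is touched first re-enters its own still-running
// swift_once, which today surfaces only at run time as a hang or crash,
// not as a compile-time diagnostic.
let first: Int = makeFirst()
let second: Int = first + 1

func makeFirst() -> Int {
    return second * 2   // touches `second`, whose initializer touches `first`
}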

@eeckstein
Contributor Author

@swift-ci test linux

@eeckstein
Contributor Author

@atrick I see this more from a pragmatic view. It's always a win to make Swift more predictable, easy to use, and safe. When people develop and test in debug mode, for example, and then see a mysterious crash or just different behavior in the archived product, it does not make their lives easier.

Regarding performance, we have several language features where we trade performance against safety and predictability: starting from ARC up to runtime exclusivity checking. I doubt that hoisting global inits speculatively will ever be such a significant performance win, compared to what we lose with all this other overhead.
And even if it is, we can introduce such an optimization later. This change, which just makes an optimization more conservative, is clearly not a language decision.

Besides all this, the PR is also a code cleanup, because optimizing global inits is now done by CSE and LICM (where it belongs). A lot of code is removed from GlobalOpt.

@jckarter
Contributor

jckarter commented Mar 19, 2020

Regarding performance, we have several language features where we trade performance against safety and predictability: starting from ARC up to runtime exclusivity checking. I doubt that hoisting global inits speculatively will ever be such a significant performance win, compared to what we lose with all this other overhead.
And even if it is, we can introduce such an optimization later. This change, which just makes an optimization more conservative, is clearly not a language decision.

Well, we chose ARC and exclusivity checking knowing that we could optimize them away in important cases, just like we did with the rule about global initializers.

I remember hoisting of global initializers being a significant performance win in the Swift 0.x days, but that might've been before we inlined the fast path check on Apple platforms, or had GlobalOpt do optimization of constants into static initializers. If possible, you might want to also run the benchmarks on Linux or Windows, where (unless something's changed) I don't think we assume dispatch_once ABI for global initializers, so global accessors would still pay the full cost of a call on every access.

@jckarter
Contributor

jckarter commented Mar 19, 2020

Looking at the example from rdar://problem/60292679, it looks like the situation is like this:

for _ in [] {
  _ = globalVariable
}

and the problem is that we're hoisting the global access out of a loop that never runs, triggering an initialization that would not have happened otherwise. If the loop ran even once, then their code would've crashed with or without optimization. We could argue that hoisting the global access out of the dead loop violates the "initializers run some time before the access" rule, because the access never actually happens in that case. OTOH, if we interpret the rule that way, that makes hoisting much less powerful if we say we need a trip count > 0 to do the hoist.

@atrick
Contributor

atrick commented Mar 19, 2020

@jckarter I am not at all concerned about the overhead of checking for initialization. That falls within the ARC/exclusivity range of safety checks. The issue is optimizing other code within the same loop as a global access. This PR places the requirement on the compiler to analyze all the side effects of the global initializer before performing any optimization that we may want to apply to that loop. The existing model only requires the optimizer to analyze access to an already initialized global.

I think we can all strongly agree that the current model has unresolved problems. We can talk about two separate issues:

  1. Unreachable code inside an initializer. For early initialization to be practical, we really need a rule that all code in an initializer is potentially reachable. This is a pretty strange and nonintuitive rule. But if we rely on the compiler to prove reachability (trip count > 0), that makes the compiler's ability to do the optimization unpredictable, so we end up with a weird performance cliff.

One other thing to note: even with this PR, the crash-on-reentrant-initializer problem still exists! So we'll need to introduce a diagnostic for that one way or another. The only question is whether we consider initialization in unreachable code to be reentrant.

  2. General global initialization order. This PR implements as-if fully ordered lazy initialization. Do we actually want programmers relying on that? If not, is it possible to diagnose programs that depend on initializer order? The current situation is bad because we resort to unspecified behavior that changes from -Onone to -O.

@jckarter
Contributor

I don't think we want to guarantee initializer ordering, beyond the formal dependencies among the initializer expressions themselves.
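A hedged illustration of that distinction (hypothetical names): ordering is only implied where one initializer expression actually uses the other global.

// `b` formally depends on `a`, so `a`'s initializer must run before `b`'s
// value can be computed.
let a: Int = logged("a", 1)
let b: Int = a + logged("b", 1)

// `x` and `y` have no formal dependency on each other; the relative order in
// which their side effects (the prints) become observable is not guaranteed.
let x: Int = logged("x", 10)
let y: Int = logged("y", 20)

func logged(_ name: String, _ value: Int) -> Int {
    print("initializing \(name)")
    return value
}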

@jckarter
Contributor

@atrick With loops and side effects, we might still be able to take advantage of the fact that the side effects of the initializer happen at most once by doing some loop rotation and/or partial unrolling, if we were able to turn:

while (condition) {
  auto &global = accessor();
  body(&global);
}

into:

if (condition) {
  // Hoist the accessor into the first trip
  auto &global = accessor();

  do {
    body(&global);
  } while (condition);
}

That gets more complicated if the access within the loop is also conditional, though.

@eeckstein
Contributor Author

eeckstein commented Mar 20, 2020

@atrick You are right, there are two separate issues:

Point 1 is what I called "execute an initializer speculatively". Preventing this is not so much of a performance problem for 0-trip-count loops, because the compiler already does the loop rotation (what @jckarter mentioned) today. So in most loops we can hoist the initializer call even if the original loop can have a 0 trip count (except when the global access is inside a condition in the loop).

It's also worth mentioning that all this only affects globals with non-trivial types (well, and also enums, but that's another story), because globals with trivial types, e.g. Int, are allocated in the data segment and don't have an initializer anyway. We didn't do that yet in the Swift 0.x days, so I think this is the reason why aggressively hoisting global initializers was very important at that time.
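A hedged sketch of that distinction (hypothetical names):

// A global of trivial type with a constant value can be emitted as a
// statically initialized word in the data segment: there is no initializer
// call to hoist at all.
let answer: Int = 42

// A global of non-trivial (reference-counted) type needs a runtime
// initializer, so accesses go through an addressor guarded by swift_once.
final class Logger {
    init() { print("logger created") }
}
let sharedLogger = Logger()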

@atrick
Contributor

atrick commented Mar 20, 2020

@eeckstein I agree on all those points, but I'm not sure if your comments still apply to globals defined in another module. We should be able to continue optimizing them when library evolution is disabled, or when they are declared @_fixed_layout (not sure if we support @frozen on globals yet). I'm not sure if that just continues to work for constant initializers after this PR; it definitely will need some work for non-constant initializers.

@eeckstein
Contributor Author

I chatted with @atrick about this: it should be no problem. With cross-module optimization initializers can be serialized and therefore analyzed in the calling module.

…rites

Detect stores to globals which get the address from an addressor call (and not from global_addr).
This kind of SIL is currently not generated, so most likely this bug does not show up. But as soon as we CSE addressor calls, it can be a real problem.
For example, hoisting an initializer call out of a loop whose trip count could be 0 executes it speculatively.
We did this on purpose, but even if it's not strictly wrong, it's at least very confusing when the initializer has observable side effects.
Instead, let CSE and LICM do the job and handle initializer side effects correctly.

rdar://problem/60292679
Global initializers are executed only once.
Therefore it's possible to hoist such an initializer call to a loop pre-header, provided there are no conflicting side effects in the loop before the call.
Also, the call must post-dominate the loop pre-header; otherwise it would be executed speculatively.
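A hedged sketch of when that condition holds (hypothetical names):

func loadCache() -> [String: Int] {
    print("cache loaded")              // observable side effect
    return ["a": 1, "b": 2]
}
let sharedCache = loadCache()          // global with a runtime initializer

// The access is unconditional in the loop body; after loop rotation the body
// is only entered when there is at least one key, so the initializer call
// post-dominates the (rotated) pre-header and may be hoisted.
func lookupAll(_ keys: [String]) -> [Int] {
    var results: [Int] = []
    for key in keys {
        results.append(sharedCache[key] ?? 0)
    }
    return results
}

// Here the access is guarded by an extra condition, so it does not
// post-dominate the pre-header; hoisting it would execute the initializer
// speculatively.
func lookupSome(_ keys: [String]) -> [Int] {
    var results: [Int] = []
    for key in keys where key.hasPrefix("a") {
        results.append(sharedCache[key] ?? 0)
    }
    return results
}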
@eeckstein
Contributor Author

@swift-ci smoke test

1 similar comment
@eeckstein
Contributor Author

@swift-ci smoke test

@eeckstein merged commit 0873622 into swiftlang:master on Mar 24, 2020
@eeckstein deleted the globalopt branch on March 24, 2020, 14:09
@groue

groue commented Apr 13, 2020

Thank you @eeckstein, this change solves actual issues: https://forums.swift.org/t/is-it-possible-to-prevent-inlining-and-tail-call-optimization/21680/11

groue added a commit to groue/GRDB.swift that referenced this pull request Dec 12, 2021
…ized release builds"

This reverts commit c62ccf1 (aimed at fixing #755).

We can do this because swiftlang/swift#30445 is present in the Swift 5.3 release: https://github.com/apple/swift/releases/tag/swift-5.3-RELEASE