Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate tuple allocations in branching let binding rhs #11407

Merged
merged 6 commits into from
Jun 16, 2021

Conversation

kerams
Copy link
Contributor

@kerams kerams commented Apr 10, 2021

Implements #8953.

[<System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)>]
let u () = 3

let rec z () =
  let a, b =
    if System.Diagnostics.Debugger.IsAttached then
      "".ToString ()
      System.DateTime.Now
      1, u ()
    elif System.Diagnostics.Debugger.IsAttached then
      let x, y = 100, 100
      x, y
    else
      if 1 / 0 = 1 then
        2, 2
      else
        match 1 / 0 with
        | 1 -> if 2 / 3 = 1 then 5, 6 else 6, 5
        | 2 -> 6, 6
        | 3 -> u (), 7
        | _ -> 8, z ()
  
  a + b

becomes

public static int z()
{
    int num;
    int num2;
    if (Debugger.IsAttached)
    {
        "".ToString();
        DateTime now = DateTime.Now;
        num = 1;
        num2 = A.u();
    }
    else if (Debugger.IsAttached)
    {
        num = 100;
        num2 = 100;
    }
    else if (1 / 0 == 1)
    {
        num = 2;
        num2 = 2;
    }
    else
    {
        switch (1 / 0)
        {
        case 1:
            if (2 / 3 == 1)
            {
                num = 5;
                num2 = 6;
            }
            else
            {
                num = 6;
                num2 = 5;
            }
            break;
        case 2:
            num = 6;
            num2 = 6;
            break;
        case 3:
            num = A.u();
            num2 = 7;
            break;
        default:
            num = 8;
            num2 = A.z();
            break;
        }
    }
    return num + num2;
}

instead of

public static int z()
{
    object obj;
    if (Debugger.IsAttached)
    {
        "".ToString();
        DateTime now = DateTime.Now;
        obj = new Tuple<int, int>(1, u());
    }
    else if (Debugger.IsAttached)
    {
        obj = new Tuple<int, int>(100, 100);
    }
    else if (1 / 0 == 1)
    {
        obj = new Tuple<int, int>(2, 2);
    }
    else
    {
        switch (1 / 0)
        {
            default:
                obj = new Tuple<int, int>(8, z());
                break;
            case 1:
                obj = ((2 / 3 != 1) ? new Tuple<int, int>(6, 5) : new Tuple<int, int>(5, 6));
                break;
            case 2:
                obj = new Tuple<int, int>(6, 6);
                break;
            case 3:
                obj = new Tuple<int, int>(u(), 7);
                break;
        }
    }
    Tuple<int, int> tuple = (Tuple<int, int>)obj;
    return tuple.Item1 + tuple.Item2;
}

More tests coming soon.

src/fsharp/Optimizer.fs Outdated Show resolved Hide resolved
@kerams
Copy link
Contributor Author

kerams commented Apr 10, 2021

As mentioned in the comments, the optimization does not kick in with a first class tuple use:

let z (tuple: int * int) =
    let a, b =
        if cond () then
            tuple
        else
            1, 2
    a + b

It's doable but I feel a lot more code would be required, plus I'm not sure if it's desirable in the first place.

@kerams kerams marked this pull request as draft April 10, 2021 18:11
@kerams kerams marked this pull request as ready for review April 13, 2021 15:20
@kerams
Copy link
Contributor Author

kerams commented Apr 13, 2021

Ideas for other interesting tests, corner cases, would be appreciated.

@kerams
Copy link
Contributor Author

kerams commented May 1, 2021

Any feedback? I don't know when the fsc feature cutoff point happens (or if there is such a thing), but it would be a shame if this didn't make it into the next F# compiler version.

@cartermp cartermp requested review from dsyme and TIHan May 4, 2021 22:58
Copy link
Contributor

@cartermp cartermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code generally looks good, and the tests are great. Would like to get @TIHan or @dsyme or @vzarytovskii to take a look as well.

dsyme
dsyme previously requested changes May 6, 2021
Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great

The only change I'd really like is to delay the generation of the val refs etc. until we detemine they will be needed.

Also maybe an assert in IlxGen.fs that temporaries of the given name have always been eliminated by Optimizer.fs?

| _ -> None
| _ -> None

let argTys = destRefTupleTy g v.Type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the potential to execute very often - once for each tuple-typed let. I guess the tuple type acts as a good filter but I think it would be great to delay this execution of computing vrefs until we know it's really going to be necessary, i.e. we actually have a Expr.Op at the end of every branch, or at least one branch.

Copy link
Contributor Author

@kerams kerams May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would basically wrapping these 4 lines in a Lazy be an acceptable solution? Annoyingly, the lazy value would have to be a 4-element tuple since argTys, ves and binds are referred to further on too, not just used to create vrefs for the dive function. Or is it literally just vrefs that you want delayed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsyme, it's ready for another look.

src/fsharp/Optimizer.fs Outdated Show resolved Hide resolved


let MakeMutableStructuralBindingForTupleElement (v: Val) i (arg: Expr) argTy =
let name = sprintf "%s_%d_%s" v.LogicalName i PrettyNaming.tempTupleElementAssignmentTargetName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, these temporaries should always be removed within Optimizer.fs, even in debug code?

Copy link
Contributor Author

@kerams kerams May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Given this:

let x () =
    let a, b =
        sideEffect ()
        if cond () then
            let v = "yep"
            let v2 = if cond () then 1 else 3
            v, v2
        else
            "", f ()
    a, b

no tuples are created on the right hand side, but those temporaries will still be used to create a tuple in the let body. Yes, the optimization is a bit pointless here, but I reckon it would be more expensive to detect whether the temporaries will be eliminated in the end, rather than just rewrite the right hand side.

public static Tuple<string, int> x()
{
    TupleElimination.sideEffect();
    string item;
    int item2;
    if (TupleElimination.cond())
    {
        int num = (!TupleElimination.cond()) ? 3 : 1;
        item = "yep";
        item2 = num;
    }
    else
    {
        item = "";
        item2 = TupleElimination.f();
    }
    return new Tuple<string, int>(item, item2);
}

Copy link
Contributor Author

@kerams kerams May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not pointless at all. This is what happens today.

The more I think about it, I realize the temporaries are never removed by the optimizer (that must be why I had not originally appendend temp to the function name, like in MakeStructuralBindingTemp), so the prefix in tempTupleElementAssignmentTargetName is not the best. Maybe you had something else on your mind?

@@ -2586,7 +2586,7 @@ and AddValEqualityInfo g m (v: ValRef) info =
// when their address is passed to the method call. Another exception are mutable variables
// created for tuple elimination in branching tuple bindings because they are assigned to
// exactly once.
if v.IsMutable && not (v.IsCompilerGenerated && (v.DisplayName.StartsWith(PrettyNaming.outArgCompilerGeneratedName) || IsMutableStructuralBindingForTupleElement v)) then
if v.IsMutable && not (IsMutableStructuralBindingForTupleElement v) && not (v.IsCompilerGenerated && v.DisplayName.StartsWith(PrettyNaming.outArgCompilerGeneratedName)) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the not a && not b is irking me, does inverting the conditionals / branch looks better?

Also wondering about repeated calls to v.IsCompilerGenerated that now occurs (one is introduced, unless mistaken).

As you are stress testing the inliner and bool expression optimizer, maybe you should also stress test the final reviewer by rewriting the whole condition from scratch to make it more readable 🙂.

Suggestion, since both current branches are very small expressions that could be copied:

In the comment, it says

an exception can be made for outArg values...

this would be one simple if condition, then it says

Another exception are mutable variables
created for tuple elimination in branching tuple bindings because they are assigned to
exactly once.

This would be another one, the final case is just returning the default case.

Sorry if bikeshedding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does inverting the conditionals / branch looks better

not (a || b)? Hm, I don't feel strongly either way.

Also wondering about repeated calls to v.IsCompilerGenerated

I know, but it needs to be a part of IsMutableStructuralBindingForTupleElement, otherwise If the caller does not also check IsCompilerGenerated, then it could lead to bugs.

So what you're suggesting at the end is this?

if v.IsMutable && not (IsMutableStructuralBindingForTupleElement v) then
    info
elif v.IsMutable && not (v.IsCompilerGenerated && v.DisplayName.StartsWith(PrettyNaming.outArgCompilerGeneratedName)) then
    info
else
    { info with Info = MakeValueInfoForValue g m v info.Info }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kerams, thanks a lot for the feedback.

not (a || b)? Hm, I don't feel strongly either way.

if that's the end result, the best (to me) would be if a || b then and inverting the branches expressions then (I don't like "not" for complex conditions, the same way I don't like isNull, I'd prefer isNotNull and not using "not" 🙂).

I know, but it needs to be a part of IsMutableStructuralBindingForTupleElement, otherwise If the caller does not also check IsCompilerGenerated, then it could lead to bugs.

Understood, it fits well with when I was proposing to extract a function, and it makes sense to have the redundant check to properly encapsulate the assumption.

It may also surface as a comment if you can expand on the deeper reason, to avoid someone trying to remove duplicate check to do it hastily.

So what you're suggesting at the end is this?

for the unsuspecting next maintainer dealing with this part (which seems sensitive for tuple elision, but it, hmm, escapes me how it actually works), I feel your proposed edit, with maybe a re-interleaving of the keywords (such as "outArg values", "temp mutable for tuples that can be elided") from the larger text above, would do a great job at improving the very expansive conditional in the code, and shouldn't add runtime cost, just a duplicate branch with info.

So I see potential small wins for the clarity of the code, if this is not bikeshedding and you feel it makes it easier for next person following your journey, I wish to understand better the flow of Val objects and do the work you do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which seems sensitive for tuple elision, but it, hmm, escapes me how it actually works

let a, b =
    if System.Diagnostics.Debugger.IsAttached then
        1, 3
    else
        3, 4
a + b

enters TryRewriteBranchingTupleBinding in the form of

let patternInput = if System.Diagnostics.Debugger.IsAttached then 1, 3 else 3, 4
let b = patternInput.Item2
let a = patternInput.Item1
a + b

and exits as

let mutable patternInput_0_tupleElem = null
let mutable patternInput_1_tupleElem = null

let patternInput =
    if System.Diagnostics.Debugger.IsAttached then
        patternInput_0_tupleElem <- 1
        patternInput_1_tupleElem <- 3
    else
        patternInput_0_tupleElem <- 3
        patternInput_1_tupleElem <- 4

    patternInput_0_tupleElem, patternInput_1_tupleElem

let b = patternInput.Item2
let a = patternInput.Item1
a + b

As you can see, there's still a let binding with a tuple (I copied that part from ExpandStructuralBindingRaw, the 4 lines in the else branch). It's only further on in OptimizeExprOpReductionsAfter and TryOptimizeTupleFieldGet that this tuple is eliminated. If you step through the code, you'll realize it's only possible when the equivalence of patternInput_0_tupleElem and patternInput.Item1 is established based on the former's ExprValueInfo. Had I not changed AddValEqualityInfo to account for this mutable tupleElem, there'd be no equality information (because normally you can't, for obvious reasons, equate a mutable variable with another one) and the result would look like:

public static int z()
{
	int item;
	int item2;
	if (Debugger.IsAttached)
	{
		item = 1;
		item2 = 3;
	}
	else
	{
		item = 3;
		item2 = 4;
	}
	Tuple<int, int> tuple = new Tuple<int, int>(item, item2);
	return tuple.Item1 + tuple.Item2;
}

If, on the other hand, I'd used a regular local in MakeMutableStructuralBindingForTupleElement instead of mkMutableCompGenLocal, patternInput_0_tupleElem would be equated with that initial null (0) forever, and the whole function would through a series of inlining and eliminations be reduced to this:

public static int z()
{
	if (Debugger.IsAttached)
	{
	}
	return 0 + 0;
}

@kerams kerams requested a review from dsyme May 15, 2021 09:48
@Happypig375
Copy link
Member

Bump @dsyme

let [<Literal>] suffixForVariablesThatMayNotBeEliminated = "$cont"

/// Indicates a ValRef generated to facilitate tuple eliminations
let [<Literal>] suffixForTupleElementAssignmentTarget = "tupleElem"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably name this $tupleElem instead of tupleElem because the user can never name a val with a $.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TIHan, it works for me when double-ticked. The check is guarded with IsCompilerGenerated, so it should be fine? outArg does not use $ either. Let me know if you insist and I will of course change it;).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably fine then. I just want to make sure. :) Thank you for your work on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My pleasure.

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good and tests are great.

One minor comment on the suffix name.

@cartermp cartermp merged commit e1c785d into dotnet:main Jun 16, 2021
@kerams kerams deleted the tuples2 branch June 16, 2021 17:39
@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

Apologies for the late second review

I'm concerned about the interaction with closure capture. The code:

let f(a,b,c) =
    let x,y = 
        if c then
            printfn "hello"
            a,b
        else
            b,a
    (fun () -> x + y)

will now promote x and y to be mutable. Closure capture now promotes them to be heap-allocated reference cells. That seems highly problematic, leading to additional allocations.

We need to find a right approach to this that doesn't implicitly do this kind of thing

I will revert this until we've sorted this out.

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

While investigating I also noticed a significant change in the generated release code for

let f(a,b) =
    let x,y = printfn "hello"; b*a,a*b
    (fun () -> x + y)

Here the compiler eliminates the tuple binding and we end up capturing values names patternInput1 and patternInput2 instead of x and y which may degrade the debug experience. It's not actually wrong and may be unrelated to this change

dsyme pushed a commit to dsyme/fsharp that referenced this pull request Jun 17, 2021
@dsyme dsyme mentioned this pull request Jun 17, 2021
@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

@kerams I suppose the right approach is to change

let x,y = ... e1,e2

to

let mutable x$tmp
let mutable y$tmp
... 
 x$tmp <- e1
 y$tmp <- e2
let x = x$tmp // not mutable
let y = y$tmp // not mutable

I'll see if I can make this change (I had actually thought the code was doing this)

@kerams
Copy link
Contributor Author

kerams commented Jun 17, 2021

@dsyme, I think I'll need to recheck IsMutableStructuralBindingForTupleElement introduced here in order to prevent that mutable capture. Can you point me to the area of the compiler where closures are created?

The second problem was most likely introduced (or rather surfaced in, because you get those pattern inputs in sharplab, without any of these PRs) in #11296, but I'm not sure how to fix that. (And I'm afraid the above will suffer from this too.)

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

I see, yes, the problem may be simply with IsMutableStructuralBindingForTupleElement

I'll take a deep look in about half an hour. I'd rather not revert of course, we should find a good fix

@kerams
Copy link
Contributor Author

kerams commented Jun 17, 2021

Type of those Vals is ref<int> when they enter IlxGen, so I think we could either unwrap it in GenFreevar if it is a compiler generated tupleElem or somehow make sure the type does not become a ref cell in the first place.

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

@kerams Have you got time to chat on slack or teams or something?

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

The capture boxing happens in autobox.fs

I'm trying to understand why IsMutableStructuralBindingForTupleElement was needed

@kerams
Copy link
Contributor Author

kerams commented Jun 17, 2021

It's needed in AddValEqualityInfo to assign equality information even though these vals are mutable. Just saw your fix PR. all good now?

You also found something in #11267, to which I replied.

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

On the question of the elimination of good user-facing values, yes, we have a problem

This:

let testFunction(a,b) =
    let x,y = printfn "hello"; b*a,a*b
    (fun () -> x + y)

is becoming this:

let mutable patternInput1 = <default>
let mutable patternInput2 = <default>
printfn "hello"; 
patternInput1 <- b*a
patternInput2 <- a*b
let patternInput = (patternInput1, patternInput2)
let x = patternInput1 
let y = patternInput2
(fun () -> x + y)

and the ValValue propagation is indeed being done for let x = patternInput1. It's valid to do it as an optimization but we are getting poor names as a result, and the x and y won't be visible in the debugger.

What happens if we just avoid doing this and have the multiple locals? Does assembly code get any worse?

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

@kerams I think we should have a rule that whenever we have

let f() =
    ...
    let someCompilerGeneratedThing = expr
    let someUserValue = someCompilerGeneratedThing 
    ...someUserValue...

and we use ValValue information to eliminate someUserValue in favour of someCompilerGeneratedThing then we should adjust the name of someCompilerGeneratedThing to be someUserValue and mark it as not compiler generated.

This is always valid and will always improve the debugging experience for release code

@dsyme
Copy link
Contributor

dsyme commented Jun 17, 2021

@kerams Proposed fix for the second problem is at #11690

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants