[PERF] large number of duplicated hard/soft references kills the algorithm #2036

Closed
0x53A opened this issue Jul 25, 2018 · 10 comments · Fixed by #2156
0x53A (Contributor) commented Jul 25, 2018

EDIT:

Repro is here:

https://github.com/0x53A/fake-loop

You can increase the complexity in

let t_targets =
    TargetGroup [
        for i in 0 .. 9 ->
            Target.create (sprintf "Target_%i" i) ignore
    ]

With 8 it takes 3 seconds and with 9 it takes 16 seconds for me; 10 didn't finish after a few minutes.

You can fix it by setting

let degradeAlgo = false

in _targetEx.fsx; the references are then deduplicated on the script side.
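
For context, the script-side deduplication boils down to remembering which dependency edges were already declared and only calling FAKE's operators for new ones. A rough sketch of that idea (the helper names are mine, not the repro's; it assumes the usual open Fake.Core.TargetOperators is in scope):

open System.Collections.Generic

// remember which (from, to) edges were already declared
let declaredSoft = HashSet<string * string>()
let declaredHard = HashSet<string * string>()

// declare "a ?=> b" (soft/ordering-only dependency) at most once
let softDependencyOnce a b =
    if declaredSoft.Add(a, b) then a ?=> b |> ignore

// declare "a ==> b" (hard dependency) at most once
let hardDependencyOnce a b =
    if declaredHard.Add(a, b) then a ==> b |> ignore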

0x53A changed the title from "Infinite? loop in runOrDefaultWithArguments" to "Infinite? loop in runOrDefaultWithArguments -> visitDependenciesAux" on Jul 25, 2018
0x53A changed the title from "Infinite? loop in runOrDefaultWithArguments -> visitDependenciesAux" to "large number of duplicated hard/soft references kills the algorithm" on Jul 25, 2018
matthid (Member) commented Jul 28, 2018

Yes, we have a use case at work as well where we dynamically create targets, and we also noticed a performance degradation: FAKE calculates something for about a minute before starting the script (with >50 targets), so any fix/PR is welcome ;). This is just another case on top of that, but yes, the code is way slower than it needs to be - I guess there are faster algorithms available...

matthid changed the title from "large number of duplicated hard/soft references kills the algorithm" to "[PERF] large number of duplicated hard/soft references kills the algorithm" on Oct 13, 2018
matthid (Member) commented Oct 14, 2018

Hm, I'm not sure whether I fixed this with the linked commit (not sure why), but I cannot reproduce it with the following test:

type TestTarget =
    | TestTarget of string
    | TestTargetGroup of TestTarget list
    member x.TargetNames = seq {
        match x with
        | TestTarget s -> yield s
        | TestTargetGroup g -> yield! g |> Seq.collect (fun g -> g.TargetNames)
    }

module TestTarget =
    let create name body =
        Target.create name body
        TestTarget name

//test

    Fake.ContextHelper.fakeContextTestCase "basic performance #2036" <| fun _ ->
        // Increase counter to 1000  and run with dotTrace
        let counter = 1000

        let all_Pipelines = Dictionary<string,TestTarget>(System.StringComparer.OrdinalIgnoreCase :> IEqualityComparer<string>)

        let soft = HashSet()
        let hard = HashSet()
        
        // set this to TRUE to add duplicated references to FAKE.
        let degradeAlgo = true

        /// The last target is the name of the pipeline
        let SetPipelineRelations (targets:TestTarget list) : unit =
            let targetNames = targets |> Seq.collect (fun t -> t.TargetNames)
            let last = targetNames |> Seq.last
            all_Pipelines.[last] <- TestTargetGroup targets
            for (a,b) in targetNames |> Seq.pairwise do
                if soft.Add(a,b) || degradeAlgo then
                    a ?=> b |> ignore
                if hard.Add(a,last) || degradeAlgo then
                    a ==> last |> ignore

        let CreatePipeline name (targets: TestTarget list) : unit =    
            let p = TestTarget.create name ignore
            SetPipelineRelations [ yield (TestTargetGroup targets); yield p ]

        let t_targets =
            TestTargetGroup [
                for i in 0 .. counter ->
                    TestTarget.create (sprintf "Target_%i" i) ignore
            ]
            
        [ t_targets ] |> CreatePipeline "Run.1"
        Target.runOrDefaultWithArguments "Run.1"

which runs in 2 seconds. Unfortunately, looking at the code:

    let internal checkIfDependencyCanBeAddedCore fGetDependencies targetName dependentTargetName =
        let target = get targetName
        let dependentTarget = get dependentTargetName

        let rec checkDependencies dependentTarget =
              fGetDependencies dependentTarget
              |> List.iter (fun dep ->
                   if String.Equals(dep, targetName, StringComparison.OrdinalIgnoreCase) then
                      failwithf "Cyclic dependency between %s and %s" targetName dependentTarget.Name
                   checkDependencies (get dep))

        checkDependencies dependentTarget
        target,dependentTarget

I still think this is not really optimal because it might go through the same targets multiple times, but since I cannot reproduce the issue I'll probably leave it unchanged rather than risk introducing bugs.
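
As an illustration of what "going through the same targets only once" could look like here, a visited set can guard the recursion. This is only a sketch, not the merged code: the function name is made up, and it assumes the module's get lookup, the Target record, and open System / open System.Collections.Generic as in the snippet above.

    let internal checkIfDependencyCanBeAddedOnceOnly fGetDependencies targetName dependentTargetName =
        let target = get targetName
        let dependentTarget = get dependentTargetName
        // targets whose dependencies were already checked, compared case-insensitively
        let visited = HashSet<string>(StringComparer.OrdinalIgnoreCase :> IEqualityComparer<string>)

        let rec checkDependencies (dependentTarget: Target) =
            fGetDependencies dependentTarget
            |> List.iter (fun dep ->
                if String.Equals(dep, targetName, StringComparison.OrdinalIgnoreCase) then
                    failwithf "Cyclic dependency between %s and %s" targetName dependentTarget.Name
                // only descend into targets we have not checked before
                if visited.Add dep then checkDependencies (get dep))

        checkDependencies dependentTarget
        target, dependentTarget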

matthid (Member) commented Oct 14, 2018

With let counter = 10000 we get a stack overflow, which might be worth fixing...

0x53A (Contributor, Author) commented Oct 14, 2018

Edit: ah, the issue is not duplicated refs, but just a large number, so that doesn't help. Sorry.


I peeked at the PR, and this still adds the duplicated references, right? It just improves the scanning algorithm.

The list could be replaced with a HashSet of string*string (or a Set to keep it immutable); that should fix the issue at the root.
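
Something along those lines, as a rough sketch (the comparer and the addDependencyOnce helper are illustrative, not FAKE's actual internals):

open System
open System.Collections.Generic

// case-insensitive comparer for (targetName, dependencyName) pairs
let edgeComparer =
    { new IEqualityComparer<string * string> with
        member __.Equals((a1, b1), (a2, b2)) =
            String.Equals(a1, a2, StringComparison.OrdinalIgnoreCase)
            && String.Equals(b1, b2, StringComparison.OrdinalIgnoreCase)
        member __.GetHashCode((a, b)) =
            StringComparer.OrdinalIgnoreCase.GetHashCode(a) ^^^ StringComparer.OrdinalIgnoreCase.GetHashCode(b) }

// every known dependency edge, stored once no matter how often it is declared
let knownEdges = HashSet<string * string>(edgeComparer)

// hypothetical helper: only register the edge if it was not seen before
let addDependencyOnce registerDependency targetName dependentTargetName =
    if knownEdges.Add(targetName, dependentTargetName) then
        registerDependency targetName dependentTargetName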

matthid (Member) commented Oct 14, 2018

@0x53A Thanks for the tip; yes, there are a lot of ways to improve this - everything is implemented in a very, very naive way. I'll try to figure out what you mean once I've managed to make the code tail recursive (so it no longer stack-overflows).

Any idea why

        let rec visitDependenciesAux fGetDependencies previousDependencies = function
            | (visited, level, targetName) :: workLeft ->
                let target = get targetName
                let isVisited =
                    visited
                    |> Seq.exists (fun t -> String.Equals(t, targetName, StringComparison.OrdinalIgnoreCase))
                if isVisited then
                    visitDependenciesAux fGetDependencies previousDependencies workLeft
                else
                    let deps =
                        fGetDependencies target
                        |> Seq.map (fun (t) -> (String.toLower targetName::visited), level + 1, t)
                        |> Seq.toList
                    let newVisitedDeps = target :: previousDependencies
                    visitDependenciesAux fGetDependencies newVisitedDeps (deps @ workLeft)
            | _ -> previousDependencies |> List.rev           

is not considered tail recursive by the compiler? (I hate that part of F#.)

matthid (Member) commented Oct 14, 2018

It works when I remove the fGetDependencies parameter and write the code directly into the body. I have no idea why, but hey it works

0x53A (Contributor, Author) commented Oct 14, 2018

No clue. F# really needs a tail-rec keyword so the compiler errors out hard when it can't compile a function as tail recursive.

You could disassemble both variants and compare them. Also, afaik the JIT sometimes ignores the .tail prefix, so maybe it wasn't even F#'s fault ...


Async has trampolining, right?

So another quick-fix with a slight performance penalty could have been to just wrap it in async { } and do a let! binding at the recursion point.

Then execute it with RunSynchronously.

I'm not 100% sure, but I think that should also take care of stack overflows.
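
A minimal sketch of that suggestion with a toy countdown function (illustrative only, not FAKE code):

let rec countDown (n: int) : Async<int> =
    async {
        if n = 0 then
            return 0
        else
            // the let! at the recursion point goes through async's bind,
            // so the recursion depth is handled by async's trampoline
            // instead of the call stack
            let! rest = countDown (n - 1)
            return rest
    }

// should run to completion instead of throwing a StackOverflowException
let result = countDown 1000000 |> Async.RunSynchronously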

matthid (Member) commented Oct 14, 2018

@0x53A Just to confirm: is that what you suggest (i.e. check the data before adding it)?

    /// Adds the dependency to the front of the list of dependencies.
    /// [omit]
    let internal dependencyAtFront targetName dependentTargetName =
        let target,_ = checkIfDependencyCanBeAdded targetName dependentTargetName

        let hasDependency =
           target.Dependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        if not hasDependency then
            getTargetDict().[targetName] <- 
                { target with 
                    Dependencies = dependentTargetName :: target.Dependencies
                    SoftDependencies =
                        target.SoftDependencies
                        |> List.filter (fun d -> not (String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase)))
                }

    /// Adds the dependency to the front of the list of soft dependencies.
    /// [omit]
    let internal softDependencyAtFront targetName dependentTargetName =
        let target,_ = checkIfDependencyCanBeAdded targetName dependentTargetName

        let hasDependency =
           target.Dependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        let hasSoftDependency =
           target.SoftDependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        match hasDependency, hasSoftDependency with
        | true, _ -> ()
        | false, true -> ()
        | false, false ->
            getTargetDict().[targetName] <- { target with SoftDependencies = dependentTargetName :: target.SoftDependencies }

matthid (Member) commented Oct 14, 2018

At least there no longer seems to be a significant difference between let degradeAlgo = false and true, but it's still quite slow (for 10000 it takes roughly a minute). I think that is enough for a first improvement; it is also a bit unusual in regular use cases to have such a dense graph.

0x53A (Contributor, Author) commented Oct 14, 2018

Yes that is what I meant.


This looks like a great improvement: in my original repro just 10 killed it, and now 10000 is (barely) usable.

Thank you!

matthid added a commit that referenced this issue Oct 14, 2018
* Fixes various Stackoverflows by making stuff tail recursive
* fix traverse algorithms by using visited hashsets
* Make sure to skip duplicated dependencies
* Other improvements seen in the profiler (ie String.Equals instead of toLower)

Fixes #2036
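
The last bullet (String.Equals instead of toLower) refers to comparing target names case-insensitively without allocating lowered copies; roughly the difference between the following two helpers (the names are just for illustration):

// allocates two new strings on every comparison
let slowEquals (a: string) (b: string) =
    a.ToLower() = b.ToLower()

// compares in place, no allocation
let fastEquals (a: string) (b: string) =
    System.String.Equals(a, b, System.StringComparison.OrdinalIgnoreCase)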