[PERF] large number of duplicated hard/soft references kills the algorithm #2036

Closed
0x53A opened this issue Jul 25, 2018 · 10 comments · Fixed by #2156
0x53A (Contributor) commented Jul 25, 2018

EDIT:

Repro is here:

https://github.com/0x53A/fake-loop

You can increase the complexity in

let t_targets =
    TargetGroup [
        for i in 0 .. 9 ->
            Target.create (sprintf "Target_%i" i) ignore
    ]

With 8 it takes 3 seconds and with 9 it takes 16 seconds for me; 10 didn't finish after a few minutes.

You can fix it by setting

let degradeAlgo = false

in _targetEx.fsx; the references are then deduplicated on the script side.
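
For context, the script-side deduplication boils down to remembering which dependency edges were already declared and only calling FAKE's operators for new ones. A rough sketch of that idea (the helper names are mine, not the repro's; it assumes the usual open Fake.Core.TargetOperators is in scope):

open System.Collections.Generic

// remember which (from, to) edges were already declared
let declaredSoft = HashSet<string * string>()
let declaredHard = HashSet<string * string>()

// declare "a ?=> b" (soft/ordering-only dependency) at most once
let softDependencyOnce a b =
    if declaredSoft.Add(a, b) then a ?=> b |> ignore

// declare "a ==> b" (hard dependency) at most once
let hardDependencyOnce a b =
    if declaredHard.Add(a, b) then a ==> b |> ignore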

0x53A changed the title from "Infinite? loop in runOrDefaultWithArguments" to "Infinite? loop in runOrDefaultWithArguments -> visitDependenciesAux" on Jul 25, 2018
0x53A changed the title from "Infinite? loop in runOrDefaultWithArguments -> visitDependenciesAux" to "large number of duplicated hard/soft references kills the algorithm" on Jul 25, 2018
matthid (Member) commented Jul 28, 2018

Yes, we have a use case at work as well where we dynamically create targets, and we also noticed a performance degradation: FAKE calculates something for about a minute before starting the script (with >50 targets), so any fix/PR is welcome ;). This is just another case on top of that, but yes, the code is way slower than it needs to be - I guess there are faster algorithms available...

matthid changed the title from "large number of duplicated hard/soft references kills the algorithm" to "[PERF] large number of duplicated hard/soft references kills the algorithm" on Oct 13, 2018
matthid (Member) commented Oct 14, 2018

Hm, I'm not sure whether I fixed this with the linked commit (not sure why), but I cannot reproduce it with the following test:

type TestTarget =
    | TestTarget of string
    | TestTargetGroup of TestTarget list
    member x.TargetNames = seq {
        match x with
        | TestTarget s -> yield s
        | TestTargetGroup g -> yield! g |> Seq.collect (fun g -> g.TargetNames)
    }

module TestTarget =
    let create name body =
        Target.create name body
        TestTarget name

//test

    Fake.ContextHelper.fakeContextTestCase "basic performance #2036" <| fun _ ->
        // Increase counter to 1000  and run with dotTrace
        let counter = 1000

        let all_Pipelines = Dictionary<string,TestTarget>(System.StringComparer.OrdinalIgnoreCase :> IEqualityComparer<string>)

        let soft = HashSet()
        let hard = HashSet()
        
        // set this to TRUE to add duplicated references to FAKE.
        let degradeAlgo = true

        /// The last target is the name of the pipeline
        let SetPipelineRelations (targets:TestTarget list) : unit =
            let targetNames = targets |> Seq.collect (fun t -> t.TargetNames)
            let last = targetNames |> Seq.last
            all_Pipelines.[last] <- TestTargetGroup targets
            for (a,b) in targetNames |> Seq.pairwise do
                if soft.Add(a,b) || degradeAlgo then
                    a ?=> b |> ignore
                if hard.Add(a,last) || degradeAlgo then
                    a ==> last |> ignore

        let CreatePipeline name (targets: TestTarget list) : unit =    
            let p = TestTarget.create name ignore
            SetPipelineRelations [ yield (TestTargetGroup targets); yield p ]

        let t_targets =
            TestTargetGroup [
                for i in 0 .. counter ->
                    TestTarget.create (sprintf "Target_%i" i) ignore
            ]
            
        [ t_targets ] |> CreatePipeline "Run.1"
        Target.runOrDefaultWithArguments "Run.1"

which runs in 2 seconds. Unfortunately, looking at the code:

    let internal checkIfDependencyCanBeAddedCore fGetDependencies targetName dependentTargetName =
        let target = get targetName
        let dependentTarget = get dependentTargetName

        let rec checkDependencies dependentTarget =
              fGetDependencies dependentTarget
              |> List.iter (fun dep ->
                   if String.Equals(dep, targetName, StringComparison.OrdinalIgnoreCase) then
                      failwithf "Cyclic dependency between %s and %s" targetName dependentTarget.Name
                   checkDependencies (get dep))

        checkDependencies dependentTarget
        target,dependentTarget

I still think this is not really optimal because it might go through the same targets multiple times, but since I cannot reproduce the issue I'll probably leave it unchanged rather than risk introducing bugs.
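
As an illustration of what "going through the same targets only once" could look like here, a visited set can guard the recursion. This is only a sketch, not the merged code: the function name is made up, and it assumes the module's get lookup, the Target record, and open System / open System.Collections.Generic as in the snippet above.

    let internal checkIfDependencyCanBeAddedOnceOnly fGetDependencies targetName dependentTargetName =
        let target = get targetName
        let dependentTarget = get dependentTargetName
        // targets whose dependencies were already checked, compared case-insensitively
        let visited = HashSet<string>(StringComparer.OrdinalIgnoreCase :> IEqualityComparer<string>)

        let rec checkDependencies (dependentTarget: Target) =
            fGetDependencies dependentTarget
            |> List.iter (fun dep ->
                if String.Equals(dep, targetName, StringComparison.OrdinalIgnoreCase) then
                    failwithf "Cyclic dependency between %s and %s" targetName dependentTarget.Name
                // only descend into targets we have not checked before
                if visited.Add dep then checkDependencies (get dep))

        checkDependencies dependentTarget
        target, dependentTarget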

matthid (Member) commented Oct 14, 2018

With let counter = 10000 we get a stack overflow, which might be worth fixing...

0x53A (Contributor, Author) commented Oct 14, 2018

Edit: ah, the issue is not duplicated refs, but just a large number, so that doesn't help. Sorry.


I peeked at the PR, and this still adds the duplicated references, right? It just improves the scanning algorithm.

The list could be replaced with a HashSet of string*string (or a Set to keep it immutable); that should fix the issue at the root.
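
Something along those lines, as a rough sketch (the comparer and the addDependencyOnce helper are illustrative, not FAKE's actual internals):

open System
open System.Collections.Generic

// case-insensitive comparer for (targetName, dependencyName) pairs
let edgeComparer =
    { new IEqualityComparer<string * string> with
        member __.Equals((a1, b1), (a2, b2)) =
            String.Equals(a1, a2, StringComparison.OrdinalIgnoreCase)
            && String.Equals(b1, b2, StringComparison.OrdinalIgnoreCase)
        member __.GetHashCode((a, b)) =
            StringComparer.OrdinalIgnoreCase.GetHashCode(a) ^^^ StringComparer.OrdinalIgnoreCase.GetHashCode(b) }

// every known dependency edge, stored once no matter how often it is declared
let knownEdges = HashSet<string * string>(edgeComparer)

// hypothetical helper: only register the edge if it was not seen before
let addDependencyOnce registerDependency targetName dependentTargetName =
    if knownEdges.Add(targetName, dependentTargetName) then
        registerDependency targetName dependentTargetName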

matthid (Member) commented Oct 14, 2018

@0x53A Thanks for the tip; yes, there are a lot of ways to improve this - everything is implemented in a very, very naive way. I'll try to figure out what you mean once I've managed to make the code tail recursive (so it no longer stack-overflows).

Any idea why

        let rec visitDependenciesAux fGetDependencies previousDependencies = function
            | (visited, level, targetName) :: workLeft ->
                let target = get targetName
                let isVisited =
                    visited
                    |> Seq.exists (fun t -> String.Equals(t, targetName, StringComparison.OrdinalIgnoreCase))
                if isVisited then
                    visitDependenciesAux fGetDependencies previousDependencies workLeft
                else
                    let deps =
                        fGetDependencies target
                        |> Seq.map (fun (t) -> (String.toLower targetName::visited), level + 1, t)
                        |> Seq.toList
                    let newVisitedDeps = target :: previousDependencies
                    visitDependenciesAux fGetDependencies newVisitedDeps (deps @ workLeft)
            | _ -> previousDependencies |> List.rev           

is not considered tail recursive by the compiler? (I hate that part of F#.)

matthid (Member) commented Oct 14, 2018

It works when I remove the fGetDependencies parameter and write the code directly into the body. I have no idea why, but hey it works

0x53A (Contributor, Author) commented Oct 14, 2018

No clue. F# really needs a tail-rec keyword so the compiler errors out hard when it can't compile a function as tail recursive.

You could disassemble both variants and compare them. Also, afaik the JIT sometimes ignores the .tail prefix, so maybe it wasn't even F#'s fault ...


Async has trampolining, right?

So another quick-fix with a slight performance penalty could have been to just wrap it in async { } and do a let! binding at the recursion point.

Then execute it with RunSynchronously.

I'm not 100% sure, but I think that should also take care of stack overflows.
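
A minimal sketch of that suggestion with a toy countdown function (illustrative only, not FAKE code):

let rec countDown (n: int) : Async<int> =
    async {
        if n = 0 then
            return 0
        else
            // the let! at the recursion point goes through async's bind,
            // so the recursion depth is handled by async's trampoline
            // instead of the call stack
            let! rest = countDown (n - 1)
            return rest
    }

// should run to completion instead of throwing a StackOverflowException
let result = countDown 1000000 |> Async.RunSynchronously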

matthid (Member) commented Oct 14, 2018

@0x53A Just to confirm: is that what you suggest (i.e. check the data before adding it)?

    /// Adds the dependency to the front of the list of dependencies.
    /// [omit]
    let internal dependencyAtFront targetName dependentTargetName =
        let target,_ = checkIfDependencyCanBeAdded targetName dependentTargetName

        let hasDependency =
           target.Dependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        if not hasDependency then
            getTargetDict().[targetName] <- 
                { target with 
                    Dependencies = dependentTargetName :: target.Dependencies
                    SoftDependencies =
                        target.SoftDependencies
                        |> List.filter (fun d -> not (String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase)))
                }

    /// Adds the dependency to the front of the list of soft dependencies.
    /// [omit]
    let internal softDependencyAtFront targetName dependentTargetName =
        let target,_ = checkIfDependencyCanBeAdded targetName dependentTargetName

        let hasDependency =
           target.Dependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        let hasSoftDependency =
           target.SoftDependencies
           |> Seq.exists (fun d -> String.Equals(d, dependentTargetName, StringComparison.OrdinalIgnoreCase))
        match hasDependency, hasSoftDependency with
        | true, _ -> ()
        | false, true -> ()
        | false, false ->
            getTargetDict().[targetName] <- { target with SoftDependencies = dependentTargetName :: target.SoftDependencies }

matthid (Member) commented Oct 14, 2018

At least there no longer seems to be a significant difference between let degradeAlgo = false and true, but it's still quite slow (for 10000 it takes roughly a minute). I think that is enough for a first improvement; it is also a bit unusual in regular use cases to have such a dense graph.

0x53A (Contributor, Author) commented Oct 14, 2018

Yes that is what I meant.


This looks like a great improvement: in my original repro just 10 killed it, and now 10000 is (barely) usable.

Thank you!

matthid added a commit that referenced this issue Oct 14, 2018
* Fixes various Stackoverflows by making stuff tail recursive
* fix traverse algorithms by using visited hashsets
* Make sure to skip duplicated dependencies
* Other improvements seen in the profiler (ie String.Equals instead of toLower)

Fixes #2036
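
The last bullet (String.Equals instead of toLower) refers to comparing target names case-insensitively without allocating lowered copies; roughly the difference between the following two helpers (the names are just for illustration):

// allocates two new strings on every comparison
let slowEquals (a: string) (b: string) =
    a.ToLower() = b.ToLower()

// compares in place, no allocation
let fastEquals (a: string) (b: string) =
    System.String.Equals(a, b, System.StringComparison.OrdinalIgnoreCase)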