Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf enhancement in function bodies #415

Closed
1 task
borkdude opened this issue Sep 19, 2020 · 12 comments
Closed
1 task

Perf enhancement in function bodies #415

borkdude opened this issue Sep 19, 2020 · 12 comments

Comments

@borkdude
Copy link
Collaborator

borkdude commented Sep 19, 2020

See joinr@a542ab2.

Master

JVM

$ clj -J-Dclojure.compiler.direct-linking=true
Clojure 1.10.1
user=> (require '[sci.core :as sci] :reload-all)
nil
user=> (time (sci/eval-string "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"))
"Elapsed time: 2274.280637 msecs"
1000000
user=> (time (sci/eval-string "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"))
"Elapsed time: 2221.901377 msecs"
1000000

GraalVM native-image:

===> multitime results
1: tmp/sci-master "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
            Mean        Std.Dev.    Min         Median      Max
real        1.719       0.016       1.690       1.720       1.742
user        1.639       0.016       1.613       1.639       1.659
sys         0.072       0.003       0.067       0.072       0.076

Patch

TODO:

  • tests are not passing

JVM (Faster than master)

sci (joinr-a542ab2fb30e54f117607273339794d5add42f49) $ clj -J-Dclojure.compiler.direct-linking=true
Clojure 1.10.1
user=> (require '[sci.core :as sci] :reload-all)
nil
user=> (time (sci/eval-string "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"))
"Elapsed time: 2072.601562 msecs"
1000000
user=> (time (sci/eval-string "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"))
"Elapsed time: 2001.853154 msecs"
1000000

GraalVM native-image (not significantly faster than master)

===> multitime results
1: tmp/sci-joinr-a542ab2fb30e54f117607273339794d5add42f49 "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
            Mean        Std.Dev.    Min         Median      Max
real        1.696       0.029       1.645       1.695       1.739
user        1.615       0.027       1.571       1.614       1.658
sys         0.074       0.004       0.066       0.074       0.081

cc @joinr

a542ab2fb30e54f117607273339794d5add42f49.patch.txt

borkdude added a commit that referenced this issue Sep 19, 2020
borkdude added a commit that referenced this issue Sep 19, 2020
@borkdude
Copy link
Collaborator Author

@joinr Note that identical? doesn't always yield true for "same" symbols:

user=> (identical? (symbol "&") '&)
false

That change broke some tests.

@joinr
Copy link

joinr commented Sep 19, 2020

Ah, good catch. It's safer on keywords. Also would fail for strings. I don't think that's a big loss though, although analyzing the params to drop the need for that check would be nice.

@borkdude
Copy link
Collaborator Author

borkdude commented Sep 19, 2020

I tried a vector (indexed) based approach in the loop that binds param names to runtime args. I didn't really help.

(defn parse-fn-args+body
  [^clojure.lang.Associative ctx interpret eval-do*
   {:sci.impl/keys [fixed-arity var-arg-name ^clojure.lang.Indexed params body] :as _m}
   fn-name macro? with-meta?]
  (let [min-var-args-arity (when var-arg-name fixed-arity)
        body-count (count body)
        param-count (count params)
        return (if (= 1 body-count)
                 (let [fst (first body)]
                   #(interpret % fst))
                 #(eval-do* % body))
        f (fn run-fn [& args]
            (let [;; tried making bindings a transient, but saw no perf improvement (see #246)
                  bindings (.get ^java.util.Map ctx :bindings)
                  arg-count (count args)
                  ;;max-idx (dec arg-count)
                  args ^clojure.lang.Indexed (vec args)
                  bindings
                  (loop [idx 0
                         ret bindings]
                    (if (< idx param-count)
                      (let [fp (try (.nth params idx)
                                    (catch Exception _
                                      (throw-arity fn-name macro? args)))]
                        (if (= '& fp)
                          (assoc ret (try (.nth params (inc idx))
                                          (catch Exception _e
                                            (throw-arity fn-name macro? args)))
                                 (subvec args idx))
                          (recur (inc idx)
                                 (assoc ret fp (try (.nth args idx)
                                                    (catch Exception _e
                                                      (throw-arity fn-name macro? args)))))))
                      (do
                        (when (> arg-count idx)
                          (throw-arity fn-name macro? args))
                        ret)))
                  ctx (#?(:clj .assoc
                          :cljs -assoc) ctx :bindings bindings)
                  ret (return ctx)
                  ;; m (meta ret)
                  recur? (instance? Recur ret)]
              (if recur?
                (let [recur-val (t/getVal ret)]
                  (if min-var-args-arity
                    (let [[fixed-args [rest-args]]
                          [(subvec recur-val 0 min-var-args-arity)
                           (subvec recur-val min-var-args-arity)]]
                      (recur (into fixed-args rest-args)))
                    (recur recur-val)))
                ret)))]
    (if with-meta?
      (with-meta
        f
        (if min-var-args-arity
          {:sci.impl/min-var-args-arity min-var-args-arity}
          {:sci.impl/fixed-arity fixed-arity}))
      f)))

@borkdude
Copy link
Collaborator Author

borkdude commented Sep 19, 2020

@joinr I think I applied most of your patches to the fns.cljc namespace on master. I didn't really see a speed-up with the fast-second , probably because that path is only relevant to var-args fns, so I have to write a test case for that.

@borkdude
Copy link
Collaborator Author

borkdude commented Sep 19, 2020

Some thoughts:

  1. It might already be safe to emit functions from analyzer (instead of eval-ing them in interpreter.cljc) if the function isn't a closure (the body only depends on args or vars, but not bindings outside of its body)
  2. Making bindings mutable (a ThreadLocal stack with mutable binding maps?) might make all functions safe to emit from the analyzer since they can simply be closures over the context and when run, push bindings onto the mutable bindings thing, and, at the end of their bodies, pop those. (see Consider mutable bindings map for perf #416)

@joinr
Copy link

joinr commented Sep 20, 2020

regard 1., I think that's good. 2 is interesting. I was wondering if we actually needed a persistent map for bindings, or if that was there for convenience. If not, then it's plausible that you follow more standard stack frame stuff like you mentioned. That would likely be faster. There's probably quit a bit of improvement in just pushing args into the right place efficiently.

@borkdude
Copy link
Collaborator Author

Yes, the bindings map is just convenience, no other reason.

@joinr
Copy link

joinr commented Sep 20, 2020

Note: some of these ideas may incur obvious portability concerns. I have a myopic bias toward jvm/native-image in this case; I'm not really concerned with cljs. That could be a broader consideration as well (might have missing analogues in cljs).

@borkdude
Copy link
Collaborator Author

I think using a mutable map / stack in JS is of no concern since it's single threaded.

@borkdude
Copy link
Collaborator Author

borkdude commented Dec 19, 2020

@joinr I made changes to fns.cljc. The above example is now almost twice as fast in GraalVM on current master:

$ time ./sci "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000
./sci    0.94s  user 0.09s system 99% cpu 1.035 total

This is not only due to changes in fns.cljc, but there were additional performance enhancements made.

What I mainly did in fns.cljc:

Pre-create a per-arity function (with known params because the arity is known) that uses nth on the params. This gave significant speed up compared to using first and rest (about 10% for the loop example).

@joinr
Copy link

joinr commented Dec 20, 2020

Very nice. Might be able to check to see if direct method dispatch to a hinted .nth gets you even more.

@borkdude
Copy link
Collaborator Author

@joinr Indeed, I'm using:

(defmacro nth-2
  [c i]
  (?
   :clj `(.nth ~(with-meta c {:tag 'clojure.lang.Indexed}) ~i)
   :cljs `(~'-nth ~c ~i))) 

to inline these things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants