diff --git a/CHANGELOG.md b/CHANGELOG.md
index 16b04bb..cb1ab6e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,10 @@
 # Change Log
 All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](http://keepachangelog.com/).
+## [0.1.0-alpha.24] - 2021-10-23
+- improve linter
+- update docs
+
 ## [0.1.0-alpha.23] - 2021-10-22
 - enable `cat`
 - fix cljs
diff --git a/README.md b/README.md
index 30eba5c..d604e56 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Clojure's [threading macros](https://clojure.org/guides/threading_macros) (the `
 Place the following in the `:deps` map of your `deps.edn` file:
 ```clojure
 ...
- net.clojars.john/injest {:mvn/version "0.1.0-alpha.23"}
+ net.clojars.john/injest {:mvn/version "0.1.0-alpha.24"}
 ...
 ```
 ### clj-kondo
@@ -30,7 +30,7 @@ To try it in a repl right now with `criterium` and `net.cgrand.xforms`, drop thi
 ```clojure
 clj -Sdeps \
 '{:deps
- {net.clojars.john/injest {:mvn/version "0.1.0-alpha.23"}
+ {net.clojars.john/injest {:mvn/version "0.1.0-alpha.24"}
 criterium/criterium {:mvn/version "0.4.6"}
 net.cgrand/xforms {:mvn/version "0.19.2"}}}'
 ```
@@ -273,45 +273,8 @@ Just over 6 seconds. Much better. Now let's try the parallel `=>>` version:
 Just over 3 seconds. Much, much better! Again, in local dev your times may look a bit different. On my Macbook Pro here, those times are `11812.604504`, `5096.267348` and `933.940569` msecs. So, in other words, `=>>` can sometimes be 5 times faster than `x>>` and 10 times faster than `->>`, depending on the shape of your workloads and the number of cores you have available.
-### `|>>` Parallel Pipeline
-`injest` also provides `|>>` - a parallel, transducing thread macro based on Clojure's [`pipeline`](https://clojuredocs.org/clojure.core.async/pipeline). In general, `=>>` should be preferred for most workloads, but `|>>` is available for edge cases where it is more efficient.
-Instead of dividing work into execution groups, a parallelization value of 2 plus the number of available cores are passed to `pipeline` and `core.async` manages everything else under the hood. The thread-overhead costs for `|>>` are different than `=>>` though, so only use it on sequences with heavy workloads.
-```clojure
-(|>> (range 100)
-     (repeat 10)
-     (map x>>work)
-     (map x>>work)
-     (map x>>work)
-     (map x>>work)
-     (map x>>work)
-     (map x>>work)
-     last
-     count
-     time)
-; "Elapsed time: 3057.507032 msecs"
-;|> 234
-```
-`|>>` actually beat out `=>>` here, but `=>>` usually wins - your milage may vary. Whatever you do, don't use `|>>` on massive sequences with very small workloads on each item. This causes a traffic jam:
-```clojure
-;; don't run this
-(|>> (range 10000000)
-     (map inc)
-     (filter odd?)
-     (mapcat #(do [% (dec %)]))
-     (partition-by #(= 0 (mod % 5)))
-     (map (partial apply +))
-     (map (partial + 10))
-     (map #(do {:temp-value %}))
-     (map :temp-value)
-     (filter even?)
-     (apply +)
-     time)
-;; takes 3 minutes :/
-```
-Whereas `=>>` will complete in about 10 seconds. Worse than `x>>` for the same sequence and workload, but at least it's within the ballpark of usability. And `=>>` just has better execution semantics when used in chains with other transducers. So use `|>>` with caution and lots of repl'ing.
-
-For a more in depth comparative analysis of `|>>` and `=>>` check out the [shootout](https://github.com/johnmn3/injest/blob/main/docs/shootout.md) docs.
+> There is also a parallel thread macro (`|>>`) that uses `core.async/pipeline` for parallelization. It's still available for folks interested in improving it, but is not recomended for general use. `=>>` performs better in most cases. A soon-to-be-updated analysis ([shootout.md](https://github.com/johnmn3/injest/blob/main/docs/shootout.md)) compares the differences between `|>>` and `=>>`.
 ## Clojurescript
 In Clojurescript we don't yet have parallel thread macro implementations but for `x>>` the performance gains are even more pronounced than in Clojure. On my macbook pro, with an initial value of `(range 1000000)` in the above thread from our first example, the default threading macro `->>` produces:
 ```clojure
@@ -379,9 +342,9 @@ Even better!
 ```
 In Clojurescript, you can add another Clojure (`*.clj`) namespace to your project and register there with the `regxf!` function and explicitly namespaced symbols.
 ```clojure
-(i.s/regxf! 'net.cgrand.xforms/reduce)
+(i.s/regxf! 'my.cljs.xforms.library/sliding-window)
 ```
-Or, if a transducer library like `net.cgrand.xforms` exports the same namespaces and names for both Clojure and Clojurescript, you can just `(i.s/reg-xf! x/reduce)` in a Clojure namespace in your project and then it will be available to the `x>>` threads in both your Clojure and Clojurescript namespaces.
+Or, if a transducer library like `net.cgrand.xforms` exports the same namespaces and names for both Clojure and Clojurescript, you can just `(i.s/reg-xf! x/reduce)` in a Clojure namespace in your project and then it will be available to the `x>>`/`=>>` threads in both your Clojure and Clojurescript namespaces.
 # Caveats
 It should be noted as well:
diff --git a/build.clj b/build.clj
index 2e3f75f..a1d0872 100644
--- a/build.clj
+++ b/build.clj
@@ -3,7 +3,7 @@
   (:require [org.corfield.build :as bb]))
 (def lib 'net.clojars.john/injest)
-(def version "0.1.0-alpha.23")
+(def version "0.1.0-alpha.24")
 ;; clojure -T:build ci
 ;; clojure -T:build deploy
diff --git a/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/config.edn b/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/config.edn
index 2463799..56c5f43 100644
--- a/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/config.edn
+++ b/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/config.edn
@@ -1,8 +1,28 @@
-{:lint-as {injest.path/+> clojure.core/->
+{:lint-as {injest.core/x> clojure.core/->
+           injest.core/x>> clojure.core/->>
+           injest.core/=> clojure.core/->
+           injest.core/=>> clojure.core/->>
+           injest.core/|> clojure.core/->
+           injest.core/|>> clojure.core/->>
+
+           injest.path/+> clojure.core/->
            injest.path/+>> clojure.core/->>
-           injest.path/x> clojure.core/->
+           injest.path/x> clojure.core/->
            injest.path/x>> clojure.core/->>
-           injest.path/=> clojure.core/->
+           injest.path/=> clojure.core/->
            injest.path/=>> clojure.core/->>
-           injest.path/|> clojure.core/->
-           injest.path/|>> clojure.core/->>}}
+           injest.path/|> clojure.core/->
+           injest.path/|>> clojure.core/->>}
+
+ :hooks {:macroexpand {injest.path/+> injest.path/+>
+                       injest.path/+>> injest.path/+>>
+                       injest.path/x> injest.path/+>
+                       injest.path/x>> injest.path/+>>
+                       injest.path/=> injest.path/+>
+                       injest.path/=>> injest.path/+>>}}
+
+ :linters {:injest.path/+> {:level :error}
+           :injest.path/+>> {:level :error}
+           :unused-binding {:level :off}}
+
+ }
diff --git a/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/injest/path.clj b/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/injest/path.clj
new file mode 100644
index 0000000..cf2af1c
--- /dev/null
+++ b/src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/injest/path.clj
@@ -0,0 +1,42 @@
+(ns injest.path)
+
+(def protected-fns #{`fn 'fn 'fn* 'partial})
+
+(defn get-or-nth [m-or-v aval]
+  (if (associative? m-or-v)
+    (get m-or-v aval)
+    (nth m-or-v aval)))
+
+(defn path-> [form x]
+  (cond (and (seq? form) (not (protected-fns (first form))))
+        (with-meta `(~(first form) ~x ~@(next form)) (meta form))
+        (or (string? form) (nil? form) (boolean? form))
+        (list x form)
+        (int? form)
+        (list 'injest.path/get-or-nth x form)
+        :else
+        (list form x)))
+
+(defn path->> [form x]
+  (cond (and (seq? form) (not (protected-fns (first form))))
+        (with-meta `(~(first form) ~@(next form) ~x) (meta form))
+        (or (string? form) (nil? form) (boolean? form))
+        (list x form)
+        (int? form)
+        (list 'injest.path/get-or-nth x form)
+        :else
+        (list form x)))
+
+(defn +>
+  [x & forms]
+  (loop [x x, forms forms]
+    (if forms
+      (recur (path-> (first forms) x) (next forms))
+      x)))
+
+(defn +>>
+  [x & forms]
+  (loop [x x, forms forms]
+    (if forms
+      (recur (path->> (first forms) x) (next forms))
+      x)))
\ No newline at end of file