Improve linter and update docs
johnmn3 committed Oct 23, 2021
1 parent 8e2a367 commit 9fa4ee5
Showing 5 changed files with 77 additions and 48 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,10 @@
# Change Log
All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](http://keepachangelog.com/).

## [0.1.0-alpha.24] - 2021-10-23
- improve linter
- update docs

## [0.1.0-alpha.23] - 2021-10-22
- enable `cat`
- fix cljs
47 changes: 5 additions & 42 deletions README.md
@@ -15,7 +15,7 @@ Clojure's [threading macros](https://clojure.org/guides/threading_macros) (the `
Place the following in the `:deps` map of your `deps.edn` file:
```clojure
...
net.clojars.john/injest {:mvn/version "0.1.0-alpha.23"}
net.clojars.john/injest {:mvn/version "0.1.0-alpha.24"}
...
```
### clj-kondo
@@ -30,7 +30,7 @@ To try it in a repl right now with `criterium` and `net.cgrand.xforms`, drop thi
```clojure
clj -Sdeps \
'{:deps
{net.clojars.john/injest {:mvn/version "0.1.0-alpha.23"}
{net.clojars.john/injest {:mvn/version "0.1.0-alpha.24"}
criterium/criterium {:mvn/version "0.4.6"}
net.cgrand/xforms {:mvn/version "0.19.2"}}}'
```
@@ -273,45 +273,8 @@ Just over 6 seconds. Much better. Now let's try the parallel `=>>` version:
Just over 3 seconds. Much, much better!

Again, in local dev your times may look a bit different. On my MacBook Pro here, those times are `11812.604504`, `5096.267348` and `933.940569` msecs. So, in other words, `=>>` can sometimes be 5 times faster than `x>>` and 10 times faster than `->>`, depending on the shape of your workloads and the number of cores you have available.
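As a quick check on the ratios in the paragraph above, the three quoted timings can simply be divided. This is plain arithmetic on the numbers given; `timings` and `speedup` are illustrative names, not part of injest.

```clojure
;; Timings quoted above, in msecs, from a single MacBook Pro run.
(def timings {:->> 11812.604504
              :x>> 5096.267348
              :=>> 933.940569})

(defn speedup
  "How many times faster the `fast` run was than the `slow` run."
  [slow fast]
  (/ (timings slow) (timings fast)))

(speedup :->> :=>>) ; roughly 12.6
(speedup :x>> :=>>) ; roughly 5.5
(speedup :->> :x>>) ; roughly 2.3
```

On that run, `=>>` was roughly 12.6 times faster than `->>` and 5.5 times faster than `x>>`, consistent with the rounded figures above.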
### `|>>` Parallel Pipeline
`injest` also provides `|>>` - a parallel, transducing thread macro based on Clojure's [`pipeline`](https://clojuredocs.org/clojure.core.async/pipeline). In general, `=>>` should be preferred for most workloads, but `|>>` is available for edge cases where it is more efficient.

Instead of dividing work into execution groups, a parallelization value of 2 plus the number of available cores is passed to `pipeline`, and `core.async` manages everything else under the hood. The thread-overhead costs for `|>>` differ from those of `=>>`, though, so only use it on sequences with heavy workloads.
```clojure
(|>> (range 100)
     (repeat 10)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     (map x>>work)
     last
     count
     time)
; "Elapsed time: 3057.507032 msecs"
;|> 234
```
`|>>` actually beat out `=>>` here, but `=>>` usually wins; your mileage may vary. Whatever you do, don't use `|>>` on massive sequences with very small workloads on each item. This causes a traffic jam:
```clojure
;; don't run this
(|>> (range 10000000)
     (map inc)
     (filter odd?)
     (mapcat #(do [% (dec %)]))
     (partition-by #(= 0 (mod % 5)))
     (map (partial apply +))
     (map (partial + 10))
     (map #(do {:temp-value %}))
     (map :temp-value)
     (filter even?)
     (apply +)
     time)
;; takes 3 minutes :/
```
By contrast, `=>>` completes in about 10 seconds: worse than `x>>` for the same sequence and workload, but at least within the ballpark of usability. And `=>>` just has better execution semantics when used in chains with other transducers. So use `|>>` with caution and lots of repl'ing.

For a more in-depth comparative analysis of `|>>` and `=>>`, check out the [shootout](https://github.com/johnmn3/injest/blob/main/docs/shootout.md) docs.
> There is also a parallel thread macro (`|>>`) that uses `core.async/pipeline` for parallelization. It's still available for folks interested in improving it, but is not recommended for general use. `=>>` performs better in most cases. A soon-to-be-updated analysis ([shootout.md](https://github.com/johnmn3/injest/blob/main/docs/shootout.md)) compares the differences between `|>>` and `=>>`.
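The note above says `|>>` is built on `core.async/pipeline`, and the removed `|>>` section describes passing 2 plus the number of available cores as the parallelism. A minimal sketch of that idea, assuming `org.clojure/core.async` 1.2.603 or later is on the classpath (for `to-chan!`). `pipeline-xform` is a hypothetical helper written for illustration, not injest's actual `|>>` code:

```clojure
(ns pipeline-sketch
  ;; Assumes org.clojure/core.async 1.2.603+ is on the classpath (for to-chan!).
  (:require [clojure.core.async :as a]))

(defn pipeline-xform
  "Hypothetical helper, not injest's |>> implementation: runs transducer `xf`
  over `coll` with core.async/pipeline, using 2 + available cores as the
  parallelism, and returns the results in input order as a vector."
  [xf coll]
  (let [n   (+ 2 (.availableProcessors (Runtime/getRuntime)))
        in  (a/to-chan! coll) ; yields coll's items, then closes
        out (a/chan n)]
    ;; pipeline closes `out` once `in` is exhausted (close? defaults to true)
    (a/pipeline n out xf in)
    ;; collect everything from `out` into a vector, blocking until it closes
    (a/<!! (a/into [] out))))

;; Per-item work; pipeline preserves input order in its output.
(pipeline-xform (map inc) (range 5)) ;=> [1 2 3 4 5]
```

Because `pipeline` applies the transducer to each element independently rather than across elements, stateful transducers such as `partition-by` will not group items in a sketch like this, so it fits workloads where each item carries substantial work.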
## Clojurescript
In Clojurescript we don't yet have parallel thread macro implementations, but for `x>>` the performance gains are even more pronounced than in Clojure. On my MacBook Pro, with an initial value of `(range 1000000)` in the above thread from our first example, the default threading macro `->>` produces:
```clojure
@@ -379,9 +342,9 @@ Even better!
```
In Clojurescript, you can add another Clojure (`*.clj`) namespace to your project and register there with the `regxf!` function and explicitly namespaced symbols.
```clojure
(i.s/regxf! 'net.cgrand.xforms/reduce)
(i.s/regxf! 'my.cljs.xforms.library/sliding-window)
```
Or, if a transducer library like `net.cgrand.xforms` exports the same namespaces and names for both Clojure and Clojurescript, you can just `(i.s/reg-xf! x/reduce)` in a Clojure namespace in your project and then it will be available to the `x>>` threads in both your Clojure and Clojurescript namespaces.
Or, if a transducer library like `net.cgrand.xforms` exports the same namespaces and names for both Clojure and Clojurescript, you can just `(i.s/reg-xf! x/reduce)` in a Clojure namespace in your project and then it will be available to the `x>>`/`=>>` threads in both your Clojure and Clojurescript namespaces.
# Caveats
It should be noted as well:

2 changes: 1 addition & 1 deletion build.clj
@@ -3,7 +3,7 @@
(:require [org.corfield.build :as bb]))

(def lib 'net.clojars.john/injest)
(def version "0.1.0-alpha.23")
(def version "0.1.0-alpha.24")

;; clojure -T:build ci
;; clojure -T:build deploy
30 changes: 25 additions & 5 deletions src/clj-kondo/clj-kondo.exports/net.clojars.john/injest/config.edn
@@ -1,8 +1,28 @@
{:lint-as {injest.path/+> clojure.core/->
{:lint-as {injest.core/x> clojure.core/->
injest.core/x>> clojure.core/->>
injest.core/=> clojure.core/->
injest.core/=>> clojure.core/->>
injest.core/|> clojure.core/->
injest.core/|>> clojure.core/->>

injest.path/+> clojure.core/->
injest.path/+>> clojure.core/->>
injest.path/x> clojure.core/->
injest.path/x> clojure.core/->
injest.path/x>> clojure.core/->>
injest.path/=> clojure.core/->
injest.path/=> clojure.core/->
injest.path/=>> clojure.core/->>
injest.path/|> clojure.core/->
injest.path/|>> clojure.core/->>}}
injest.path/|> clojure.core/->
injest.path/|>> clojure.core/->>}

:hooks {:macroexpand {injest.path/+> injest.path/+>
injest.path/+>> injest.path/+>>
injest.path/x> injest.path/+>
injest.path/x>> injest.path/+>>
injest.path/=> injest.path/+>
injest.path/=>> injest.path/+>>}}

:linters {:injest.path/+> {:level :error}
:injest.path/+>> {:level :error}
:unused-binding {:level :off}}

}
@@ -0,0 +1,42 @@
(ns injest.path)

(def protected-fns #{`fn 'fn 'fn* 'partial})

(defn get-or-nth [m-or-v aval]
  (if (associative? m-or-v)
    (get m-or-v aval)
    (nth m-or-v aval)))

(defn path-> [form x]
  (cond (and (seq? form) (not (protected-fns (first form))))
        (with-meta `(~(first form) ~x ~@(next form)) (meta form))
        (or (string? form) (nil? form) (boolean? form))
        (list x form)
        (int? form)
        (list 'injest.path/get-or-nth x form)
        :else
        (list form x)))

(defn path->> [form x]
  (cond (and (seq? form) (not (protected-fns (first form))))
        (with-meta `(~(first form) ~@(next form) ~x) (meta form))
        (or (string? form) (nil? form) (boolean? form))
        (list x form)
        (int? form)
        (list 'injest.path/get-or-nth x form)
        :else
        (list form x)))

(defn +>
  [x & forms]
  (loop [x x, forms forms]
    (if forms
      (recur (path-> (first forms) x) (next forms))
      x)))

(defn +>>
  [x & forms]
  (loop [x x, forms forms]
    (if forms
      (recur (path->> (first forms) x) (next forms))
      x)))
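The new `injest.path` file rewrites each threading step according to what kind of form it is. To see that rewriting concretely, the sketch below copies `protected-fns`, `path->` and `+>` from the file above into a scratch namespace (the name `path-demo` is hypothetical) and calls them on quoted forms. It only inspects the generated code and never evaluates it, so `injest.path/get-or-nth` does not need to be loaded. It assumes Clojure 1.9+ for `boolean?` and `int?`.

```clojure
(ns path-demo) ; hypothetical scratch namespace

;; Copied from the injest.path file above so this runs on its own.
(def protected-fns #{`fn 'fn 'fn* 'partial})

(defn path-> [form x]
  (cond (and (seq? form) (not (protected-fns (first form))))
        (with-meta `(~(first form) ~x ~@(next form)) (meta form))
        (or (string? form) (nil? form) (boolean? form))
        (list x form)
        (int? form)
        (list 'injest.path/get-or-nth x form)
        :else
        (list form x)))

(defn +> [x & forms]
  (loop [x x, forms forms]
    (if forms
      (recur (path-> (first forms) x) (next forms))
      x)))

;; Each kind of step is rewritten differently:
(path-> :a 'm)            ;=> (:a m)          keyword: called on the value
(path-> "k" 'm)           ;=> (m "k")         string: used as a lookup key
(path-> 0 'v)             ;=> (injest.path/get-or-nth v 0)
(path-> '(assoc :b 2) 'm) ;=> (assoc m :b 2)  list: threaded in first position
(path-> '(fn [y] y) 'm)   ;=> ((fn [y] y) m)  protected fn: applied, not threaded

;; Steps compose left to right:
(+> 'm :a 0 '(inc))       ;=> (inc (injest.path/get-or-nth (:a m) 0))
```

`path->>` differs only in its first branch, where the threaded value goes last, so `(path->> '(map inc) 'xs)` yields `(map inc xs)`.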
