Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Chain #2371

Merged
merged 19 commits into from
Aug 15, 2018
Merged

Add Chain #2371

merged 19 commits into from
Aug 15, 2018

Conversation

LukaJCB
Copy link
Member

@LukaJCB LukaJCB commented Aug 8, 2018

Adds Catenable from fs2 and fixes #2358

Still needs:

  • tests
  • law tests
  • license info
  • benchmarks
  • Eq, PartialOrder, Order, Coflatmap instances? (could add those in a later PR)
  • Docs (couls also be added in a later PR)

}

/** Applies the supplied function to each element, left to right. */
private final def foreach(f: A => Unit): Unit = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this method private, though I think we could potentially just rename it to unsafeForeach instead? WDYT?

@martin-g
Copy link

martin-g commented Aug 8, 2018

A checkbox for Documentation is missing in the description!

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 8, 2018

I'm not super clear how the whole lincensing business works, but as far as I can tell, both fs2 and cats are under MIT, so the code shouldn't have to be relicensed, correct?
Do I need to add Pavel, Fabio and Michael as copyright holders somewhere? Or maybe you three can give consent to adding the code here directly?
Not sure what the best way forward is.

cc/ @mpilquist @SystemFw @pchlupacek

@codecov-io
Copy link

codecov-io commented Aug 8, 2018

Codecov Report

Merging #2371 into master will decrease coverage by 0.16%.
The diff coverage is 90.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2371      +/-   ##
==========================================
- Coverage   95.01%   94.84%   -0.17%     
==========================================
  Files         349      350       +1     
  Lines        6000     6249     +249     
  Branches      227      281      +54     
==========================================
+ Hits         5701     5927     +226     
- Misses        299      322      +23
Impacted Files Coverage Δ
...rc/main/scala/cats/laws/discipline/Arbitrary.scala 99.04% <100%> (+0.07%) ⬆️
core/src/main/scala/cats/data/Chain.scala 90.45% <90.45%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edb49b5...bc66361. Read the comment docs.

@pchlupacek
Copy link

I think Catenable is work primarily of @mpilquist and @pchuisano, so them is likely who has to say ok/nay. Didn't follow discussion on inclusion in cats.effects of this, however is Catenable really related to effects somehow? I see it more like data structure...

@pchlupacek
Copy link

ah sorry. this is no effects. My bad :-)

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 8, 2018

@pchlupacek I just mentioned you, because you three showed up in the git history of Catenable 😄

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 8, 2018

On another note, I added some of the benchmarks that used to be on fs2 as well as adding some other new ones, and I gotta say again, using Catenable as a Monoid to accumulate things is just sooooo much faster, it's over 6x as fast as List and close to 8x as fast as Vector.

@pchlupacek
Copy link

ah thanks, so happy to contribute then :-)

@non
Copy link
Contributor

non commented Aug 8, 2018

@LukaJCB I want to benchmark this against chain (https://github.com/non/chain), but I'm +1 to having something like this available. It's a great optimization.

@johnynek
Copy link
Contributor

johnynek commented Aug 8, 2018

also note @non's chain has .iterator:

https://github.com/non/chain/blob/master/src/main/scala/chain/Chain.scala#L42

Also, there are some optimizations around accepting batches (which may or may not be a win, but I think the benchmarks were pretty good).

@mpilquist
Copy link
Member

@non Chain and Catenable are remarkably similar. I wonder if Catenable would benefit from a constructor that can store a collection (it currently only has Singleton/Empty/Append).

@mpilquist
Copy link
Member

My 2 cents on licensing: change COPYING.md from copyright holder of Erik to "Cats Contributors" and ensure Pavel and Paul are added to AUTHORS.md. I think that's sufficient (IANAL).

@SystemFw
Copy link
Contributor

SystemFw commented Aug 8, 2018

As usual I have no clue wrt licensing. I'm ok with Michael's suggestion.

@non
Copy link
Contributor

non commented Aug 8, 2018

I support @mpilquist's suggestion of changing the copyright to Cats Contributors.

I'd just check first with Pavel and Paul to be sure they are OK being added to AUTHORS.md (in the past I think there have been folks that did not want to be listed there).

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 8, 2018

I'd be happy to benchmark against chain and implement iterator :)

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 8, 2018

Initial benchmarks for accumulating using Monoid:

[info] Benchmark                                   Mode  Cnt   Score   Error  Units
[info] CollectionMonoidBench.accumulateCatenable  thrpt   30  62.235 ± 6.139  ops/s
[info] CollectionMonoidBench.accumulateChain      thrpt   30  33.871 ± 4.103  ops/s
[info] CollectionMonoidBench.accumulateList       thrpt   30  12.936 ± 0.941  ops/s
[info] CollectionMonoidBench.accumulateVector     thrpt   30   9.771 ± 0.418  ops/s

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 9, 2018

Full benchmarks including Chain:

[info] Benchmark                               Mode  Cnt          Score         Error  Units
[info] CatenableBench.consLargeCatenable      thrpt   10  113150019.152 ± 3342982.445  ops/s
[info] CatenableBench.consLargeChain          thrpt   10   75615549.374 ± 1542368.424  ops/s
[info] CatenableBench.consLargeList           thrpt   10  162166698.183 ± 6688862.050  ops/s
[info] CatenableBench.consLargeVector         thrpt   10   10788098.777 ±  268263.627  ops/s
[info] CatenableBench.consSmallCatenable      thrpt   10  116035335.260 ± 5234126.018  ops/s
[info] CatenableBench.consSmallChain          thrpt   10   73844803.237 ± 7120152.778  ops/s
[info] CatenableBench.consSmallList           thrpt   10  166106231.811 ± 6037123.988  ops/s
[info] CatenableBench.consSmallVector         thrpt   10   18912142.135 ±  775105.464  ops/s
[info] CatenableBench.createSmallCatenable    thrpt   10   22430120.585 ±  449322.743  ops/s
[info] CatenableBench.createSmallChain        thrpt   10    6538088.330 ±  330873.584  ops/s
[info] CatenableBench.createSmallList         thrpt   10    7441943.968 ±  225077.133  ops/s
[info] CatenableBench.createSmallVector       thrpt   10    9356653.768 ±  281691.453  ops/s
[info] CatenableBench.createTinyCatenable     thrpt   10  127095122.315 ± 3397762.281  ops/s
[info] CatenableBench.createTinyChain         thrpt   10  108455585.645 ± 3665828.817  ops/s
[info] CatenableBench.createTinyList          thrpt   10   12708962.873 ±  336107.111  ops/s
[info] CatenableBench.createTinyVector        thrpt   10   12666436.324 ±  340151.822  ops/s
[info] CatenableBench.foldLeftLargeCatenable  thrpt   10         54.336 ±       5.370  ops/s
[info] CatenableBench.foldLeftLargeChain      thrpt   10        124.684 ±       3.216  ops/s
[info] CatenableBench.foldLeftLargeList       thrpt   10        178.047 ±       5.444  ops/s
[info] CatenableBench.foldLeftLargeVector     thrpt   10         76.381 ±       5.950  ops/s
[info] CatenableBench.foldLeftSmallCatenable  thrpt   10   10277331.023 ±  444900.838  ops/s
[info] CatenableBench.foldLeftSmallChain      thrpt   10   44045260.071 ± 2125730.481  ops/s
[info] CatenableBench.foldLeftSmallList       thrpt   10   47137875.177 ± 1280739.046  ops/s
[info] CatenableBench.foldLeftSmallVector     thrpt   10   13642885.095 ±  582515.502  ops/s
[info] CatenableBench.mapLargeCatenable       thrpt   10         25.476 ±       6.104  ops/s
[info] CatenableBench.mapLargeChain           thrpt   10         51.077 ±       2.371  ops/s
[info] CatenableBench.mapLargeList            thrpt   10         60.507 ±      25.288  ops/s
[info] CatenableBench.mapLargeVector          thrpt   10         63.319 ±       4.614  ops/s
[info] CatenableBench.mapSmallCatenable       thrpt   10    5777927.632 ±  411001.990  ops/s
[info] CatenableBench.mapSmallChain           thrpt   10    9679226.316 ±  561463.001  ops/s
[info] CatenableBench.mapSmallList            thrpt   10   22511388.394 ±  186068.790  ops/s
[info] CatenableBench.mapSmallVector          thrpt   10   10811821.822 ±  499033.452  ops/s

Also implemented an iterator :)

@johnynek
Copy link
Contributor

Whatever is faster.

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 13, 2018

Shall we merge this one, then? :)

foldLeft(nil: Chain[B])((acc, a) => acc ++ f(a))

/** Applies the supplied function to each element and returns a new Chain from the concatenated results */
final def flatMapIterator[B](f: A => Chain[B]): Chain[B] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these iterator variants were just temporary for benchmarking?

Thanks for doing these. Interesting to see which ones were faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want two flatMaps? Can we just keep the faster one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, woops, that's on me, I did just want to keep the faster ones 😄

result
}

final def findIterator(f: A => Boolean): Option[A] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just keep the faster find?

m
}

final def groupByIterator[B](f: A => B)(implicit B: Order[B]): SortedMap[B, Chain[A]] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just keep the faster one.

johnynek
johnynek previously approved these changes Aug 13, 2018
@LukaJCB LukaJCB changed the title Add Catenable Add Chain Aug 13, 2018
@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 14, 2018

@kailuowang @mpilquist @non Any additional comments or are we ready to merge? :)


implicit def catsDataEqForChain[A](implicit A: Eq[A]): Eq[Chain[A]] = new Eq[Chain[A]] {
def eqv(x: Chain[A], y: Chain[A]): Boolean =
(x eq y) || x.toList === y.toList
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worthwhile to optimize the second part as x.iterator.sameElements(y.iterator)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would ignore Eq[A] no?

I would just copy and paste the four here:
https://github.com/typelevel/cats/blob/master/kernel/src/main/scala/cats/kernel/instances/list.scala#L28

(Eq, Order, Hash, PartialOrder) and use similar matching. We can follow up, especially if we write the implicits not that just convert to list, then make a PR to fix them to be fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got a fairly fast implementation now using iterator :)

Copy link
Contributor

@kailuowang kailuowang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Thanks so much!

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 15, 2018

Going to merge now :)

@LukaJCB LukaJCB merged commit 21aa369 into typelevel:master Aug 15, 2018
@mpilquist
Copy link
Member

🎉 Thanks for doing this! So.... when can we get this in a released version? I'd really like to have it for FS2 1.0, which is scheduled for early September.

@SystemFw
Copy link
Contributor

Thanks for all the work Luka!

Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor comments

if (rights.isEmpty) {
c = null
} else {
c = rights.last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is not being tested which worries me. Can we think of a way to exercise this part?


/** Creates a Chain from the specified elements. */
def apply[A](as: A*): Chain[A] =
as match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn’t it be better to just do fromSeq(as) here?

I notice currently the second branch of the case match is untested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chain(1, 2, 3, 4) will exercise the WrappedArray case. Now that we have a Wrap constructor, it's probably better to replace this specialization with fromSeq.

case 1 => A.arbitrary.map(Chain.one)
case 2 => A.arbitrary.flatMap(a1 => A.arbitrary.flatMap(a2 =>
Chain.concat(Chain.one(a1), Chain.one(a2))))
case n => Chain.fromSeq(Range.apply(0, n)).foldLeft(Gen.const(Chain.empty[A])) { (gen, _) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know what the fromSeq is doing here because we are just folding it. I don’t love this generator because it does not generate any internal Wrap nodes and is always left associated.

Could we do something like: for size n we either wrap a seq or create a two random chains of size a and b such that a + b == n and we combine those.

If we do this, I think we can get the coverage up nice and high.

@kailuowang
Copy link
Contributor

@mpilquist okay, let's start preparing for a 1.3 release.

@johnynek
Copy link
Contributor

Can we address my comments? This PR did decrease our overall coverage and has a few untested branches.

@johnynek
Copy link
Contributor

I can make a PR to address these. @LukaJCB has done plenty already.

@LukaJCB
Copy link
Member Author

LukaJCB commented Aug 15, 2018

Sorry, I guess I merged too soon, I can also address these 👍

@LukaJCB LukaJCB mentioned this pull request Aug 15, 2018
@LukaJCB LukaJCB deleted the catenable branch August 15, 2018 17:52
@kailuowang kailuowang added this to the 1.3 milestone Aug 16, 2018
@LukaJCB LukaJCB mentioned this pull request Aug 17, 2018
catostrophe pushed a commit to catostrophe/cats that referenced this pull request Sep 15, 2018
* Add Catenable

* Add law tests and simple arbitrary and eq instances

* More tests

* Add benchmarks

* Add chain benchmarks

* Add Iterator

* More chain benchmarks

* Add Paul and Pavel to Authors and change COPYING to Cats Contributors

* More Tests

* Add reverse, groupBy and zipWith

* Add Collection Wrapping optimization

* Add traverse and foldRight implementations that don't convert to List

* Rename to Chain; add reverseIterator

* More efficient implementations

* Use Vector for reversing

* Remove redundant benchmarking methods

* Format scaladoc consistently

* Rename snoc and cons to append and prepend for consistency

* Add proper Eq, PartialOrder, Order and Coflatmap instances
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move fs2 Catenable into cats.data
9 participants