Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[barray] adopt the barray module #38

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

[barray] adopt the barray module #38

wants to merge 11 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 22, 2017

This is WIP, but I think we can do a quick review for it.

A few things in this change:

  1. I added the iterator module to replace the sequence module. The OCaml
    community seems have not reached an aggrement on how functional
    iterator would look like, we've got gen, seq, Core.Sequence etc. I
    picked this because it is purely functional and have a monadic
    interface, while Sequence is side-effect based. Dave suggested we use
    this as well.
  2. Adopted the barray module, added a Collection interface

Things to do:

  1. add unit tests for barray

This is WIP, but I think we can do a quick review for it.

A few things in this change:
1. I added the iterator module to replace the sequence module. The OCaml
community seems have not reached an aggrement on how functional
iterator would look like, we've got `gen`, `seq`, `Core.Sequence` etc. I
picked this because it is purely functional and have a monadic
interface, while `Sequence` is side-effect based. Dave suggested we use
this as well.
2. Adopted the barray module, added a Collection interface

Things to do:
1. add unit tests for barray
@ghost ghost requested review from bordoley and glennsl April 22, 2017 13:43
@@ -46,7 +46,7 @@ val append : 'a array -> 'a array -> 'a array
val concat : 'a array list -> 'a array
(** Same as [Array.append], but concatenates a list of arrays. *)

val sub : 'a array -> int -> int -> 'a array
val slice : 'a array -> int -> int -> 'a array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to call this method slice, I think it should honor the same semantics as JS. See the immutable.re implementaiton: https://github.com/facebookincubator/immutable-re/blob/master/src/Immutable.rei#L2098

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the array should be the last argument to enable chaining.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented slice

(** [Array.to_list a] returns the list of all the elements of [a]. *)

val of_list : 'a list -> 'a array
val fromList : 'a list -> 'a array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd recommend not including functions that can't be implemented efficiently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me do some benchmark and see how efficient they can be, I am more inclined to keep them because I don't want to force the user to write these functions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that you can't make this efficient at scale. You either have to do an O(N) scan of the list to get its count, or you have to do multiple array copies as you discover the need for a bigger array. Either option is inefficient. The only reason OCaml developers do this is that they don't have better datastructures available. Immutable.re solves this problem by providing Vector, Deque and Stack, all that have O(1) count operations.

@@ -76,24 +76,23 @@ val mapi : (int -> 'a -> 'b) -> 'a array -> 'b array
function is applied to the index of the element as first argument,
and the element itself as second argument. *)

val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b array -> 'a
val foldLeft : ('a -> 'b -> 'a) -> 'a -> 'b array -> 'a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same terminology as Immutable.re (reduce, reduceReversed). I had a conversation with @jordwalke and he made a compelling argument against using left/right.

(** [Array.fold_right f a x] computes
[f a.(0) (f a.(1) ( ... (f a.(n-1) x) ...))],
where [n] is the length of the array [a]. *)

external make_float: int -> float array = "caml_make_float_vect"
external makeFloat: int -> float array = "caml_make_float_vect"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these specialized arrays make sense in JS? Are they compiled to JS TypedArrays?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it currently does, but I seem to recall it having been mentioned as a possibility at least.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glennsl means here
We don't have conditional build (Native vs JS) yet, so let's leave it like this for now.

@@ -113,59 +112,42 @@ val fast_sort : ('a -> 'a -> int) -> 'a array -> unit

(* The following is for system use only. Do not call directly. *)

external unsafe_get : 'a array -> int -> 'a = "%array_unsafe_get"
external unsafe_set : 'a array -> int -> 'a -> unit = "%array_unsafe_set"
external unsafeGet : 'a array -> int -> 'a = "%array_unsafe_get"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these actually necessary in JS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the automatic undefined -> option conversion makes this obsolete?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are only targeting JS then Barray.get can make use of Js.undefined_to_opt, but since we will support native later and Immu.re is more native sided, I would not introduce too much BS specific stuff.

@@ -0,0 +1,16 @@

module type VectorLike = sig

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull this module (collection.ml) out of core and move it into containers. It's unlikely Imm.re will ever depend on this, it would be better to define core as the common shared subset, and containers vs. imm.re can be a user choice.

@@ -5,3 +5,9 @@ module type S = sig

val compare:t Comparator.t
end

module type C = sig

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid adding this definition. I intentionally removed it from Imm.re, because the definition needs to be considered within the scope of modular implicits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi thanks for the review, sorry I was somehow busy today and just had the chance to look at it. Could you elaborate on the design if we have modular explicit? I am just curious how it would look like and why it would affect adding this interface. Thanks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example of an implicit module signature for a parameterized type from the modular implicit paper:

implicit module Show_list { A : Show } = struct
type t = A . t list
let show l = "["^ String . concat ", " ( List . map A . show l ) ^"]"
end

The paper is here: http://www.lpw25.net/ml2014.pdf

For the most part, I think we should avoid defining any interfaces that result in unwieldy apis that will be resolved in a consistent fashion by using modular implicits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi yeah, I have read the paper, I agree this interface is a bit unusual. I took the idea from Core, but it seems Haskell does not have the notation, nor any other language I can recall. I will remove it then.

@@ -2,4 +2,9 @@
module type S = sig
type t
val equals : t Equality.t
end

module type C = sig

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here. I would not implement this until we have modular implicits.

@@ -0,0 +1,41 @@

type +'a node =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this module Sequence, include the Sequence type definition in core, but move your implementation of the combinators into containers.

Copy link
Member

@glennsl glennsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterator design seems pretty good to me. I prefer its explicitness over the simpler and structural function that raises on exception.

val get_safe : 'a t -> int -> 'a option
(** [get_safe a i] returns [Some a.(i)] if [i] is a valid index
@since 0.18 *)
val getSafe : 'a t -> int -> 'a option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOption might be a better name here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would encourage to follow the approach of always making the defaults safe, eg. get always returns an option, and then you can add getOrRaise for efficiency.

val fold : ('a -> 'b -> 'a) -> 'a -> 'b t -> 'a

val foldi : ('a -> int -> 'b -> 'a) -> 'a -> 'b t -> 'a
val foldi : f:('a -> int -> 'b -> 'a) -> acc:'a -> 'b t -> 'a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be reducei. Or reduceWithIndex to be really verbose about it?

(** Fold left on array, with index *)

val fold_while : ('a -> 'b -> 'a * [`Stop | `Continue]) -> 'a -> 'b t -> 'a
val foldWhile : ('a -> 'b -> 'a * [`Stop | `Continue]) -> 'a -> 'b t -> 'a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduceWhile

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can avoid the need for two functions, by using Imm.re's definition of reduce:

let reduce: while_::('acc => a => bool)? => ('acc => a => 'acc) => 'acc => t => 'acc;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also much prefer the while check returning a boolean

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to val reduceWhile : ('a -> 'b -> 'a * bool) -> 'a -> 'b t -> 'a where the boolean value indicate whether we should stop or not (true to stop, false to continue).

@since 0.8 *)

val iter : ('a -> unit) -> 'a t -> unit
indicated by the accumulator *)

val iteri : (int -> 'a -> unit) -> 'a t -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEachi or forEachWithIndex

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEachWithIndex

@since 0.8 *)

val iter : ('a -> unit) -> 'a t -> unit
indicated by the accumulator *)

val iteri : (int -> 'a -> unit) -> 'a t -> unit

val blit : 'a t -> int -> 'a t -> int -> int -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mutates, so we might want to remove it, or at least mark it very clearly as being mutating.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to have the minimal mutable subset of array in core (blit and pals), and then in containers you can add whatever you like for containers to containers itself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a bit weird, took me a while to understand what it does.


val pp_i: ?sep:string -> (int -> 'a printer) -> 'a t printer
(** Print an array, giving the printing function both index and item *)
val toIterator: 'a t -> 'a Iterator.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An iterator is really just a "view" of or interface to the data structure, not itself a separate data structure, so I don't think toIterator is a good name for this. getIterator or makeIterator, or perhaps just iterator would be more fitting I think.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used toSequence in immutable.re.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to toSequence

(** Map each element into another value, or discard it *)

val flat_map : ('a -> 'b t) -> 'a t -> 'b array
val flatMap : ('a -> 'b t) -> 'a t -> 'b array
(** Transform each element into an array, then flatten *)

val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be removed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the intention to remove non-alpha methods from the library to promote using the actual words (bind, pure etc.)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, random symbols are pretty hostile to newcomers, and they can easily be defined in your own projects or in an auxillary library.


val except_idx : 'a t -> int -> 'a list
val exceptIndex : 'a t -> int -> 'a list
(** Remove given index, obtaining the list of the other elements *)

val (--) : int -> int -> int t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually seems pretty damn useful, and even kind of intuitive, but an ordinary range function is probably still better.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest js slice which is not too hard to implement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range can be just an alias of --^, or the other way around.

module type VectorLike = sig
type 'a t

val mem: 'a Equality.t -> 'a t -> 'a -> bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used this recently, and struggled to find this, so I should be able to remember what this does. But even seeing the name and type signature I still could not. Lol, that's how bad this name is. contains is a nice name, though javascript has named it includes for some silly reason.

@@ -0,0 +1,16 @@

module type VectorLike = sig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of this name, or this interface. It doesn't seem very well defined. It seems kind of like an Iterable, but an Iterable would not have a property-like length function, but instead use a verb like count to imply that an action needs to be performed. It also should not have an isEmpty function, but perhaps instead an any function (which is probably a better name for exists if the predicate is optional)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this module for now.

@glennsl
Copy link
Member

glennsl commented Apr 22, 2017

I also pretty much concur with everything @bordoley said

@mransan
Copy link

mransan commented May 11, 2017

A couple of general comment (around the same theme)

  • "fromList is inefficient": this is all relative and really up to the user to decide whether it meets its performance requirement of not. The documentation should specify the complexity of the operation but I believe we should still have such function even though there exist more efficient (and more complex) solution to the problem. What if I am writing a quick and simple program where performance does not matter (unit test, little script ...)? (It's perfectly fine btw to advertise better solutions in the documentation but the decision is eventually left to the user).
  • usage of unsafeGet/Set, I very much like those options to be available. There are many use cases that are perfectly fine using those unsafe API (generated code, algorithm where the boundary are check ahead of accessing the elements, etc). [@@undefined_to_opt] is still not as efficient as an unsafe access.
  • Can we rename collection to bcollection.

@ghost
Copy link
Author

ghost commented May 12, 2017

Hey sorry for the delay, was on holiday for a while, I think this is ready for another review, I cleaned up the interface and added more unit tests. @glennsl @bordoley

val flatMap : ('a -> 'b t) -> 'a t -> 'b t
(** Transform each element into an t, then flatten *)

val (--) : int -> int -> int t
Copy link

@mransan mransan May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put operators in 'Infix' or 'Ops' sub modules. The motivation for this is that operators are only useful when the module they are defined in is opened. However opening a module is usually a bad practice so in order to limit the scope of what is open operator can be placed in a sub module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will put them in a Infix submodule.

val compare:t Comparator.t
end

val compare: t Comparator.t
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this file does not start with a "b" ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to not add b when the module name does not conflict with the standard library. Do you think it's better to have a b for all the modules?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only name directly conflicts with stdlib will be prepended.

Copy link

@mransan mransan May 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might very well conflicts with other people file names. It's much safer (and consistent) to start all files with the same prefix. If a user once to change the name in its program he can simply use module aliases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a top level namespace file called Collection, so say if the user wants to use Sequence he will probably will have to alias Collection.Sequence. But I think for consistency I will change all the modules to start with a B.

@@ -0,0 +1,41 @@

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this file as well starts with "b"


(* The following is for system use only. Do not call directly. *)

external unsafe_get : 'a array -> int -> 'a = "%array_unsafe_get"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are useful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have them, as getOrRaise

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe_get don't raise exception in the standard library, no boundary check is performed, and behavior is left undefined if out of bound.

Copy link
Author

@ghost ghost May 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val unsafeGet : int -> 'a t -> 'a

val unsafeSet : int -> 'a -> 'a t -> unit

Added them back with this signature for chained application style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this unsafeGetUnchecked, so it's a bit less vague?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would discourage exposing this in the API personally. There are are uses for unsafe access, but at that point, I'd assume any user would be advance enough to provide their own implementation around the ffi interface.

Copy link
Member

@glennsl glennsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a lot of bad formatting in the tests, and very inconsistent use of piped vs. non-piped forms

If the value of [x] is a floating-point number, then the maximum
size is only [Sys.max_array_length / 2].*)

val init : int -> (int -> 'a) -> 'a t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be named makeWithInit perhaps?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stuck with 'init' in imm.re. I don't think it matters much though either way in this case.

val concat : 'a t list -> 'a t
(** Same as [Barray.append], but concatenates a list of arrays. *)

val slice : int -> int -> 'a t -> 'a t option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were the labels removed here? I think labels are even more useful here because it helps distinguish start/end from start/length.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree! As a general rule if you have more than one function parameter of a type then you should use label.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to figure out what labels should be preserved and what should be removed, there is no guideline on this as far as I can see. Perhaps we should have two modules, Array and ArrayLabeled as the stdlib/ocaml-containers? What's your opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @mransan's rule is good. If the order is painfully obvious it might be ok to make an exception, but otherwise i'd just follow that rule.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

immutable.re uses "start" and "end_"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike having two apis, one labelled one not. Smells of lack of decisiveness on the behalf of the API writer and is harder to learn.

I generally used name labels when they are needed to clarify the intent of the API signature. In this case, there is no way to know what the the various integer arguments do without labels.


val map : ('a -> 'b) -> 'a t -> 'b t

val mapi : (int -> 'a -> 'b) -> 'a t -> 'b t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapWithIndex


val count: ('a -> bool) -> 'a t -> int

val iter : ('a -> unit) -> 'a t -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEach

| Cons (x, r) -> (match p x with
| Some c -> Cons (c, filterMap p r)
| None -> filterMap p r ()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

begin/end would be better than parentheses here IMO


type +'a node =
| Nil
| Cons of 'a * 'a t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation here is inconsistent with the rest of the pipes in this file


let return x () = Cons(x, empty)

let rec map f i () = match i () with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of having match (or anything other than just function really) at the end of the line here. When I see a function declaration followed by a line starting with a pipe, I expect the last argument of the function to be implicitly pattern matched, ie. = function instead of = match ... with

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, why match ... with is bad here? I understand you mentioned it a couple of times before but I failed to grasp why this is a bad pattern. Second, I tried a bit, and found this one really hard to be rewritten using only function since we have to match against i ().

Copy link
Member

@glennsl glennsl May 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't mean it's bad to use match .. with here, just that the formatting is unfortunate because it looks very similar to this:

let rec map f i () = function
| Nil -> Nil
| Cons (x, r) -> Cons (f x, map f r)

I think it's bad practice more generally to place code of any significance at what is effectively a different indent level, but match ... with is especially unfortunate due to this similarity. So all I'm really asking is that you format it like this instead:

let rec map f i () =
  match i () with
  | Nil -> Nil
  | Cons (x, r) -> Cons (f x, map f r)


let isEmpty a = length a = 0

external unsafe_sub : 'a array -> int -> int -> 'a array = "caml_array_sub"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be camelCased too, even if it's not exposed.

let length = Array.length

let get i a =
if i>=0 && i<Array.length a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent spacing around the infix operators.

@glennsl
Copy link
Member

glennsl commented May 13, 2017

@mransan:

"fromList is inefficient": this is all relative and really up to the user to decide whether it meets its performance requirement of not. The documentation should specify the complexity of the operation but I believe we should still have such function even though there exist more efficient (and more complex) solution to the problem. What if I am writing a quick and simple program where performance does not matter (unit test, little script ...)? (It's perfectly fine btw to advertise better solutions in the documentation but the decision is eventually left to the user).

I think it's good design to make performance costs explicit, since it makes the implications apparent when reading the code. It's usually considered bad practice to optimize prematurely, and most people don't read the documentation for basic stuff like this anyway, so it's important to have this information at read-time, not just write-time.

There might be better solutions than inconvenience of course, but I don't think documentation is a good substitute for poor design, and I don't think it's a good idea to optimize the design for scripting either.

@mransan
Copy link

mransan commented May 14, 2017

@glennsl "fromList" is not inefficient, it's just a convertor between 2 containers. If we're discussing 2 implementations differences of that function and one is more performant than another I would agree, but here we're talking whether converting a list to an array should be exposed at all.

A simple case could be that you use an API (3rd party) that returns a list but then you need to perform a one-time algorithm which require random access. Why not simply converting one to the other using the fromList function ? Are there better alternatives ? If so, why the fromList is not implemented in terms of them ?

@glennsl
Copy link
Member

glennsl commented May 14, 2017

@mransan Conversion between any two containers is supported via sequence (xs |> Array.toSequence |> List.fromSequence). What you're requesting is that array and list should be special cased, which strongly implies that there is a more efficient conversion between them (in the absence of some other obvious reason for special treatment).

@ghost
Copy link
Author

ghost commented May 14, 2017

I lean towards not having fromList now since we have an iterator class Sequence which should play this role.

@mransan
Copy link

mransan commented May 15, 2017

@glennsl @dorafmon, so fromSequence is:

let fromSequence i =
  of_list @@ Sequence.foldLeft (fun acc item -> acc @ [item]) [] i

So now I need to convert my list to a sequence which gets converted back to a list in order to
call the ofList function that has been removed from the interface.
a) it's slower
b) it's more complex to understand (ie new comers need to also understand sequence concept even for the 2 most popular containers in OCaml!)

I agree the unification of all containers to a general concept as a sequence is elegant and theoretically pleasing, but sometimes I just think it defeats the purpose of simplicity.

@bordoley
Copy link

bordoley commented May 15, 2017

@mransan there is no reason that implementation can't be changed not to convert to a list. It's basically impossible to implement Array.fromList/Seq, efficiently because it's always going to involve numerous o(n) copies. Another option is to require the caller pass the size as an argument to the Array.from* function, and raise an exception if you can't pull enough items from the list/seq to fill the array.

@ghost
Copy link
Author

ghost commented May 17, 2017

@mransan @bordoley If we are only considering the BS case, then we can implement fromSequence like this:

let fromSequence i =
    let open Bsequence in
    let arr = ref [||] in
    let rec aux i =
        match i () with
        | Nil -> !arr
        | Cons(x, r) -> ignore @@ Js.Array.push x !arr; aux r  
    in aux i

JS offers variable length array and thus it made it this possible. We only need 1 copy of the array in total.

@ghost
Copy link
Author

ghost commented May 17, 2017

A summary of outstanding issues and what I am thinking:

  • @glennsl mentioned the formatting issue, I am using ocp-indent and I have checked in the configuration file. I think we should leave the formatting up to the autoformatter. @glennsl could you check and see if you see any particular issue we should address?
  • @mransan mentioned we should prefix modules with b. There are a few modules that do not follow this guideline. I will change it in a seperate PR.

Apart from these two can we get this PR merged if no other objection? I think the interface is nice now and I the tests should provide at least basic coverage.

Thanks a lot!

@bordoley
Copy link

@dorafmon: Under the covers JSArray is effectively a C++ vector. There are optimizations for small arrays (and you can implement these yourself as well) but pushing into large arrays can also cause significant amounts of copying. You also sacrifice some array optimizations. This article has more details (old but still mostly accurate): https://gamealchemist.wordpress.com/2013/05/01/lets-get-those-javascript-arrays-to-work-fast/ See item 4.

For large datasets, it's likely that pushing into an immutable vector such as those implemented by Imm.re/Immutable.js will be faster than pushing into a native js array.

@ghost
Copy link
Author

ghost commented May 18, 2017

@bordoley out of curiosity I assume the immutable vector has a O(1) conversion to arrays?

@bordoley
Copy link

@dorafmon I'll eat my words a little. I ran some perf tests last night, and indeed Array.push is fast (in absolute time) for pushing large number of items (faster than I thought it would be), beating even push into a Transient Vector by 30 to 50%. I tested with Arrays where n = 7M on node. This surprised me. I looked into the BS stdlib, and I suspect that I could optimize some array CopyOnWriteArray functions to use native implementations that could achieve equal or better performance.

That said push is still not "efficient" in absolute terms as pre-allocating a single array and then inserting elements into it should be faster since it avoids generating garbage.

Imm.re vectors do not have an O(1) conversion to array. However, it's possible to implement an conversion via one allocation, and O(N) copies. This is due to the fact that Imm.re vectors have an O(1) count operation.

@mransan
Copy link

mransan commented May 18, 2017

@bordoley efficiency is relative and depends on the context.

Take this example: you call a third party API (no control over it) which returns a list, you then want to transform it into an array to make an expensive computation which required random access of the elements. So you must convert a 'a list to 'a array. What are the choices:

  • use fromList: does the minimum amount of allocations and has linear complexity. (yes 2 iterations one to get length and one to copy).
  • use to/fromSequence: duplicate the list into another list (linear number of allocation)
  • use Immuntable List/Array/Vector as intermediate data structure to get to an array: no matter what you will need to allocate more stuff.

So now in this example (which by the way is very common), the most efficient solution is a plain old fromList. Maybe I miss something :) let me know!

Copy link
Member

@glennsl glennsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorafmon The formatting of the tests are still completely borked.

I really don't understand why you insist on using ocp-indent, since to me it seems to cause more problems than it solves. Whatever you use is fine by me as long as the end result is good, but right now the tests are pretty bad.

.ocp-indent Outdated
# Number of spaces used in all base cases, for example:
# let foo =
# ^^bar
base = 4
Copy link
Member

@glennsl glennsl May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 4? Isn't 2 spaces eessentially the standard indentation for OCaml projects? I don't think I've seen a project using 4 spaces yet.


val unsafeGet : int -> 'a t -> 'a

val unsafeSet : int -> 'a -> 'a t -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on whether or not to include these, but if they're included I'd like them to be named unsafeGetUnchecked and unsafeSetUnchecked


let empty () = Nil

let return x () = Cons(x, empty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed this make in bopt and bresult. On the fence about that, but I don't think return is a very good name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, return reflects that a sequence is monadic, but I guess it is okay to rename it something else.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point. While I understand wanting to avoid a lot of the category theory nomenclature, it might be good to keep the basics to it's clear what the structures being used actually are? return, pure or something like that are pretty standard in FP.

Copy link
Member

@glennsl glennsl Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for who? Should we optimize for the few that have already demonstrated a great amount of perseverance by learning category theory, or the many who have not?

As an example, to call an external function that accepts a nullable value, e.g external f : int Js.null -> unit = ... you currently have to write (and read) f (Js.Null.return 0). That's not a great experience and severely undermines what's otherwise a damn good FFI system. Most also seem to get along just fine with the option monad without needing a return function.

That's not to say it wouldn't be good to have a standard monad "interface"/module signature, but I don't see great benefit in that interface adopting standard terminology, compared to the benefit of avoiding incomprehensible terminology.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I think we might be agreed then. I don't really care about the terminology, but I think the interface is very valuable, both the beginners and more experienced devs alike.

As long as there is a standard to adhere to that is consistent inside this repo, I can mentally replace bind and return with make and flatMap or andThen or whatever the best name is decided to be.

match i () with
| Nil -> Nil
| Cons (x, r) -> begin
match p x with
Copy link
Member

@glennsl glennsl May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much indentation. I also think I'd prefer begin on the same line as match:

begin match p x with
| Some c -> ..
| None -> ...
end

Maybe that'll fix it for ocp-indent.

| Nil -> acc
| Cons (x, next) -> foldLeft trans (trans acc x) next

let rec iter f i =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forEach

| Nil -> Nil
| Cons (this, next) -> append (trans this) (flatMap trans next)

let rec foldLeft trans acc iter =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce

@ghost
Copy link
Author

ghost commented Jun 4, 2017

@bordoley @mransan Did some benchmarking, the code is here. So tired fromList and fromSequence on a large array and not surprisingly fromSequence is slower (around 5 times usually on my Mac). I used node's profiler and think the reason is that we grow the array too often.
So I modified the code a bit so it allocate a large array beforehand and then assign values to the array. But this version turns out to be even slower. I think we should include fromList, toList as the conversion happens a lot and in this case fromSeq/toSeq seems still suffers from performance.

@ghost
Copy link
Author

ghost commented Jun 4, 2017

@glennsl I reformated the code a bit, especially the test. Is it better now?

@glennsl
Copy link
Member

glennsl commented Jun 7, 2017

@dorafmon The last half or so of barray_test is better. Not as good as before you started messing with it, but good enough. bsequence_test, int_test and the first half of barray_test is still pretty bad.

There's also lots of remaining issues with 4-space indentation, even if you've fixed the config the code itself hasn't been reformatted.

@ghost
Copy link
Author

ghost commented Jun 11, 2017

@glennsl I overhauled the whole formatting a bit. Should be a bit better.

Copy link
Member

@glennsl glennsl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better, but there's still some issues and a few files have gotten much worse for some reason.

);
let _ = (try Bchar.getDigitOrRaise '1'
with
| Failure _ -> 1) in ();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look very good.

Either

let _ = (
  try
    Bchar.getDigitOrRaise '1'
  with
  | Failure _ -> 1) in ();

or

let _ = (
  try Bchar.getDigitOrRaise '1'
  with Failure _ -> 1) in ();

or perhaps using begin/end instead of parens would be better

);
runPerfTest "Sequence.foldLeft" (fun _ ->
Bsequence.reduce (fun acc x -> x::acc) [] it
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have actually gotten worse... Not sure why that would be

(if this < that
then Ordering.Less
else if this > that then Ordering.Greater else Ordering.Equal :
Ordering.t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very inconsistent.

(fun x ->
fun y -> if (comparator x y) == Ordering.equal then true else false :
'a Equality.t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is explicitly curried, it would probably format better if it was not. Also weird spacing before the arrows.

|> Expect.toEqual '0');
test "fromIntOrRaise - 256" (fun () ->
(fun () ->
256 |> fromIntOrRaise)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the next test) should have the body of the lambda indented one more level

[|1 ; 2; 3|] |> Barray.toSequence
|> Barray.fromSequence
|> Expect.toEqual [|1; 2; 3|]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests from this to the end of the file are still off

test "neg - 42" (fun () ->
42 |> neg
|> Expect.toEqual (~-42)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again there's suddenly an explosion of indentation in this entire file...

(** Transform each element into an t, then flatten *)

module Infix : sig
val (--) : int -> int -> int t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file still has too much indentation. There's not very many places where it actually shows though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants