-
Notifications
You must be signed in to change notification settings - Fork 793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Seq Composer #1570
[WIP] Seq Composer #1570
Conversation
@KevinRansom - I'm having grief trying to the IL output of the error that I'm getting. My machine isn't running all the tests (dying earlier? Why? I don't know, I don't care for this machine very much though...) so I can't see the output of the SeqExpressionSteppingTest1 error that is occurring. Anyway, any assistance (pointing at me to RTFM is acceptable!) would be appreciated. |
Hey Libo, any chance you want to collaborate on this? It started inspired a bit by #1528 It's not currently building fully due to ILCompare coming up with a mismatch, but I think this is just a technicality (i.e. crankin' the handle). I'll like to get as much of the Seq.* functions over as possible to this framework. I'm pretty busy, so not sure how much I'll be able to achieve in the near future. First I guess I would just like some eyeballs on it to see it's not completely insane, secondly I'd like to add whatever other functions would head over ("choose" is an obvious next.) Anyway, let us know. |
I would be interested and I'll try to look at this more closely today. That said @dsyme's post in the other fusing |
My interpretation of @dsyme's response to #1525 was that he wasn't interested in the compiler performing these kinds of optimizations. Now my interpretation could be wrong, so let us know Don! Anyway, assuming I'm not wrong, and he's still happy for library improvements, this is a pure fsharp.core change. No compilation changes required! I.e. this is very similar to what you proposed, just expanded upon to make it more flexible to all functions, rather than just collapsing map functions. Have a look and see what you think. |
|
||
abstract Composer : SeqComponent<'U,'V> -> SeqComponent<'T,'V> | ||
|
||
abstract ComposeMap<'S> : Map<'S,'T> -> SeqComponent<'S,'U> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComposeWithMap
might be a better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tricky naming; they way the functions work out is that it is ComposeMapWith as map is "first". That might be better... Maybe...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the midst of watching the AFL grand final so not devoting a huge number of neurones to this, but I think i now agree with your original suggestion...
inherit SeqComponent<'T,'T>() | ||
|
||
override this.Composer (next:SeqComponent<'T,'V>) : SeqComponent<'T,'V> = | ||
next.ComposeFilter this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's second.ComposeFilter first
in most of the other ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call
second.ComposeSkip first | ||
|
||
override second.ComposeSkip (first:Skip<'T>) : SeqComponent<'T,'T> = | ||
if System.Int32.MaxValue - second.SkipCount > first.SkipCount then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this condition would make more sense as Int32.MaxValue > first.SkipCount + second.SkipCount
to match expression in the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done so we don't overflow
|
||
member __.Filter :'T->bool = filter | ||
|
||
and MapFilter<'T,'U> (map:'T->'U, filter:'U->bool) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be called MapThenFilter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Suck down my repository and start modifying the branch. I'll take in any pull requests you make on that and then they should get transferred back to this main repository. Go for your life and change whatever you like. I can always modify back of I disagree. |
Can someone tell me what the reason why So we end up with code like:
Which only calls
Anyway, I assume that I can't change this unintuitive implementation, although no mention of this behaviour is in the documentation? Anyone? Anyone? And if I can't change it, which I guess I can't, can we at least maybe mark Hmmm... This should probably be posted where someone will actually read it... |
@manofstick looks fun! If I get some free time I may dive into this, thanks for the heads up |
// type ISeqEnumerable<'T> = | ||
// abstract member Fold<'State> : folder:('State->'T->'State) -> state:'State -> 'State | ||
|
||
module SeqComposer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you thought about using a rec module to eliminate all of the and types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rec module - shiny new feature I haven't used :-)
Personally I don't mind the 'and' syntax, but I'll give it a burl!
| Finished = 2 | ||
| InProcess = 3 | ||
|
||
type SeqComposedEnumerator<'T,'U>(enumerator:IEnumerator<'T>, t2u:SeqComponent<'T,'U>) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So .... this is a reference type ... have you considered a struct enumerator, it might reduce the gc somewhat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I'm creating a fair bit of rubbish already; each step in creating the pipeline creates 2 or 3 objects (1 new Enumerable, 1 Factory object and, after the first object, 1 composing object to tie types together) and every time GetEnumerator is called in traverses the factory objects and creates a component object for each level of the pipeline. So unfortunately it's pretty pretty garbage hungry. The Linq libraries recycle the first (and probably only) call to GetEnumerator to reuse the IEnumerable object; possibly something like that could be done to lesson the garbage.
OK, my factory change is in; I reran the test from here (I'm modifying that comment to keep track of times in a very superficial way) and it drops the time by ~5% for the 64 bit version; a little less for the 32 bit, but it has also opened up some other possibilities for some more optimizations. |
Update in how this is coming along... Given the following small example:
We get the following results on my machine
So about double the throughput of the original, and a reasonable amount quicker than the Linq implementation, I think we're going along pretty nicely! !!! _So a call out to anyone who can help finish off the whole of the surface area_ !!! |
I was curious to how this interacts with the state machine compilation of sequences in the compiler? https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/LowerCallsAndSeqs.fs#L68 |
@7sharp9 |
@7sharp9 |
@manofstick I was just thinking out loud as usual, just wondered if any change here would cause AST difference in the seq compilation stuff. I would help if I knew how to, and also if I wasn't working on a little addin for vscode to show F# signatures for dlls's. |
No worries! Talk is good; these PRs are usually quite lonely places and I usually end up just talking to myself (maybe somebody listens I'm not sure; make it a bit hard to know if I'm just off in total crazy land of not!) And if you do find yourself bored and looking for something to do then you know where to look :-) |
Do you want to be informed when intermittent failures occur within the build? Now it might be my fault, but it could also just be a race condition, which is what the test says it's about, but I assume that's bad that it failed; I.e. whatever steps that are being taken to avoid the race condition are failing. Anyway, as said, maybe this will occur again and is my fault:
|
Given your previous work on Seq related things I am wondering if have any time to help either review or contribute to this PR. Basically it's a more efficient pipeline combining Seq calls which results in better performance. |
We should probably discuss which functions we want to actually implement. I think I picked many of the easier ones, some of the remaining ones may be more difficult. Do we even want to implement the functions that consume |
@liboz My plan was to add an iter at the enumerable level and then try to implement everything off that, including fold which had really just been done as a proof of concept. But maybe this won't be suitable performance. My plan before that was to implant concat by tweaking my append implementation. I haven't really given too much thought beyond those; but I guess I would like to get 100% coverage as leaving the composer more expensive than remaining within. |
Fairly often we are getting a build error, but not of our making:
|
Is there a reason you remove the extra machinery for Map2? On my computer it makes ~5% difference in speeds (~700 vs ~660 for 64bit). This was my test script:
|
When I was adding map3 I thought I would just simplify; but oops, I should have removed the other one - as when piping it is the second argument that is added to the chain (or just left them both in).. maybe can you reinstate please? I'm off on looking-after-the-kids duty... |
- also renamed delayed as delay to match seq
sorts/rev/permute
- and moved creating Value comparer to a helper function
- more consistency formatting
Thanks for looking into this, I'm closing this for now. If you get any ideas about how to proceed feel free to re-open it. Kevin |
This is just IMO but I felt this PR to be very important. I realize this is a complex and risky PR but I am of the opinion that the performance benefits of this PR is so significant that it should be scheduled for inclusion somehow. I suggested above that this maybe shouldn't be a drop-in-replacement of PS. |
@KevinRansom were waiting for you or @dsyme to review it but neither of you got around to it. The checklist at the top of this PR isn't accurate, the entire Seq module has been implemented in terms of composers. Is there something preventing this from being a candidate for 4.2? |
@cloudRoutine Kevin |
Sorry, time, time, time, what has become of me... :-) (Been a little distracted up in dotnet/coreclr#8963 - which funnily enough, was due to @mrange and this PR... But even then still struggling to find time between work and family...) Anyway, this PR is "pretty good", but does still need polishing a bit, but definitely in the stage where a review on the official side would be warranted/nice/necessary... |
@manofstick @cloudRoutine @dsyme Please don't feel pressured by my polling these PR's from time to time I'm just trying to figure out which ones people still care about. There is nothing like closing one to encourage people to remind me how important they are :-) ... For this one because it so big and central to the language and library we need a couple of plans ... 1. Review additional public APIs. --- There are a ton of new public APIs in FSharp.Core ... well about 20. The fsharplang repo is where that kind of thing typically happens. 2. We need to determine whether this is a suitable plug-in replacement for the Seq module. We probably need someone in the community to put together a suite of Seq tests and using the current Seq module and then replace it with this ... and verify incompatibilities. @manofstick , @cloudRoutine you shouldn't feel like you need to be the developer to do this testing, indeed what we do here is teamwork so I would hope someone who wants to get started on contributing to F# could lead that effort. I will mark this as F# 4.2 for now ... |
@cloudRoutine My comments here still pretty much apply.
But the performance gains are good. So I'll wait until it's green if that's ok and then I'll take a look. But it's really a monster PR in one of the most critical modules w.r.t. correctness :) p.s. There are whitespace changes, e.g. seq.fsi it would be good if they can be removed and the diff minimized |
@manofstick I've been trying to rebase and squash this PR and it's been a real disaster. I'm encountering conflicts on lots of files that are unrelated to the work on this PR, and trying to disentangle everything has gone quite poorly. As most of the history is churn related to the implementation and deals with constructs and approaches that were ultimately abandoned or replaced it might be easier to make a new PR off of the current master with the same basic changes to have a sane history, reasonable amount of commits, etc. Or we can just merge master straight in, resolve that the past is history so it might as well stay a mystery and keep on trucking, but I don't feel great about that. |
I'd be in favor of the "create a new PR from the current state and close this one" approach, though I'd say that this one shouldn't be closed until the new PR has been able to be tagged "Candidate for F# 4.2". The series of commits in this PR has quite a bit of value as a history of how the Seq Composer feature was developed and how the design shifted over time, but it's too long to do anything useful with now. I'd say let this PR become a historical archive, and create a new PR to carry on with development and code reviews. |
OK; well I'll spawn a new branch from current master and go from there. I agree it will make things a lot cleaner; although it does make me a little sad to remove other peoples contributions from the source tree. I'll try and get a baseline green build up this weekend; but that depends on how nice my children are to me... So everyone who's involved happy with that? |
I don't mind losing a few commits. the functionality is all that matters to me 😉 |
@smoothdeveloper has rebased this over at #2526. That is where I'm heading to see if I can bash this sucker into a green build... (I'm assuming this is only generated il output and surface area checks; which to me should really constitute a different layer of "testing", as they shouldn't stop don/kevin/etc's reviews [and review isn't even really the right world; input would be better, which I think I had requested at earlier stages, but anyway, hack on!] - but that's a whole different issue!) |
Closing because #2587 exists |
A Seq replacement somewhat similar (but better!) performance to System.Linq extension methods. Will attempt to replicate all functionality under the new method, but not required. i.e. compatible with existing implementation.
Seq module
functions that produceseq
s - will at least have a look at them all to determine if they can be reasonably moved over.allPairs
append
cache
cast
choose
chunkBySize
collect
concat
countBy
delay
distinct
distinctBy
splitInto
empty
except
filter
where
groupBy
indexed
init
initInfinite
map
map2
map3
mapFold
mapFoldBack
mapi
mapi2
ofArray
ofList
pairwise
permute
readonly
replicate
rev
scan
scanBack
singleton
skip
skipWhile
sort
sortWith
sortBy
sortDescending
sortByDescending
tail
take
takeWhile
truncate
unfold
windowed
zip
zip3
Seq functions that consume
seq
s (undecided if will provide special implementations or not, as performance is really improved within the chain of theseq
s rather than just the tail. But maybe.)average
averageBy
compareWith
contains
exists
exists2
find
findBack
findIndex
findIndexBack
fold
fold2
foldBack
foldBack2
forall
forall2
head
tryHead
last
tryLast
exactlyOne
isEmpty
item
iter
iteri
iter2
iteri2
length
max
maxBy
min
minBy
nth
pick
reduce
reduceBack
sum
sumBy
toArray
toList
tryFind
tryFindBack
tryFindIndex
tryItem
tryFindIndexBack
tryPick