-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: x/exp/xiter: new package with iterator adapters #61898
Comments
The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable? |
What about an adapter that converts an |
Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation. |
May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation. |
For |
I'd actually prefer Edit: I just realized that if |
This proposal has been added to the active column of the proposals project |
The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare nonNegative := xiter.Filter(
xiter.Map(
bufio.Lines(r),
parseLine,
),
func(v int) bool { return v >= 0 },
) vs. nonNegative := bufio.Lines(r).
Map(parseLine).
Filter(func(v int) bool { return v >= 0 }) Go's a little weird because of the need to put the lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(func(v int) bool { return v >= 0 }) That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls. |
What type does |
You would probably have to wrap the base iterator like:
|
Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an Not necessarily. The transformative and sink functions on iterators could just be defined as methods on |
I was wrong, it’s not an interface. |
Why do some functions take the names := xiter.Map(func (p Person) string {
return p.Name
}, people) // "people" gets lost
// vs
names := xiter.Map(people, func (p Person) string {
return p.Name
}) |
@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"? |
No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as |
Neither chaining nor functional programming has ever been a decisive or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment. |
I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so. |
Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for
Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible. |
Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.
That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. " This means that not everyone who has read these proposals in advance believes that this part is a good idea. |
Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls. As an analogy, Go has |
Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly. |
@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say
We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision. |
Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there. |
Hope that it's not noise: I wondered if naming it the |
I have produced github.com/jub0bs/iterutil, more for practice and as a showcase than anything else. I don't really use it much yet. I do like https://pkg.go.dev/github.com/jub0bs/iterutil#SortedFromMap, though; it's more efficient than simply chaining |
@DeedleFake Not to nitpick, or anything, but
I would argue that "it is used often" is pretty much the definition of "useful". |
It can't be more efficient since it ranges over |
@DmitriyMV Indeed! That is a typo. Thanks. This is what I had in mind: func SortedFromMap[M ~map[K]V, K cmp.Ordered, V any](m M) iter.Seq2[K, V] {
return func(yield func(K, V) bool) {
ks := keys(m)
slices.Sort(ks)
for _, k := range ks {
if !yield(k, m[k]) {
return
}
}
}
}
func keys[K comparable, V any](m map[K]V) []K {
ks := make([]K, 0, len(m))
for k := range m {
ks = append(ks, k)
}
return ks
} That should be more efficient than simply ranging over |
Another way to do SortedFromMap would be to use a heap, so you don't have to pay the full cost of sorting if you end up breaking the loop before going through all items. |
I looked through my code and the main themes I saw lined up almost precisely with @DeedleFake's #61898 (comment).
Consider counting the number of elements in a slice that have value x. You can do this with something like:
This is much harder to read than left-to-right (pseudo-code): The difficulty of composition has led in practice to a proliferation of helpers that simply combine two or more steps.
func Mapped[Slice ~[]E, E, T any](s Slice, f func(E) T) []T {
return slices.Collect(Map(f, slices.Values(s)))
} |
To be honest, it should read like this:
I thought this is what we as the Go community have agreed on in the past 15 years. |
I wouldn't say that this is settled, but this is what I was talking about when I said,
I do see a lot of advantage in the for-loop heavy style because everyone needs to know how to use for-loops anyway, and typically the clarity from using declarative names like "map" and "filter" ends up getting eaten up in other ways, like the length of callback signatures. It's not clear which way we want Go to evolve. |
The main benefit of iterators over a manual Before this, in order to modify some data source and pass it to something, either that thing would need to know about the data source, breaking the abstraction, or I would need to allocate a slice to store the modified data in. Now I can apply some adapters and pass the resulting |
For loops can be clear and concise. Functional programming style can be clear and concise. I would like to be able to reach for either. The discussion here highlights how far we still are from the latter in Go. Iterators are wonderful and I'm glad we have them. But without lambdas and pipes (or their moral equivalents) we are not able to use them comfortably and composably. |
That's a good idea! I'll see if I can get that to work. |
Spitballing, thinking about what we can do with the tools we have... One way to move from inside-out toward RTL (not as good as LTR!) and slightly finesse the argument order problem is to return functions that transform iterators. For example: // Map converts f into an iterator mapper.
func Map[In, Out any](f func(In) Out) func(seq iter.Seq[In]) iter.Seq[Out] {
return func(seq iter.Seq[In]) iter.Seq[Out] {
return func(yield func(Out) bool) {
for in := range seq {
if !yield(f(in)) {
return
}
}
}
}
}
// Limit returns a function that wraps seq to stop after n values.
func Limit[V any](n int) func(seq iter.Seq[V]) iter.Seq[V] {
return func(seq iter.Seq[V]) iter.Seq[V] {
return func(yield func(V) bool) {
if n <= 0 {
return
}
for v := range seq {
if !yield(v) {
return
}
if n--; n <= 0 {
break
}
}
}
}
} This adds a layer of indirection and leads to a LISP problem (lots of irritating silly parenthesis), but does slightly improve composability. I wouldn't advocate for this as-is, but throwing it out there in case it sparks new ideas... |
And switching from forest to trees...
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
In #53987 (comment), @JeremyLoy proposed an iterator-based addition to (what become) slices.Chunk. I already needed it in multiple places, so I propose adding it. It would allow passing for d := range xiter.Chunk(docs, 1000) {
db.InsertMany(d) // sends a single insert command with up to 1000 documents
} instead of less effective for d := range docs {
db.InsertOne(r) // sends a single insert command with a single document
} In those examples, Just like // Chunk returns an iterator over consecutive slices of up to n elements of seq.
// All but the last slice will have size n.
// All slices are clipped to have no capacity beyond the length.
// If seq is empty, the sequence is empty: there is no empty slice in the sequence.
// Chunk panics if n is less than 1.
func Chunk[E any](seq iter.Seq[E], n int) iter.Seq[[]E] {
if n < 1 {
panic("cannot be less than 1")
}
return func(yield func([]E) bool) {
var batch []E
for e := range seq {
if batch == nil {
batch = make([]E, 0, n)
}
batch = append(batch, e)
if len(batch) == n {
if !yield(batch) {
return
}
batch = nil
}
}
if l := len(batch); l > 0 {
yield(batch[:l:l])
}
}
} |
I've got an implementation of that in my iterator support package. Mine reuses the slice for each chunk, leaving it up to the caller to copy them if they want to keep previous ones into the next iteration. I think that's the better approach as the user can make the decision that way. Another alternative is to allow the user to pass a slice to be used, with the capacity of the slice serving as the size of the chunks. |
imo it would be more convenient if // Chunk returns an iterator over consecutive iterators of up to n elements of seq.
// Chunk panics if n is less than 1.
func Chunk[E any](seq iter.Seq[E], n int) iter.Seq[iter.Seq[E]] {
if n < 1 {
panic("cannot be less than 1")
}
return func(yield func(iter.Seq[E]) bool) {
var stopped bool
take := func(next func() (E, bool), n int) iter.Seq[E] {
return func(yield func(E) bool) {
for range n {
item, ok := next()
if !ok || !yield(item) {
stopped = true
return
}
}
}
}
next, stop := iter.Pull(seq)
defer stop()
for {
if stopped || !yield(take(next, n)) {
return
}
}
}
} you can then aggregate them to slices if needed like: func ChunkSliced[E any](seq iter.Seq[E], n int) iter.Seq[[]E] {
return func(yield func([]E) bool) {
for chunkSeq := range Chunk(seq, n) {
chunk := slices.Collect(chunkSeq)
if len(chunk) == 0 || !yield(chunk) {
return
}
}
}
} or func ChunkSlicedInplace[E any](seq iter.Seq[E], chunk []E) iter.Seq[[]E] {
return func(yield func([]E) bool) {
for chunkSeq := range Chunk(seq, cap(chunk)) {
clear(chunk)
chunk = chunk[:0]
chunk = slices.AppendSeq(chunk, chunkSeq)
if len(chunk) == 0 || !yield(chunk) {
return
}
}
}
} |
@anatoly-kussul I updated my example to show that your proposal would not work for existing functions that accept slices. And converting returned (sub)slices to iterators is already possible with |
This wouldn't really work for cases where both iterator and your logic are I/O heavy. For example: with for chunk := range Chunk(stream, n) {
for msg := range chunk {
// do something
}
time.Sleep(cooldown)
} while You can always convert |
Another situation where "sequences of sequences" can be useful is situations where each element of an input sequence can produce zero or more elements in a result but the boundaries between those sub-sequences are not important to the overall problem. In situations like that (in other languages) I've found benefit in a "flatten" function, which I think would be spelled something like this in Go: func Flatten[V any](seq iter.Seq[iter.Seq[V]]) iter.Seq[V] Although I expect other uses are possible, the main use I have in mind is first using Using (I don't mean this comment to suggest that |
Python does the equivalent of You could also do cooldown with for i, item := range enumerate(stream) {
if i != 0 && i%chunkSize == 0 {
time.Sleep(cooldown)
}
doSomething(item)
} That seems clearer to me. |
We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.
The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.
Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.
Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.
/*
Package xiter implements basic adapters for composing iterator sequences:
*/
The text was updated successfully, but these errors were encountered: