Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iter: new package for iterators #61897

Closed
rsc opened this issue Aug 9, 2023 · 247 comments
Closed

iter: new package for iterators #61897

rsc opened this issue Aug 9, 2023 · 247 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted release-blocker
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.

See also:

Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.

I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.

Note regarding standard library: There a few important reasons to convert the standard library:

  • Ship a complete solution. We should not release a package we don’t use in obvious places where it should be used. This applies especially to new interfaces.
  • Put functionality in the right places. Ian’s earlier draft included FromSlice and FromMap, but these are more appropriately slices.All and maps.All.
  • Find problems or rough edges in the iter package itself that we want to find before its release. I have already changed a few definitions from what I started with as a result of working through the standard library changes.
  • Set an example (hopefully a good one) for others to follow.

Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql might benefit from iterators as well, but they are not used as much, so they seem okay to leave out from the first round of changes.


The iter package would be:

/*
Package iter provides basic definitions and operations related to
iterators over sequences.

Iterators

An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:

type (
	Seq[V any]     func(yield func(V) bool) bool
	Seq2[K, V any] func(yield func(K, V) bool) bool
)

Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.

Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:

func Concat[V any](seqs ...Seq[V]) Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

Iterator functions are most often called by a range loop, as in:

func PrintAll[V any](seq iter.Seq[V]) {
	for _, v := range seq {
		fmt.Println(v)
	}
}

Naming

Iterator functions and methods are named for the sequence being walked:

// All returns an iterator over elements in s.
func (s *Set[V]) All() iter.Seq[V]

The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.

When there are multiple possible iteration orders, the method name may
indicate that order:

// All iterates through the list from head to tail.
func (l *List[V]) All() iter.Seq[V]

// Backward iterates backward through the list from tail to head.
func (l *List[V]) Backward() iter.Seq[V]

If an iterator requires additional configuration, the constructor function
can take additional configuration arguments:

// Bytes iterates through the indexes and bytes in the string s.
func Bytes(s string) iter.Seq2[int, byte]

// Split iterates through the (possibly-empty) substrings of s
// separated by sep.
func Split(s, sep string) iter.Seq[string]

Single-Use Iterators

Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.

Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:

// Lines iterates through lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string]

Errors

If iteration can fail, it is conventional to iterate value, error pairs:

// Lines iterates through the lines of the named file.
// Each line in the sequence is paired with a nil error.
// If an error is encountered, the final element of the
// sequence is an empty string paired with the error.
func Lines(file string) iter.Seq2[string, error]

Pulling Values

Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterator”, which
push values to the yield function.

Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.

For example:

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	return func(yield func(V, V) bool) bool {
		next, stop := iter.Pull(it)
		defer stop()
		v1, ok1 := next()
		v2, ok2 := next()
		for ok1 || ok2 {
			if !yield(v1, v2) {
				return false
			}
		}
		return true
	}
}

Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.

Other Packages

Many packages in the standard library provide iterator-based APIs. Here are some notable examples.

TODO FILL THIS IN AS OTHER PACKAGES ARE UPDATED

Mutation

Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.

For example, a tree implementation might provide:

// Positions iterates through positions in the sequence.
func (t *Tree[V]) Positions() iter.Seq[*Pos]

// A Pos represents a position in the sequence.
// It is only valid during the yield call it is passed to.
type Pos[V any] struct { ... }

// Pos returns the value at the cursor.
func (p *Pos[V]) Value() V

// Delete deletes the value at this point in the iteration.
func (p *Pos[V]) Delete()

// Set changes the value v at the cursor.
func (p *Pos[V]) Set(v V)

And then a client could delete boring values from the tree using:

for p := range t.Positions() {
	if boring(p.Value()) {
		p.Delete()
	}
}

*/
package iter

// Seq is an iterator over sequences of individual values.
// See the [iter] package documentation for details.
type Seq[V any] func(yield func(V) bool) bool
// Seq2 is an iterator over pairs of values, conventionally
// key-value or value-error pairs.
// See the [iter] package documentation for details.
type Seq2[K, V any] func(yield func(K, V) bool) bool

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

// Pull starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next value from the sequence. When there is
// such a value v, next returns v, true. When the sequence is over, next
// returns zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())
// Pull2 starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next key-value pair from the sequence. When there is
// such a pair k, v, next returns k, v, true. When the sequence is over, next
// returns zero, zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull2[K, V any](seq Seq[K, V]) (next func() (K, V, bool), stop func())
@szabba
Copy link

szabba commented Aug 9, 2023

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

I think that's supposed to say type Yield[V any] = func(V) bool?

@icholy
Copy link

icholy commented Aug 9, 2023

In the "Pulling Values" example, I think next, stop := iter.Pull(it) should be next, stop := iter.Pull(seq)

@mateusz834
Copy link
Member

mateusz834 commented Aug 9, 2023

Isn't this kind of API more preferable? Consider a case when you need to pass the next and stop functions separately to an function, struct, it might be annoying.

type PullIter[T any] struct{}

func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)

func Pull[V any](seq Seq[V]) PullIter[V]

@gazerro
Copy link
Contributor

gazerro commented Aug 9, 2023

Based on:

Range expression                                   1st value          2nd value
function, 1 value   f  func(func(V)bool) bool      value    v  V

the range in the Concat and PrintAll functions should have only one value?

@DmitriyMV
Copy link
Contributor

I believe

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(it)
		...
}

should be

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(seq)
		...
}

@jimmyfrasche
Copy link
Member

Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough.

@jimmyfrasche
Copy link
Member

Given that

  • every Seq2 can be converted to a Seq by a left or right projection
  • every Seq can be extended to a Seq2 by padding it like Seq[K]Seq2[K, struct{}] or an operation like python's enumerate

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

@jimmyfrasche
Copy link
Member

I know having a separate function/method to call to get the error can lead to people forgetting to do so but I really don't see how Seq2[K, error] makes sense except in the case where each K could have an associated error. I get why it seems like it would be appealing but I don't think it's going to be nice in practice:

It's easy to ignore with k := range f.

It's not easy to not ignore without being awkward. Neither

var k Key
var err error
for k, err = range f {
  if err != nil {
    break
  }
  proc(k)
}
if err != nil {
  handle(err)
}

nor

for k, err := range f {
  if err != nil {
    handle(err)
    break
  }
  proc(k)
}

look especially readable to me.

@DmitriyMV
Copy link
Contributor

@jimmyfrasche Map fits quite nicely for this for k, err := ... pattern. Working with buffers too. Parallel execution.

@earthboundkid
Copy link
Contributor

earthboundkid commented Aug 9, 2023

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places.

Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc.

@earthboundkid
Copy link
Contributor

Isn't this kind of API more preferable? …

I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods.

@rsc rsc moved this from Incoming to Active in Proposals Aug 9, 2023
@rsc
Copy link
Contributor Author

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@AndrewHarrisSPU
Copy link

xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. Next() (T, error) (Maybe this is the TLDR? #61405 (comment))

If I thought an iter.Stream type or an iter/stream package would be worth exploring, I'm wondering if we'd have a preference or any guidance about where to explore that - in the current set of proposals, in a separate proposal, or leave that unresolved for now (maybe forever), etc.?

@djordje200179
Copy link

I think that method name for iterating through sequence shouldn't be All. Because my first guess when seeing that is that it is Python-like and C# LINQ-like method for checking if all elements meet the criteria.
Also, its non-consistent to have methods All and Backwards. In that case name Forwards would be more appropriate.

@Merovius
Copy link
Contributor

@djordje200179 IMO for _, v := range it.All() makes it clear, that this is not a predicate, but an iteration. And Forwards does not work as a conventional name, as not all data structures have a dedicated direction - see e.g. map.

@leaxoy
Copy link

leaxoy commented Aug 10, 2023

Although the proposal is attractive, but to be honest, the Seq2[K, V] looks ugly and very similar to the Seq[V], every time when add iter-func, we must consider there are two to version of Seq, it's a bad thing. In the long run, two versions of iterators can quickly bloat the code.

Introduce tuple type for Seq2 maybe another choice, but the final decision rests with the go official team.

What do you think about this potential problem, @rsc.

@kardianos

This comment was marked as resolved.

@dominikh
Copy link
Member

I'd like to comment on one effect of iter.Seq being a named type instead of a type alias: it discourages developers from defining their own named iterator types, which would be useful to be able to define methods on them.

For example, I am currently experimenting with an iterator-focused API for bezier paths and transformations on them. Consider the current API:

type PathElement { /* some representation of line segments, quadratic and cubic beziers */ }

type Path []PathElement

func (p Path) All() iter.Seq[PathElement] { return slices.Values(p) }

func Translate(seq iter.Seq[PathElement], trans Vec2) iter.Seq[PathElement]
func Scale(seq iter.Seq[PathElement], trans Vec2) iter.Seq[PathElement]
func Flatten(seq iter.Seq[PathElement], accuracy float64) iter.Seq[PathElement]
func ToSVG(seq iter.Seq[PathElement]) string

which leads to usage like

var p Path
println(ToSVG(Flatten(Scale(Translate(p.All(), v), v), 1)))

or

it = Translate(p.All(), v)
it = Scale(it, v)
it = ...

An alternative API I've considered would use a named iterator type and methods, as follows:

type PathElements iter.Seq[PathElement]

func (p Path) All() PathElements { return PathElements(slices.Values(p)) }

func (PathElements) Translate(trans Vec2) PathElements
func (PathElements) Scale(trans Vec2) PathElements
// etc

for the following usage:

println(p.All().Translate(v).Scale(v).Flatten(1).ToSVG())

But a significant downside of this approach is that it requires explicit conversion from PathElements to iter.Seq[PathElements] whenever we want to pass iterators to functions like slices.Collect or from xiter. This wouldn't be the case if iter.Seq were a type alias, as my named type would implicitly convert to the unnamed type func (...)

I'm not sure, however, if this pattern is a good idea to begin with. It works well for this specific use-case, but would fall apart the moment we'd need generic methods, e.g. for a Map method. It also hides the "seq"ness of the returned value behind another name. So maybe discouraging this pattern from the get-go is desired, not unintended?

@rogpeppe
Copy link
Contributor

I'd like to comment on one effect of iter.Seq being a named type instead of a type alias

I tend to agree that iter.Seq should be an alias rather than a named type.
Generic type aliases aren't quite there yet, but is there any reason we can't move to using an alias
when they've landed without breaking compatiblity?

@golang golang unlocked this conversation Jun 17, 2024
@adonovan
Copy link
Member

Generic type aliases aren't quite there yet, but is there any reason we can't move to using an alias
when they've landed without breaking compatibility?

That would not be a compatible change: https://go.dev/play/p/x9MR1A1suez

I too am curious as to the pros and cons of Seq being named, not merely an alias. I suppose it means we can add dozens of convenience methods to it later, but I'm not sure whether that's a pro or a con. ;-)

@Merovius
Copy link
Contributor

@adonovan Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no?

@gingerBill
Copy link

I've tried to explain why some people are "angry" over this design being too complicated in an article, but I think the short of it is that it feels like it goes against the apparent philosophy of Go that many people believe, coupled with it being a very functional way of doing things rather than imperative.

And because of those two reasons, I think that is why people don't like the iterator stuff, even if I completely understand the design choices made. It doesn't "feel" like what Go original was to many people.

If it was me, I would just not have allowed custom iterators into Go whatsoever, but I am not on the Go team (nor do I want to be).

@aarzilli
Copy link
Contributor

@gingerBill If you want to grasp at the value of push vs. pull iterators you should try to rewrite something like this as a pull iterator.
The recursive version is straightforwardly correct (more or less), but when you go and turn it into a state machine suddenly you have to keep your own stack, and each stack entry needs to have its own queue or, if you don't want to overallocate, you have to make a stack of state machines... suddenly correctness becomes a lot less obvious.

I don't see the range-over-func statement as leaning functional, in the imperative-vs-functional paradigm (unlike the issue we are commenting on, which is not about range-over-func and is, in fact, about functional programming). If anything it enables more imperative style programming, if you wanted to call functions you'd call functions, the syntactic sugar is there to let you call break, continue and early-return, which are hardly functional programming constructs.

@Merovius
Copy link
Contributor

Merovius commented Jun 17, 2024

My goal of asking this issue to be re-opened for comments wasn't to re-start the discussion on the merits of adding range-over-func. It was to be able to discuss the actual design issues as related to this package, like the above mentioned question about aliases. Please respect that, before the issue has to be locked again.

@adonovan
Copy link
Member

[@Merovius] Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no?

Ah, I thought you were proposing to land an emergency change from named to alias before the imminent go1.23 release so that the change be reverted later if desired. My example was evidence that that later change would be breaking.

But it's a breaking change either way. For example, switch x.(type) { case func(...): } will no longer match if the dynamic type of x changes from an alias to a named type.

@earthboundkid
Copy link
Contributor

earthboundkid commented Jun 17, 2024

The Go 1 guarantee isn't a legally binding contract, and e.g. the text/template/parse has broken it. Could we just caveat the 1.23 release notes with a big asterisk saying we reserve the right to switch to an alias in 1.24? Generics came with an asterisk, although the asterisk was never used.

@timothy-king
Copy link
Contributor

The interesting direction for backwards compatibility is to think about is iter.Seq[V] going from named to alias.

Type assertions like x.(iter.Seq[V]) would continue to work as intended. What would change though is x.(func(yield func(V) bool) bool). The named version would not match and the alias would match.

Type switches additional could stop compiling if someone tried to do both:

	switch x.(type) {
	case iter.Seq[int]:
		...
	case func(yield func(int) bool) bool:
		...
	}

These risks are not 0. They do not seem enormous either. I don't know why one would mix func(yield func(int) bool) bool and iter.Seq[int] in an interface value and check the runtime type. (I don't want to over index on my lack of imagination though.)

We could write a vet check to discourage usage that might not be backwards compatible in 1.23, but I suspect it would not help all that often or beyond 1.24. (The vet check might require an asterisk in the 1.23 spec to be justified too.)

A not very ergonomic solution is to ship iter without Seq and Seq2 and to expand the types until type parameterized aliases are available. It solves backwards compatibility, but would probably make this package much harder to learn and more annoying to use.

@Merovius
Copy link
Contributor

Merovius commented Jun 18, 2024

@timothy-king 1. Note that you don't have to do both in a type-switch for the breakage to happen:

func F(x any) {
    if x.(func(func(int) bool)) {
        panic("b0rk")
    }
}
func main() {
    F(maps.Keys(map[int]int{}))
}
  1. Note that code currently could contain such a type-assertion so making it illegal would break that code.
  2. The idea of special-case otherwise perfectly normal code to not compile - and including that special case in the spec - just as a preparation for a potential future switch is abhorrent to me.

FWIW I'm not convinced this really need a fix. I don't, ultimately, see more reason to make iter.Seq an alias than, say, bytes.Buffer. On the contrary, I'd argue that there is less reason, because while you can define methods on a function type, those methods can't really do anything except call the function or compare it to nil. So for iter.Seq (and similar types), literally anything a method can do, you can do in a function. Or with a wrapper type in a third-party package even. There is no unexported state that could be hidden.

So I don't really see any real argument to even make iter.Seq an alias. There isn't a strong reason to make it a named type either, true, but aliases should still be the exception, not the rule, lest we forget they where introduced for a specific problem.

@earthboundkid
Copy link
Contributor

FWIW I'm not convinced this really need a fix. I don't, ultimately, see more reason to make iter.Seq an alias than, say, bytes.Buffer. On the contrary, I'd argue that there is less reason, because while you can define methods on a function type, those methods can't really do anything except call the function or compare it to nil. So for iter.Seq (and similar types), literally anything a method can do, you can do in a function. Or with a wrapper type in a third-party package even. There is no unexported state that could be hidden.

bytes.Buffer has methods, so I don't see the analogy here. Passing it as an aliased struct to a series of functions would make using io.Copy impossible, for example.

The advantage of making iter.Seq/2 an alias is pretty clear: if package A defines myseq.WithCoolChainingMethods and package B defines func DoSomething[T any](seq iter.Seq[T]) iter.Seq[T], you can't pass a myseq to DoSomething without writing DoSomething(iter.Seq[foo](seq)). If it were an alias, you could skip that needless wrapping. Is that a good enough reason to have iter.Seq be an alias? I'm not sure, but I think all things equal, it would probably be better if it were, just to leave the door open for other definitions of sequences. In the standard library, there are some bootstrapping issues with not wanting to import iter from one package to another for fear of someday creating an import cycle. Again, it's not a huge issue, but using an alias would make it more clear that iter.Seq[T] just exists to save typing out func(func(T) bool) which is sort of tedious to do.

@Merovius
Copy link
Contributor

Merovius commented Jun 18, 2024

@earthboundkid My point is that nothing of what you say seems specific to iter.Seq. I agree that bytes.Buffer is a poor comparison to make that point. But, for example, fs.WalkDirFunc post-dates aliases and we still didn't make it an alias. Or e.g. these types in the ACME package.

And I'll note that I did provide a specific reason for why methods on func types are less useful, so there is a specific reason why this concern is less important in this case, than in other cases where you'd define a type to have a name to refer to.

I think all things equal, it would probably be better if it were, just to leave the door open for other definitions of sequences.

I'll note that 1. this argument cuts both ways: Making it an alias will prevent us from adding methods to iter.Seq ourselves and 2. all things are not equal: Making it an alias requires delaying this until aliases can be parameterized.

using an alias would make it more clear that iter.Seq[T] just exists to save typing out func(func(T) bool) which is sort of tedious to do.

iter.Seq[T] as a type serves a purpose beyond func(func(T) bool): It signals that you specifically intend this to be something that can be iterated over. Just as if I where to define type Path string, it serves a purpose beyond not typing string, even if I don't add methods. It says that this is its own type, with its own distinct identity and its own intended semantics.

It is, in my opinion, okay that if you want to define a different type to give it other methods, you need to convert back-and-forth.

@Sunshine40
Copy link

Sunshine40 commented Jun 21, 2024

There's a flaw in the break => return false => terminate iteration design.

What if by writing break I just mean to pause, and would like to resume iterating the half consumed iterator later?

Like (Go Playground):

source := slices.Values([]int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9})
for v := range xiter.Limit(source, 2) {
	fmt.Println(v) // 0 1
}
i := 1
for range source {
	if i == 3 {
		break
	}
	i++
}
for v := range xiter.Limit(source, 4) {
	fmt.Println(v) // expected: 5 6 7 8, got: 0 1 2 3
}

Unlike iterators in almost every other programming language that has them, which are modeled as state machines, the "iterators" in this implementation with type signature iter.Seq[V] are actually generator functions which are stateless and can only be consumed once using for...range loops.

In other words, if you assign a "go iterator" to a variable, each time you consume it, you're facing a brand new "iterator".

Which makes the following logic hard to write:
(Rust Playground)

let source = &mut (0..);

let mut target: Vec<_> = source.skip(2).take(3).collect();
target.extend(source.skip(4).take(5));

println!("{:?}", target); // [2, 3, 4, 9, 10, 11, 12, 13]

Even if we neglect the problem mentioned above, I think this syntactic sugar:

insSeq := gen {
	for i := 0; i < 10; i++ {
		yield i
	}
}

which would be desugared into:

insSeq := func (yield func(int) bool) {
	for i := 0; i < 10; i++ {
		if !yield(i) {
			return
		}
	}
}

could come in handy.

@SOF3
Copy link

SOF3 commented Jun 21, 2024

One way to look at this is that iter.Iter is an Iterable in Java or an IntoIterator in Rust. There are no iterators in Go, only iterator-generating objects. You wouldn't expect iterating over a slice twice would give you nothing the second time.

@Merovius
Copy link
Contributor

Merovius commented Jun 21, 2024

@Sunshine40 To re-state what has been said a bunch of times now: This issue is not about re-visiting the design of iterating over functions.

But FWIW the concern that some iterators are re-startable while others are not has been mentioned in the discussion. You can either use iter.Pull or this wrapper, to get a re-startable iteration:

func Keep[T any](it iter.Seq[T]) iter.Seq[T] {
	ch := make(chan T)
	done := make(chan struct{})
	var once sync.Once
	go func() {
		defer close(ch)
		for v := range it {
			select {
			case ch <- v:
			case <-done:
				return
			}
		}
	}()
	return func(yield func(T) bool) {
		defer once.Do(func() { close(done) })
		for v := range ch {
			if !yield(v) {
				return
			}
		}
	}
}

@Merovius
Copy link
Contributor

(Also, this issue has been closed for quite a while, which makes it even less appropriate that off-topic comments keep being posted here)

@Sunshine40
Copy link

@Merovius

(Also, this issue has been closed for quite a while, which makes it even less appropriate that off-topic comments keep being posted here)

Where should I post it then?

@Sunshine40
Copy link

Sunshine40 commented Jun 21, 2024

@Merovius

You can either use iter.Pull or this wrapper, to get a re-startable iteration

Can you give an example? From what I observed, the Keep wrapper doesn't turn a "go iterator" into a resumable iterator, it just stops yielding anything once a consumer breaks.

Also, where can I find the documentation/tutorial for existing iterator adaptors in Go?

Edit:

I modified Keep's implementation and got the desired outcome:

func Keep[T any](it iter.Seq[T]) iter.Seq[T] {
	ch := make(chan T)
	go func() {
		defer close(ch)
		for v := range it {
			ch <- v
		}
	}()
	return func(yield func(T) bool) {
		for v := range ch {
			if !yield(v) {
				return
			}
		}
	}
}

The key is to remove the cleaning up mechanics and actually keep the blocked goroutine around. Forever. If we don't exhaust it.

@Merovius
Copy link
Contributor

@Sunshine40 The issue where the language feature was discussed was #61405, which also is closed, as the feature has been accepted and implemented. If you believe there is a bug with the existing implementation, the right place would be to file a bug. If you want to propose a different language change, it would be to file a language change proposal. You can do either from here. Otherwise, if you just want to talk about it, you can do that on golang-nuts, reddit, slack, your personal website or social network of choice.

You are correct that my Keep function is wrong. You can still use iter.Pull (proposed and discussed in this issue and included in the upcoming release).

@thepudds
Copy link
Contributor

Hi @Sunshine40, it's worth recalling this is still an in-flight change, and all the user-facing documentation is not yet in place. Some existing WIP documentation is already published in the iter package documentation and the WIP language spec. (Based on your comments, you might be interested for example in the "Single-Use Iterators" section, which discusses the conventional behavior for restarting iterators in Go vs. single-use iterators).

More doc will be written prior to the release, but if you see a specific problem in the WIP documentation, doc issues are always welcome.

Also, FWIW, I am about to open a new issue with some more specific suggestions for enhancing the general iteration documentation, as a follow-up to #61897 (comment) and #61897 (comment) above. (I will CC you in case you are interested in commenting there).

Separately, I think it is worth pointing out that your edited version of Keep is likely problematic and likely overkill. You might be interested in the sections on "Push functions", "Pull functions" and "Duality of push and pull functions" from the older 2022 pre-proposal discussion, which includes this for converting a pull function into a push function:

func push(next func() (V, bool)) func(func(V)bool) {
	return func(yield func(V) bool) {
		for {
			v, ok := next()
			if !ok || !yield(v) {
				break
			}
		}
	}
}

I updated your example to use that push. (TBH, that might be wrong too — it's just a quick attempt, but seems to show your desired results and does not suffer from cleanup issues, as far as my currently limited knowledge goes. 😅 Regardless, while I think hashing out the correct approach for your example is not the topic of this issue, I do think how to do things like this seems likely worthwhile to document more explicitly somewhere outside of the older proposal discussions, such as maybe an official blog or similar).

@Sunshine40
Copy link

Sunshine40 commented Jun 22, 2024

@thepudds

I think it is worth pointing out that your edited version of Keep is likely problematic and likely overkill

Yeah there might be extra cost in the goroutine + channel version, but I don't think the difference is that great.

iter.Pull actually uses a stackful coroutine internally, so the real difference is that iter.Pull avoided using channels for syncing, by manually switching stack frames instead.

The "cleanup issues" are not specific to a goroutine + channel style implementation - it's that you must expose a "cleanup handle" to be deferred manually.

I mentioned this as a shortcoming of Go's push iterators because, in other languages, no one needs to manually clean up iterators since GC/RAII handles this problem. Not even in C++/Rust which are considered lower level system PL.

As for enhancing the wrapper, here's a revised version:

type Iter[V any] struct {
	next func() (V, bool)
	stop func()
	gen  func(yield func(V) bool)
}

type PullIterator[V any] interface {
	Next() (V, bool)
}

func (it *Iter[V]) Next() (V, bool) {
	return it.next()
}

type PushIterator[V any] interface {
	Push(yield func(V) bool)
}

func (it *Iter[V]) Push(yield func(V) bool) {
	it.gen(yield)
}

func (it *Iter[V]) Close() error {
	it.stop()
	return nil
}

type StatefulIterator[V any] interface {
	PullIterator[V]
	PushIterator[V]
	io.Closer
}

func Stateful[V any](seq iter.Seq[V]) StatefulIterator[V] {
	next, stop := iter.Pull(seq)
	it := Iter[V]{
		next: next,
		stop: stop,
		gen: func(yield func(V) bool) {
			for {
				v, ok := next()
				if !ok || !yield(v) {
					return
				}
			}
		},
	}
	runtime.SetFinalizer(&it, (*Iter[V]).Close)
	return &it
}

With runtime.SetFinalizer, there's no need for the manual cleanup, GC would take care of it. (Here's the proof)

You could still defer it.Close() to clean it up as soon as possible, but for most iterators whose related resources are only managed memory, doing these kind of manual cleanup would render the Garbage Collector pointless.

@ianlancetaylor
Copy link
Member

Using iterators with finalizers and a Stop method was discussed extensively over at #54245, and we decided against it. See, for example, the discussion thread at #54245 (comment) .

@drew-512
Copy link

drew-512 commented Jun 23, 2024

Curious if beloved fellow gophers here had considered the elegance of Odin iterators but with a new go keyword, iterator, perhaps like:

type Foo[T any] iterator {
    Start(count int)
    Next(current int, element *T) bool
    Index() int

    idx   int
    state []int64
}

Inspired from https://www.gingerbill.org/article/2024/06/17/go-iterator-design/

@earthboundkid
Copy link
Contributor

That's the same as #54245, which was rejected for good reasons: writing a "pull iterator" is almost always harder than writing a "push iterator" and leaves you with a problem of how to deal with clean up code. Having a push iterator-based design has the tradeoff that you have a problem of how to get pull iterators for the cases where it's more convenient to consume a few values at a time, but the iter package solves that with iter.Pull, which converts a push iterator in a pull iterator.

I appreciate that the closure based design of 1.23 iterators is a little noodle baking at first, but this was all debated and decided a long time ago.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted release-blocker
Projects
Status: Accepted
Status: Done
Development

No branches or pull requests