-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Implement Ruby's Enumerator Class #6357
Comments
Yup, it's been discussed in the past: https://play.crystal-lang.org/#/r/aob The main issue is that it's super slow. Like, 1000x times slower than Well, at least that was the last time I benchmarked it, maybe it should be benchmarked again. By the way, fibonacci can be currently implemented with a = b = 1
fib = Iterator.of do
a.tap { a, b = b, a + b }
end
p fib.first(10).to_a https://play.crystal-lang.org/#/r/4gz8 But, of course, more complex enumerators which yield in the middle of a method and continue from there can't be implemented with |
maybe similar to #4438 ? @felixbuenemann fun fact: your implementation reminds me #4198 ;) |
@asterite Yeah, my actual use case is much more complicated. I have a nested while loop with It can certainly implemented with a simple yield without an iterator, but it makes it harder to do stuff like In my case the performance of the enumerator shouldn't really matter, since most of the processing is in libxml2 anyways. I'm not partial to the ruby specific implementation, but some form of a generator abstraction that doesn't have @bew Thanks for the link, I searched the existing issues for "enumerator" but not for "generator". |
@felixbuenemann You can implement I just tried to implement class Enumerator(T)
include Iterator(T)
struct Yielder(T)
getter! enumerator
def initialize(@enumerator : Enumerator(T))
end
def <<(value : T)
@enumerator.value = value
end
end
getter! fiber : Fiber
getter! yielder : Yielder(T)
def initialize(&@block : Yielder(T) ->)
@current_fiber = Fiber.current
@yielder = Yielder(T).new(self)
@fiber = Fiber.new do
@block.call(@yielder.not_nil!)
end
end
@value : T?
def value=(@value : T)
@current_fiber.resume
end
def next
# This is probably not correct
@current_fiber = Fiber.current
stop unless fiber.alive?
fiber.resume
fiber.alive? ? @value.as(T) : stop
end
end
fib = Enumerator(Int32).new do |y|
3.times do |i|
y << i
end
end
p fib.next
p fib.next
p fib.next That is, using some sort of cooperative fibers, instead of the non-cooperative fibers we have now. But that But still, doing a simple benchmark of that implementation using Fibers vs Iterator gives 0.37s for Enumerator vs. 0.005s for Iterator. It's a dramatic difference. The code is something like this: e = Enumerator.new do |y|
loop do
y << 1
end
end
time = Time.now
10_000_000.times do
e.next
end
puts Time.now - time But then in Ruby that same example takes 12 seconds... so maybe it's not that bad? :-) If we can figure out how to implement it with fibers and context switching, without channels, it might be a good fit for the standard library, even if it's slower than a manually written |
Oh, and in the example above I defined |
(using the implementation in the first comment in this issue, using channels, the benchmark takes 0.6 seconds) |
Hey, I was also looking for a way to find out if a fiber was still running, because at first I tried to save the fiber from spawn to an instance variable and then restart the fiber if needed during So I think adding such a flag to I'll see if I can get your improved version running with my code. Can you post the patch you did to the |
It's just setting @alive = true in Fiber, then in run ensure it's set to false. |
@asterite I tried you implementation, but it just hangs. Here's the test suite I used (quick conversion from minitest.cr to crystal spec): require "spec"
require "../src/enumerator"
describe Enumerator do
describe "#initialize" do
it "returns an instance of Enumerator" do
instance = Enumerator(String).new {}
instance.should be_a(Enumerator(String))
end
end
describe "#next" do
it "returns the values in the order they were yielded" do
enumerator = Enumerator(String?).new do |yielder|
yielder << "hello"
yielder << nil
yielder << "world"
end
enumerator.next.should eq("hello")
enumerator.next.should be_nil
enumerator.next.should eq("world")
enumerator.next.should be_a(Iterator::Stop)
end
end
describe "#peek" do
it "peeks at the next value without affecting next" do
next # Not implemented
enumerator = Enumerator(String?).new do |yielder|
yielder << "hello"
yielder << "world"
end
enumerator.peek.should eq("hello")
enumerator.peek.should eq("hello")
enumerator.next.should eq("hello")
enumerator.peek.should eq("world")
enumerator.peek.should eq("world")
enumerator.next.should eq("world")
enumerator.peek.should be_a(Iterator::Stop)
enumerator.next.should be_a(Iterator::Stop)
enumerator.rewind
enumerator.peek.should eq("hello")
end
end
describe "#each" do
it "iterates the yielded values" do
enumerator = Enumerator(String).new do |yielder|
yielder << "hello"
yielder << "world"
end
values = [] of String
enumerator.each { |value| values << value }
values.should eq(["hello", "world"])
end
end
describe "Enumerable" do
it "responds to Enumerable methods" do
enumerator = Enumerator(String).new do |yielder|
yielder << "hello"
yielder << "world"
end
enumerator.to_a.should eq(["hello", "world"])
end
end
describe "#rewind" do
it "rewinds the iterator" do
next # Not implemented
enumerator = Enumerator(String).new do |yielder|
yielder << "hello"
yielder << "world"
end
# We rewind first to check that block isn't called multiple
# times if it was not enumerated before calling #rewind.
enumerator.rewind.should be_a(Enumerator(String))
enumerator.to_a.should eq(["hello", "world"])
# Must be rewound, should be empty.
enumerator.to_a.should eq([] of String)
enumerator.rewind.should be_a(Enumerator(String))
enumerator.to_a.should eq(["hello", "world"])
# And make sure rewind properly resets state.
enumerator.to_a.should eq([] of String)
enumerator.rewind.should be_a(Enumerator(String))
enumerator.rewind.should be_a(Enumerator(String))
enumerator.to_a.should eq(["hello", "world"])
end
end
describe "Nil type support" do
it "works when the type is Nil" do
enumerator = Enumerator(Nil).new do |yielder|
yielder << nil
yielder << nil
end
enumerator.to_a.should eq([nil, nil])
end
end
describe "yielder chaining" do
it "allows to chain yielder calls" do
enumerator = Enumerator(String).new do |yielder|
yielder << "hello" << "world"
end
enumerator.to_a.should eq(["hello", "world"])
end
end
describe "Fibonacci enumerator" do
it "generates the first 10 numbers in the Fibonacci sequence" do
# Example adapted from Ruby Enumerator docs.
fib = Enumerator(UInt32).new do |y|
a = b = 1_u32
loop do
y << a
a, b = b, a + b
end
end
fib.first(10).to_a.should eq([1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
end
end
end |
I know it hangs. I'm not sure cooperative and non-cooperative fibers can coexist. I believe maybe @waj will know how to implement this, but I don't think he'll have time. |
@asterite I got it working! The problem with your implementation was that it didin't resume the I came up with the following implementation based on you code, but using a class Enumerator(T)
include Iterator(T)
def initialize(&@block : Yielder(T) ->)
@current_fiber = Fiber.current
@stack = Deque(T).new(1)
@yielder = Yielder(T).new(self)
run
end
def next
fetch_next
@stack.shift { stop }
end
def rewind
run if @done
self
end
def peek
fetch_next
if @stack.empty?
stop
else
@stack[0]
end
end
protected def <<(value : T)
@stack.push(value)
@current_fiber.resume
end
private def fetch_next : Nil
@current_fiber = Fiber.current
@done || @fiber.not_nil!.resume if @stack.empty?
end
private def run
@stack.clear
@done = false
@fiber = Fiber.new do
begin
@block.call(@yielder.not_nil!)
ensure
@done = true
@current_fiber.resume
end
end
end
private struct Yielder(T)
def initialize(@enumerator : Enumerator(T))
end
def <<(value : T)
@enumerator << value
self
end
end
end Results:
Let me know what you think! |
Awesome! If it's just 10 times slower than an Iterator I think it's a really good candidate for the std. By the way, this doesn't work as expected: e = Enumerator(Int32).new do |y|
y << 1
end
spawn do
p e.next
end
Fiber.yield The problem is that the current fiber is saved in I'd say let's go for this, if it's solid. Sometimes writing an Iterator is very boring, and the 10x slowness of some parts of the code will probably not be the bottleneck of a program, and this boosts productivity a lot. |
Thanks for the explanation. I was wondering why you were setting I think I'll have some time this weekend to package it all up into a PR and add some polish to the specs. |
I've updated the code above to set |
By the way, maybe a Then, e = Enumerator.new do |y|
y << 1
y << 2
y << 3
end
p e.next # => 1
p e.next # => 2
e.rewind
p e.next # => 1
p e.next # => 2 But I'm not sure how we would implement it. So maybe for now |
There's another problem. This code doesn't free the memory: loop do
e1 = Enumerator(Int32).new do |y|
3.times do |i|
y << i
end
end
e1.next
end Fibers are created and maintained in a stack. I think these particular fibers should not be placed in that stack, so when they are not referenced anymore they are GCed. The source code in |
Peek is useful, when you don't yet know what's the next value is and you want to do different stuff based on that. I agree that it would make sense to have this for all Iterators. Regarding the I didn't know that you can check if an instance variable is initialized. Can you post an example? |
The fiber linked list is required for the GC, you can't just remove a fiber from it: https://github.com/crystal-lang/crystal/blob/master/src/fiber.cr#L325 You need to do this with |
The leak described by @asterite is actually the same problem as I plan to fix it by raising a The user could still do something stupid like rescuing from that exception inside the yielder block, but we could at least raise an exception in that case, since it can be detected by checking if the yielder transitioned to done after being told to stop by the enumerator. |
@RX14 But the GC will find the fiber reference in that linked list, thus never try to collect it, and will never invoke We can't just keep a list of active stack roots, instead of fibers, because the GC could then collect all fibers (unless I'm wrong?). Maybe we need 2 lists: one for "attached" fibers (default), and one to keep the stack roots of "detached" fibers? Once the fiber is no longer referenced, the GC collects it and invokes its finalizer that should return the stack root back to the pool. |
@ysbaddaden I think he meant a |
You just need to make sure the fiber doesn't have any references to the enumerator, which is... difficult. You'll probably need to employ |
Why would the fiber have a reference to the Enumerator? |
Probably because the |
I'm also not sure what happens if you call an |
@asterite but all fibers are by definition cooperative because there's no method to preempt them. I'm not sure what you mean. I don't see much special about the way this fiber is used or scheduled. |
@RX14 In Ruby, Anyway, |
the fiber has the reference to the yielder through the closure, the weakref should be from the yielder to the enumerator, surely? And to me, anything that's not preemptible is cooperative. Making crystal's fibers cooperative, just not the same implementation as ruby. It's just a terminology thing though, I get what you mean. |
@RX14 The fiber is in the linked list, so it can't be GCed. Then the fiber has a reference to the Enumerator through the closure, which can't be turned into a WeakRef. So I can't see how to implement this. I guess it would be a nice to have, but it seems it's kind of impossible to implement this right now... |
@asterite why does the fiber have a reference to the enumerator and not just the yielder? |
To make discussion of the implementation easier, I pushed a first version in #6385. Let me know what you think! I would also be grateful for a short example that can only be implemented with I tried to describe the use cases for both |
@felixbuenemann There's no such example. Anything can be implemented with |
@asterite With your patch, this ensure block won't execute: loop do
e = Enumerator(Int32).new do |y|
y << 1
ensure
puts "ensure"
end
e.next
sleep 0.001
end Such ensures are often used for cleaning up open resource handled, such as closing files, network connections etc. The enumerating fiber shouldn't just be terminated without unwinding it's stack. It should work similarly like an exception was raised in |
@straight-shoota I just tried it in Ruby, and the For reference: loop do
e = Enumerator.new do |y|
begin
y << 1
ensure
puts "BYE"
end
end
e.next
end Or: loop do
e = Enumerator.new do |y|
begin
y << 1
ensure
puts "BYE"
end
end
end |
Well, someone could create an issue in Ruby and see what they answer :-) |
There's at least already an issue for Crystal: #3561 =) |
Not gonna happen. There's no way to do it in Go, so I doubt we'll manage to do it. |
After all the discussion, I was unsure how useful an addition to the Crystal language this would be. Since most simple examples are not useful for that I wen't ahead and converted the I must say looking at the code side-by-side I much prefer the Enumerator version – it is much easier to comprehend. It also took me about 4.5 hours to convert the code until the tests were green and another 2.5 hours to reimplement the early abort logic in the underlying XML pull parser to get the Iterator implementation to the same performance level as the Enumerator one (these took maybe 15 minutes to implement initially in the Enumerator version). Here's an example of one of the Enumerators I replaced. The other one not shown here is powering the Enumerator Version: def documents(languages = ["de"])
Enumerator(Document).new do |documents|
attributes = {} of String => Attribute
classification_groups = {} of String => ClassificationGroup
commands(["Attribute", "ClassificationGroup", "Product"]).each do |reader|
case reader.name
when "Attribute"
attribute = AttributeParser.new(reader, languages)
attributes[attribute.id] = attribute.parse
when "ClassificationGroup"
classification_group = ClassificationGroupParser.new(reader, languages)
classification_groups[classification_group.id] = classification_group.parse
when "Product"
product = ProductParser.new(reader, languages).parse
languages.each do |language|
document = DocumentBuilder.new(
attributes,
classification_groups,
language,
product
).build
documents << document
end
end
end
end
end Iterator Version: def documents(languages = ["de"])
return Iterator.of(Iterator.stop) if languages.empty?
attributes = {} of String => Attribute
classification_groups = {} of String => ClassificationGroup
_commands = commands(["Attribute", "ClassificationGroup", "Product"])
_commands.each do |reader|
case reader.name
when "Attribute"
attribute = AttributeParser.new(reader, languages)
attributes[attribute.id] = attribute.parse
when "ClassificationGroup"
classification_group = ClassificationGroupParser.new(reader, languages)
classification_groups[classification_group.id] = classification_group.parse
when "Product"
product = ProductParser.new(reader, languages).parse
products = Iterator.of do
reader = _commands.next
if reader.is_a?(Iterator::Stop)
Iterator.stop
else
ProductParser.new(reader.as(XML::Reader), languages).parse
end
end
_languages = languages.each
return Iterator.of do
language = _languages.next
if language.is_a?(Iterator::Stop)
language = _languages.rewind.next
product = products.next
end
if product.is_a?(Iterator::Stop)
Iterator.stop
else
DocumentBuilder.new(
attributes,
classification_groups,
language.as(String),
product.as(Product)
).build
end
end
end
end
return Iterator.of(Iterator.stop)
end As can be seen the iterator version involves several internal So overall, I think it is very worthwhile to have this in the standard library, if only for the productivity gains (and me keeping my sanity ;-). |
You could make your own Enumerator class perhaps? Make it a library? |
A small note that unless one needs |
@eregon Are you sure about that? top_ten = enumerator.take(10)
top_twenty = top_ten + enumerator.take(10) In essence, all Only if all you want to do is a single (partial) iteration, there is no need to keep track of iteration status and thus no need for a fiber. |
Yes I'm sure: enumerator = Enumerator.new do |y|
i = 0
loop {
y << i += 1
}
end
p enumerator.take(10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p enumerator.take(10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] So the "suspend" and the need for a Fiber is only for next/peek (and next_values/peek_values) which are stateful, all other Enumerator/Enumerable methods are not stateful and always "restart from the start" of the Enumerator. |
I think the difference is that In Crystal we'd like |
p Fiber.current # => #<Fiber:0x00007fb8fc423328 (resumed)>
enumerator = Enumerator.new do |y|
i = 0
loop {
p Fiber.current # => #<Fiber:0x00007fb8fc423328 (resumed)>
y << i += 1
}
end.lazy
p enumerator.take(3) # => #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: #<Enumerator::Generator:0x00007fb8fc3ecee0>:each>>:take(3)>
p enumerator.take(3).to_a # => [1, 2, 3] |
@eregon So how is this implemented in Ruby if it's not using fibers under the hood, or having the VM suspend the execution at some places? |
Basically it just remembers what it has to do in the Enumerator::Lazy object(s) and then finally eager evaluate and possibly using (non-local) |
@eregon Would you mind trying to implement this for Crystal? I'm almost sure it's impossible. This is how far I got: class Enumerator(T)
class Yielder(T)
def <<(value : T)
end
end
def initialize(&@block : Yielder(T) ->)
end
def each
yielder = Yielder(T).new
# @block.call(yielder)
end
end
e = Enumerator(Int32).new do |y|
a = b = 1
loop do
y << a
a, b = b, a + b
end
end
e.each do |x|
puts x
break if x > 10
end How do you implement |
This works, I wrote it in Ruby because it's easier for me. class MyEnumerator
class Yielder
def initialize(&block)
@block = block
end
def << value
@block.call(value)
end
end
def initialize(&block)
@block = block
end
def each(&block)
yielder = Yielder.new(&block)
@block.call(yielder)
end
end
e = MyEnumerator.new do |y|
a = b = 1
loop do
y << a
a, b = b, a + b
end
end
e.each do |x|
p x
break if x > 10
end |
Nice, thanks! This works: class Enumerator(T)
include Enumerable(T)
class Yielder(T)
def initialize(@block : T ->)
end
def <<(value : T)
@block.call(value)
end
end
def initialize(&@block : Yielder(T) ->)
end
def each(&block : T ->)
yielder = Yielder(T).new(block)
@block.call(yielder)
end
end
e = Enumerator(Int32).new do |y|
a = b = 1
10.times do
y << a
a, b = b, a + b
end
end
e.each do |x|
p x
end I changed the loop to But then we can't make So in the end |
Just to clarify: In Crystal there is a clear separation between captured blocks (which becomes procs and can be stored) and yielded blocks (which are inlined into the yielding method and don't exist as a separate item at runtime). |
That said, when we were originally thinking about Crystal we really wanted to be able to But we never got to do it. It's probably a lot of work. |
Does non-local return exist in Crystal? That would be a workaround, just wrapping the Regarding non-local break/return they are typically implemented with an exception, unwind or a similar mechanism (and some ID to know how far to unwind), and that doesn't need to check the return value on every return.
I don't know Crystal, but Ruby Enumerable only depends on |
Would this work maybe? class Enumerator(T)
def each(&block : T ->)
yielder = Yielder(T).new(Proc.new { |x| yield x })
@block.call(yielder)
end
end |
No. Or at least not yet.
We could eventually do that in Crystal! See #6357 (comment) . But I think exceptions are slow so I'm not sure I'd like to introduce slow features.
No because you can't To be honest, I'm not exactly happy about the |
Rust is adding |
I really like Ruby's Enumerator class and I think it would be great to have something similar in Crystal.
The
Enumerator
class can be used to implement generators.Here's an example from the Ruby docs:
The
y
block parameter in the example is called the yielder and the iterator returned fromnew
blocks until the yielder receives a value or the block ends.Since I was porting some Ruby code that depended on
Enumerator
to Crystal, I wen't ahead and implemented my own genericEnumerator
class using theIterator
module andChannel
to implement the concurrent behavior:The Fibonacci generator could now be implemented similar to the Ruby example:
Please let me know if you think this is a useful addition to the Crystal Standard Library.
If you do, I would be happy to polish the code up into a PR along with documentation and specs.
I've also implemented
Enumerator#peek
locally, which allows to look at the next iterator value without moving it forward by buffering the value on peek, but left it out of the example for simplicity.The text was updated successfully, but these errors were encountered: