-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterator: remove rewind
. Implement cycle
by storing elements in an array
#7440
Iterator: remove rewind
. Implement cycle
by storing elements in an array
#7440
Conversation
https://github.com/search?q=rewind+-filename%3Aiterator_spec.cr&l=Crystal&type=Code There's a bit of 'em, although it's debatable how much of the results are meaningful. |
Thanks Sija. All the result I saw seem to be IO#rewind or implementations of rewind (sometimes called in initialize, but that's easy to fix), but none of them seem to be actual usages of the rewind method. This further confirms my belief that rewind is there just because it needs to be there, not because it's actually useful (other than for implementing cycle in a different way). |
Let's wait Brian before merging this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change. The implementation is more composable, there is less code and the results are more consistent despite the nature of the iterator. 💯
A final note :-) |
I was using this to get the size of an iterator without having to allocate an array for the elements.
Can this be done in any other way? |
@baelter You could see if the specific iterator support Something like: if itr.responds_to?(:rewind)
# acurate size hint
size = itr.size
itr.rewind
a = Array(T).new(size)
else
# no size hint
a = Array(T).new
end
# use `a` and and `itr` But depending on the use case just collection the items in the array might be enough. If you definitely needs the whole set of data there is probably no improvement by using an iterator. |
I don't need all the items, so I would prefer to not have to allocate memory for new array for them only to get the size of the collection. I suppose I could iterate everything and count and only emit objects where needed.
|
Maybe you could describe your entire use case? It seems pretty weird to me. |
@asterite btw, I had a similar issue recently, could you please take a look at my approach in Sija/retriable.cr@8cd9986 - does it make sense / is there a room for improvement? |
Are the items there like thousands or millions? Is it an infinite list? Because if not, will the code work equally well with an Array? I think all of this might be a case of premature optimization. |
Use case is a huge collection, potentially millions, of largeish objects that are being exposed in a paged json api. I need the have the totalt count but only give a small subset to the json builder. I want to only use an iterator all the way to avoid allocating an array for the entire collection. I've found a working solution in iterating the entire collection and counting the total while only building json for the page window. Code was nicer with |
One way to do it is: def results_and_total(input)
iter = input.each
results = iter.first(10).to_a
total = results.size + iter.size
p! input, results, total
end
results_and_total(1..100)
results_and_total(1..8) That is, you take what you need, and then you get the size of the remaining stuff. This isn't longer code than using |
One thing I miss with I previously had code that cycled a list of languages from inside an iterator of documents and generated each document in each language. This means that each call to the iterator returned something like this:
(The nested iterator is streaming an unknown amount of documents that are way too large to keep in memory, thus the use of the Iterator protocol.) In this specific case I could just replace This only works if you have all the state you need to reinitialise the iterator that needs to be cycled, while previously a reference to the iterator itself was sufficient. I think overall the implementation of With Here's an example of the e = [1, 2, 3, 4, 5, 6].each
e.first(3).to_a => [1, 2, 3]
e.first(3).to_a => [1, 2, 3] Compared to Crystal: e = [1, 2, 3, 4, 5, 6].each
e.first(3).to_a => [1, 2, 3]
e.first(3).to_a => [4, 5, 6] This difference has bitten me a few times when porting over code from Ruby to Crystal. This specific case is also more surprising since the docs for
But actually it returns the |
@felixbuenemann I agree that Ruby seems to be more intuitive. I have no idea how they implement that, though. Also, try this in Ruby: # with stdin having one \n two \n three \n four \n five
a = STDIN.each_line
p a.next # => "one"
p a.next # => "two"
p a.first(1) # => ["three"]
p a.first(2) # => ["four", "five"] Now with an array: a = %w(one two three four five).each
p a.next # => "one"
p a.next # => "two"
p a.first(1) # => ["one"]
p a.first(2) # => ["one", "two"] Now do the same in Crystal (adding I think Crystal is a bit more consistent here. |
Yeah, I agree Crystal is more consistent here, but Ruby is more convenient. I think in Ruby it would be nicer if it threw an exception if an Enumerable method is called and the underlying Enumerator cannot be rewound. Currently it throws an Exception when calling |
I think |
Note that first in Ruby always returns from the beginning of the enumeration, unlike Crystal. |
Internally Ruby's I'm not sure that overloading |
I agree, take is a better name here. However, while trying to make a PR to rename it, I found that iterator inherits enumerable, which means it inherits the Iterator then overrides enumerable to make #first return an iterator, which changes the return type of Iterator really shouldn't be enumerable, the enumerable class makes assumptions that the enumberable |
This is a controversial PR. It removes the
#rewind
method from all existingIterator
s.The reasons for this PR are explained in this comment but I'll explain it again here.
The reason why I originally added
Iterator#rewind
was:Enumerable#cycle
Later on I found our that
cycle
was actually implemented in Ruby by building an internal Array of the elements in the Enumerable.I think the advantages of implementing
cycle
like this, and removingrewind
are:Iterator.of
)STDIN.each_line
, which doesn't give an error, we can't detect the error and it simply works wrong)Iterator
is simple: just implementnext
, no need to think about rewindingThis is a breaking change. However, I seriously doubt anyone out there is using
rewind
onIterator
(but maybe they are usingcycle
). I'm not sure, of course, but in any case I'd prefer to remove it.