Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved the use of Extend in a few Iterator methods #65219

Closed
wants to merge 7 commits into from

Conversation

Lucretiel
Copy link
Contributor

  • Reimplemented Iterator::partition to use a full iterator extend on one of the two collections, rather than pushing to both of them one element at a time.
  • Reimplemented Iterator::unzip to use a full iterator extend on one of the two collections, rather than pushing to both of them at the same time

- Reimplemented `Iterator::partition` to use a full iterator `extend` on one of the two collections, rather than pushing to both of them one element at a time.
- Reimplemented `Iterator::unzip` to use a full iterator `extend` on one of the two collections, rather than pushing to both of them at the same time
@rust-highfive
Copy link
Collaborator

r? @cramertj

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 8, 2019
@cramertj
Copy link
Member

cramertj commented Oct 8, 2019

r? @cuviper

@rust-highfive rust-highfive assigned cuviper and unassigned cramertj Oct 8, 2019
Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You call this an improvement, but do you have anything to quantify that?

In theory I could see this helping performance if the left container type implements a folding extend, combined with an iterator that benefits from folds like Chain or FlatMap. It would be nice to demonstrate this with a benchmark though.

src/libcore/iter/traits/iterator.rs Show resolved Hide resolved
src/libcore/iter/traits/iterator.rs Show resolved Hide resolved
@Lucretiel
Copy link
Contributor Author

You call this an improvement, but do you have anything to quantify that?

In theory I could see this helping performance if the left container type implements a folding extend, combined with an iterator that benefits from folds like Chain or FlatMap. It would be nice to demonstrate this with a benchmark though.

No; this is just based on published best-practice for iterators. In particular I'm not sure about the partitioner, because the main benefit of using lhs.extend is pre-allocation, which only works when you have a known length.

However, it should be an improvement in most cases on unzip, because that doesn't change the length of the iterator and will therefore give containers the opportunity to correctly pre-allocate storage

@cuviper
Copy link
Member

cuviper commented Oct 8, 2019

OK, so besides allowing internal iteration (folding), allocating for a good size_hint probably does help, at least for the left side that you're extending in full. Still, it would be nice to measure this -- can you add something to src/libcore/benches/iter.rs?

It would be nice if Extend had a method to feed a size hint, then we could pass that to both sides. Maybe we could add that with a default no-op implementation.

@cuviper
Copy link
Member

cuviper commented Oct 8, 2019

Hmm, you could potentially pre-allocate both collections with an empty iterator that lies about its size, something like:

pub struct SizeHint<T> {
    hint: (usize, Option<usize>),
    item: PhantomData<T>,
}

impl<T> Iterator for SizeHint<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        None
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

@ollie27
Copy link
Member

ollie27 commented Oct 8, 2019

Hmm, you could potentially pre-allocate both collections with an empty iterator that lies about its size, something like:

Let's not bring that back (#34155). Lying about size_hint is a contract violation so isn't an option.

So this PR brings up an interesting question about whether implementations of Extend are allowed to stop iterating before the iterator is fully consumed. For example ArrayVec stops after it's full. Implementations like that make this PR a breaking change (playground). Personally I would argue that ArrayVec is at fault here but there doesn't seem to be anything in the Extend docs that requires implementors to fully consume the iterator.

@cuviper
Copy link
Member

cuviper commented Oct 8, 2019

Let's not bring that back (#34155).

Ha! I am amused how close that was to what I just wrote.

Good point about early-terminating Extend too.

@Lucretiel
Copy link
Contributor Author

It would be nice if Extend had a method to feed a size hint, then we could pass that to both sides. Maybe we could add that with a default no-op implementation.

That's... a good idea. I was thinking that I was annoyed that there's no way to hint extend, and was thinking of (and rejecting) various transducer-based ideas for how to fix it, but that actually makes way more sense. I'll see about adding & documenting something like that as a separate PR.

@Lucretiel
Copy link
Contributor Author

Lucretiel commented Oct 8, 2019

So this PR brings up an interesting question about whether implementations of Extend are allowed to stop iterating before the iterator is fully consumed. For example ArrayVec stops after it's full. Implementations like that make this PR a breaking change (playground). Personally I would argue that ArrayVec is at fault here but there doesn't seem to be anything in the Extend docs that requires implementors to fully consume the iterator.

This is a good point. I suppose it's a case of being under-specified (similar to #52279)

@Lucretiel
Copy link
Contributor Author

There's some precedent in the standard library for iterator adapters doing extra work to emulate ordinary iteration for the purpose of preserving observable side effects (for instance, this part of the iter::Zip implementation). I think the question of formally specifying what Extend should do can be deferred for now, and I can update this implementation to simply extend rhs after lhs is done, to preserve behavioral compatibility?

@Lucretiel
Copy link
Contributor Author

Lucretiel commented Oct 9, 2019

Follow-up: regardless of the behavior of Extend, unzip does promise to "fully consume" the iterator:

unzip() consumes an entire iterator of pairs, producing two collections: one from the left elements of the pairs, and one from the right elements. [source]

So I'll make an update that fulfills that contract even if extend doesn't.

@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-09T00:18:40.2167719Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-09T00:18:40.2362209Z ##[command]git config gc.auto 0
2019-10-09T00:18:40.2455182Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-09T00:18:40.2518259Z ##[command]git config --get-all http.proxy
2019-10-09T00:18:40.2699610Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65219/merge:refs/remotes/pull/65219/merge
---
2019-10-09T00:24:47.9619518Z Checking rustdoc artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2019-10-09T00:24:47.9637644Z Checking std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2019-10-09T00:24:48.3735587Z    Compiling cc v1.0.35
2019-10-09T00:24:48.3735949Z     Checking core v0.0.0 (/checkout/src/libcore)
2019-10-09T00:24:56.0672165Z error[E0271]: type mismatch resolving `<&mut Self as iter::traits::collect::IntoIterator>::Item == B`
2019-10-09T00:24:56.0672664Z     --> src/libcore/iter/traits/iterator.rs:2398:13
2019-10-09T00:24:56.0672902Z      |
2019-10-09T00:24:56.0673203Z 2398 |         rhs.extend(&mut self);
2019-10-09T00:24:56.0673754Z      |
2019-10-09T00:24:56.0674060Z      = note: expected type `(A, B)`
2019-10-09T00:24:56.0674300Z                 found type `B`
2019-10-09T00:24:56.0674591Z      = help: type parameters must be constrained to match other types
2019-10-09T00:24:56.0674591Z      = help: type parameters must be constrained to match other types
2019-10-09T00:24:56.0674957Z      = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
2019-10-09T00:24:56.0728246Z error[E0618]: expected function, found `()`
2019-10-09T00:24:56.0728591Z     --> src/libcore/iter/traits/iterator.rs:2401:9
2019-10-09T00:24:56.0728815Z      |
2019-10-09T00:24:56.0728815Z      |
2019-10-09T00:24:56.0729115Z 2401 |           self.for_each(#[inline(always)] |_| {})
2019-10-09T00:24:56.0729466Z      |  _________-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2019-10-09T00:24:56.0730027Z 2403 | |         (lhs, rhs)
2019-10-09T00:24:56.0730376Z      | |__________________- call expression requires function
2019-10-09T00:24:56.0730420Z 
2019-10-09T00:24:56.1833245Z    Compiling build_helper v0.1.0 (/checkout/src/build_helper)
---
2019-10-09T00:25:00.9921014Z == clock drift check ==
2019-10-09T00:25:00.9921067Z   local time: Wed Oct  9 00:25:00 UTC 2019
2019-10-09T00:25:01.0763937Z   network time: Wed, 09 Oct 2019 00:25:01 GMT
2019-10-09T00:25:01.0768615Z == end clock drift check ==
2019-10-09T00:25:14.6127189Z ##[error]Bash exited with code '1'.
2019-10-09T00:25:14.6171661Z ##[section]Starting: Checkout
2019-10-09T00:25:14.6173494Z ==============================================================================
2019-10-09T00:25:14.6173566Z Task         : Get sources
2019-10-09T00:25:14.6173613Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Lucretiel
Copy link
Contributor Author

As a matter of precedent, tuples supply an Extend implementation that is literally a no-op but nonetheless fully consumes the iterator: https://doc.rust-lang.org/src/core/iter/traits/collect.rs.html#346-350

@HeroicKatora
Copy link
Contributor

HeroicKatora commented Oct 9, 2019

So this PR brings up an interesting question about whether implementations of Extend are allowed to stop iterating before the iterator is fully consumed.

It is more useful if it does not require the iterator to be fully consumed. This leaves handling of 'overflow' to the caller. Since the definition of Extend requires being polymorphic over all possible iterators, it is always possible and useful to pass &mut iter instead of iter to inspect iter after extend returns or in this case consume the rest of it. If extend were required to fully consume the iterator then this would be much less useful.

Also, droping instances is most often associated with giving up some resources. In resource constrained implementations, such as ArrayVec, there is no possible place available to store the surplus Ts and thus no other choice but to drop them. That should be avoided, not mandatory.

It's of course perfectly fine for some utility methods such as Iterator::unzip to provide stronger guarantees such as consuming the full iterator regardless.

@cuviper
Copy link
Member

cuviper commented Oct 9, 2019

tuples supply an Extend implementation that is literally a no-op

To be clear, that's just for the unit (), which is only a degenerate sort of tuple.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is getting complicated, which increases the need for benchmarks to justify it. The change to extending through &mut self, allowing a followup to consume the remainder, also means they can't take advantage of internal iteration anymore, so we're only left with possible pre-allocation benefits.

I think the partition implementation also needs to fully consume the same way.

src/libcore/iter/traits/iterator.rs Outdated Show resolved Hide resolved
src/libcore/iter/traits/iterator.rs Outdated Show resolved Hide resolved
@Lucretiel
Copy link
Contributor Author

Lucretiel commented Oct 9, 2019

The change to extending through &mut self, allowing a followup to consume the remainder, also means they can't take advantage of internal iteration anymore, so we're only left with possible pre-allocation benefits.

I was going to ask about this part anyway– is there any reason that impl Iterator for &mut T: Iterator doesn't implement a forwarding try_fold? It seems like that would make sense, since we encourage iterator authors to implement their own try_fold as the basis for other internal iteration

EDIT: See follow-up, here: #65219 (comment)

@Lucretiel
Copy link
Contributor Author

Lucretiel commented Oct 9, 2019

This code is getting complicated, which increases the need for benchmarks to justify it.

I'm working on a benchmark, as well as a regression test for the changed behavior demonstrated in #65219 (comment)

- Simplified unzip implementation (removed size_hint checks)
- Added forwarding try_fold implementation to &mut T: Iterator
@Lucretiel
Copy link
Contributor Author

For reference: I looked into implementing try_fold for &mut I, but it appears it's been done before, and is currently blocked on the stabilization of trait specializations: #62483

@Lucretiel
Copy link
Contributor Author

Quick note: This code change includes a closure. I'd like to replace it with a method, but I tried this:

fn invert<T>(predicate: impl FnMut(T) -> bool) -> impl FnMut(T) -> bool {
    move |value| !predicate(value)
}
// I also tried FnMut(&T) -> bool and FnMut(&'a T) -> bool + 'a

// ...

right.extend(fused.filter(invert(predicate)));

And got this error:

error[E0277]: expected a `ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>` closure, found `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`
    --> src/libcore/iter/traits/iterator.rs:1540:35
     |
1540 |         right.extend(fused.filter(invert(predicate)));
     |                                   ^^^^^^^^^^^^^^^^^ expected an `FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>` closure, found `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`
     |
     = help: the trait `for<'r> ops::function::FnMut<(&'r <Self as iter::traits::iterator::Iterator>::Item,)>` is not implemented for `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`

error[E0271]: type mismatch resolving `for<'r> <impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)> as ops::function::FnOnce<(&'r <Self as iter::traits::iterator::Iterator>::Item,)>>::Output == bool`
    --> src/libcore/iter/traits/iterator.rs:1540:28
     |
1540 |         right.extend(fused.filter(invert(predicate)));
     |                            ^^^^^^ expected bound lifetime parameter, found concrete lifetime

error[E0271]: type mismatch resolving `for<'r> <impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)> as ops::function::FnOnce<(&'r <Self as iter::traits::iterator::Iterator>::Item,)>>::Output == bool`
    --> src/libcore/iter/traits/iterator.rs:1540:15
     |
1540 |         right.extend(fused.filter(invert(predicate)));
     |               ^^^^^^ expected bound lifetime parameter, found concrete lifetime
     |
     = note: required because of the requirements on the impl of `iter::traits::iterator::Iterator` for `iter::adapters::Filter<iter::adapters::Fuse<Self>, impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>>`

error[E0277]: expected a `ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>` closure, found `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`
    --> src/libcore/iter/traits/iterator.rs:1540:22
     |
1540 |         right.extend(fused.filter(invert(predicate)));
     |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an `FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>` closure, found `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`
     |
     = help: the trait `for<'r> ops::function::FnMut<(&'r <Self as iter::traits::iterator::Iterator>::Item,)>` is not implemented for `impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>`
     = note: required because of the requirements on the impl of `iter::traits::iterator::Iterator` for `iter::adapters::Filter<iter::adapters::Fuse<Self>, impl ops::function::FnMut<(&<Self as iter::traits::iterator::Iterator>::Item,)>>`

Which appears to be lifetime related issue that I can't figure out how to solve. At this point I'm leaning toward reverting the change to partition, since the principle advantage is related to size_hint (which is lost on partition because the lower bound is always 0) but I'll put together some benchmarks first, just to be sure.

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-11T02:08:19.9285545Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-11T02:08:19.9605255Z ##[command]git config gc.auto 0
2019-10-11T02:08:19.9675613Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-11T02:08:19.9735881Z ##[command]git config --get-all http.proxy
2019-10-11T02:08:19.9915471Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65219/merge:refs/remotes/pull/65219/merge
---
2019-10-11T02:16:04.4928728Z    Compiling serde_json v1.0.40
2019-10-11T02:16:06.4186430Z    Compiling tidy v0.1.0 (/checkout/src/tools/tidy)
2019-10-11T02:16:18.6636761Z     Finished release [optimized] target(s) in 1m 34s
2019-10-11T02:16:18.6727655Z tidy check
2019-10-11T02:16:19.5265793Z tidy error: /checkout/src/libcore/iter/traits/iterator.rs: too many lines (3002) (add `// ignore-tidy-filelength` to the file to suppress this error)
2019-10-11T02:16:21.1578876Z some tidy checks failed
2019-10-11T02:16:21.1587196Z Found 482 error codes
2019-10-11T02:16:21.1589474Z Found 0 error codes with no tests
2019-10-11T02:16:21.1592118Z Done!
2019-10-11T02:16:21.1592118Z Done!
2019-10-11T02:16:21.1592341Z 
2019-10-11T02:16:21.1592513Z 
2019-10-11T02:16:21.1595633Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor"
2019-10-11T02:16:21.1596546Z 
2019-10-11T02:16:21.1596803Z 
2019-10-11T02:16:21.1600861Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
2019-10-11T02:16:21.1601249Z Build completed unsuccessfully in 0:01:38
2019-10-11T02:16:21.1601249Z Build completed unsuccessfully in 0:01:38
2019-10-11T02:16:21.1662536Z == clock drift check ==
2019-10-11T02:16:21.1676919Z   local time: Fri Oct 11 02:16:21 UTC 2019
2019-10-11T02:16:21.3262924Z   network time: Fri, 11 Oct 2019 02:16:21 GMT
2019-10-11T02:16:21.3263059Z == end clock drift check ==
2019-10-11T02:16:22.3016641Z ##[error]Bash exited with code '1'.
2019-10-11T02:16:22.3050076Z ##[section]Starting: Checkout
2019-10-11T02:16:22.3051946Z ==============================================================================
2019-10-11T02:16:22.3052001Z Task         : Get sources
2019-10-11T02:16:22.3052046Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@wirelessringo
Copy link

Ping from triage. @cuviper any updates on this? Thanks.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 26, 2019
@JohnCSimon
Copy link
Member

Ping from triage.
@Lucretiel Can you please address the request from @cuviper or if you have, mark it as resolved?
Thanks.

@JohnCSimon JohnCSimon added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Nov 2, 2019
@JohnCSimon
Copy link
Member

Ping from triage.
Unfortunately, I'm going to close this one out as inactive.
Feel free to re-open this PR at any time.

Thank you!

@JohnCSimon JohnCSimon closed this Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants