-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guarantee inbounds iteration over Array{T} #27386
Guarantee inbounds iteration over Array{T} #27386
Conversation
@nanosoldier |
@nanosoldier (Nanosoldier listens only to members regrettably :).) |
Ah, thx, but I think I should have been more patient since the tests are failing. Will fix soon. |
base/array.jl
Outdated
@inbounds return (A[1], 1) | ||
end | ||
|
||
function iterate(A::Array, i::Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that iterate is not necessarily a private interface, so if somebody passed say iterate(A, -1)
, that could cause trouble. Perhaps a better thing to do would be to teach LLVM more about array lengths such that it can elide the boundscheck itself when the iteration protocol is used correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a simpler, high-level solution exists by taking the state a struct that wraps i
, with an inner constructor that sets i = 1
. I think the only way to break things then is to reinterpret a negative integer as that state struct, but well... who would do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could try using an unsigned type and starting the state at i=0
. That way all inputs, other than the one we explicitly check for would be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, maybe a pattern to enforce users not to break the state:
struct State
i::Int
State() = new(1)
State(prev::State) = new(prev.i + 1)
end
State() # State(1)
State(State()) # State(2)
State(-1) # error
first(reinterpret(State, [-1])) # the only far-fetched way to break it I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're currently asserting i <= N && 0 < i <= N
, dropping just the duplicate test (and keeping the i > 0 || throw(BoundsError())
) seems like it would still be a strict improvement
Seems there's an anti-pattern where The tests that fail are all iterators that wrap other iterators and dictate the initial state of the inner iterator. In my opinion a wrapper like that should call Example from the tests:
More obvious example: function iterate(itr::PartitionIterator{<:Vector}, state=1)
iterate(itr.c, state) === nothing && return nothing # this is not iterating, but indexing
l = state
r = min(state + itr.n-1, length(itr.c))
return view(itr.c, l:r), r + 1
end |
That is completely true; one should never make assumptions about the state of an arbitrary iterator. But I think that can be relaxed somewhat if the type of the iterator is known, since then it is possible to know about its state. |
@nanosoldier |
Example: julia> @btime mapreduce(abs, +, 1, v) setup = (v = rand(Int, 10000) .% 10) 11.164 μs on master vs 2.340 μs on this branch :) |
The
compared to this branch with the zip iterator:
which suggests that we should implement
versus
|
@nanosoldier Edit: Woops, should have read the logs before cycling Nanosoldier. Thanks Kris! |
dont think rerunning nanosoldier will help until that is fixed |
Hm, one thing I did not consider is that restricting the state type in I only found out about this because in #27415 there's a failing doctest 😕. The issue is:
which is a result of
So, the only reasonable fix I can think of is to restrict the state type in Another thing is making the |
Alternatively, I think I'd just define:
We could even make that a bit more graceful with an inner function and a deprecation. |
I seriously don't think we should make an iterator state type. As Keno mentioned, we can just make this an unsigned type and then we get the full range check for free (this is also how we emit bounds-checks for array):
|
My major hesitation with julia> next([1,2,3], 0x1)
(1, 2) Yes, it's technically an abuse of the iteration protocol to do something like this, but it's an abuse that's fairly widespread. This would incur surprising off-by-one effects, whereas the
|
I think I handled this in the right direction, but I didn't test my code snippet. If it's not working quite right, that should be a simple matter of fixing the offset of the bounds-check. Note closely that my code is using a modulus trick of unsigned number overflow that's fairly common in embedded code, not just a trivial comparison. |
To add to the comment of @mbauman, some bugs have gone unnoticed up to now which would probably not have been around if there was more type safety:
I'd choose type safety over one-liners then. Also, it more or less enforces one to stop using iterators as indexing functions. |
On this point, I had proposed at some point a rest-like iterator that takes an index (the difference between this and a subarray being that |
I get the trick 😅 |
Something like this:
|
I would also prefer not to add an ArrayIteratorState. It has to be ok to use a simple integer as a state. I don't see where the |
ae53af8
to
e4dd4f9
Compare
Rerunning benchmarks after inline change @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Holy cow, that's amazing. Just a few regressions that are far outweighed by the sea of ✅ . I feel bad pulling these out because all these red ❌s look horrible by themselves. This post needs more ✅. ✅✅✅. Or just click the report. ✅✅✅. 🎉
I'll look into this; my guess is that we just need a few extra |
Hm, those regressions weren't there before the force Although the force inline also brought some extra improvements. |
Yup, exactly, which is why I'm thinking we're now pushing a few functions that call iterate over their inlining threshold. |
(Amazing work as always @haampie! ❤️) |
Has anybody been able to reproduce the |
Alpha vs this branch:
|
Thanks, but alpha is too old, it lacks recent compiler improvements. :-/ |
On beta they are indeed the same. Odd. |
OK, that's intriguing... See also a similar phenomenon at #27743 (comment). |
I cannot reproduce skipmissing either. This did hit JuliaCI/Nanosoldier.jl#48, so the PR got the additional advantage of 8eba7c4, and d7bf89f, but I don't see how that would affect skipmissing. On the regression front, the https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L299 This line non-deterministically allocates depending upon the hash structure of the dictionary. If we never have a spurious hash collision, then we need to access the Oh, and now I understand how the skipmissing test also changed. The random data were different between runs. That took far too long. Alright, let's try this again: @nanosoldier |
Looks like Nanosoldier doesn't want to start... :-/ |
Restarted the server. @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Ok, that's slightly better. Looks like Nanosoldier did the correct comparison this time. We still have a handful of big regressions flagged (16-17x !?), but I cannot reproduce them locally. |
@nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan |
Good to go then? |
It's so nice to see SIMD instructions with functions like this! function mysum(X)
s = zero(eltype(X))
for x in X
s += x
end
s
end Do you think this solution could be used with any |
It's not easy, see @mbauman's comment here #27384 (comment) |
* Guarantee inbounds iteration over Array{T} * Replace indexing with an iterator with actual indexing * Make the CharStr iterator unaware of the initial state of a Vector{Char} iterator * Fix tests * Force inlining of ::Array iterate function
I think the new iteration protocol really shines in this PR :). Fixes #27384 and replaces #15291.
As an example,
extrema(itr)
is a generic function that applies to any iterator, and hence it cannot safely use@inbounds
. Therefore currently:After this PR however, we don't need bounds checking and we can have SIMD and everything 🎉 :
It preserves the new behaviour of #27079.