Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't ask the call site to provide @inbounds when iterating over an array #27384

Closed
haampie opened this issue Jun 2, 2018 · 11 comments
Closed
Labels
arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol performance Must go faster

Comments

@haampie
Copy link
Contributor

haampie commented Jun 2, 2018

This issue came about when I discovered that sort! on a small vector of integers was slower than usual; further inspection shows that extrema is now 65% slower for a vector of 500 integers, and this is a result of bounds checks in each iteration.

I would not dare to fix this by adding @inbounds to the loop of extrema, since the function is defined for iterators in general (why is it extrema's responsibility to signal that the internals of the iterator will do inbounds things?). I'd say an iterator over an array should guarantee to stay inbounds and not require its call site to provide @inbounds.

But the issue is quite subtle. The function iterate(A::Array, ::Int) does something that comes close to bounds checking

function iterate(A::Array, i=1)
    @_propagate_inbounds_meta # why require the parent function to provide inbounds
    i >= length(A) + 1 ? nothing : (A[i], i+1) # when we here (kinda) assert we are inbounds?
end

but it's not entirely bounds checking, because in principle i could overflow, so to be safe an actual check is required to guarantee i > 0.

The issue is a result of #27079 (more robust iteration over vectors), but in fact it cheats a bit by requiring the user to always provide @inbounds. Edit (sorry!): without inbounds for x in xs would do bounds checking already.


Possible solutions:

  • Create a wrapper for x in Mutable(xs) that allows to modify xs on the go; for x in xs is then immutable iteration by convention and can be fast even without @inbounds.
  • Create a separate implementation of extrema(::AbstractArray) that uses @inbounds (meh)
  • Just use @inbounds in extrema and document that it does that
  • Improve the speed of checkbounds(Bool, ::AbstractArray, ::Int) and use that rather than the i >= length(A) + 1 condition, and somehow tell the compiler i cannot overflow, so that the i > 0 check can be optimized away.

Edit, seems like the new iteration protocol is enough to fix things.

@mbauman
Copy link
Member

mbauman commented Jun 2, 2018

This is quite closely related to #21402, but indeed things are different now that we only have one method — and don't need to assume that the caller did the start/next/done dance correctly.

@nalimilan nalimilan added performance Must go faster arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol labels Jun 2, 2018
@JeffBezanson
Copy link
Member

extrema is now 65% slower for a vector of 500 integers

Slower than v0.6.x?

@haampie
Copy link
Contributor Author

haampie commented Jun 2, 2018

65% slower on 0.7 when comparing bounds checking enabled vs disabled. Full report:

> @benchmark extrema(v) setup = (v = rand(Int, 1000))

884.1 ns on 0.6.2
1.222 μs on 0.7.0-alpha.0
422.8 ns with the proposed fix of #27386 (i.e. no bounds checking)

@andyferris
Copy link
Member

I have to agree with @haampie's general point here... generic code should be able to be just as fast as concretely-typed code, and the interfaces available for use by functions like extrema(::Any) should be clear, fast and safe. (IMO we should try to avoid as much as possible the situation where a concrete version of perfectly valid generic method is added, with extra @inbounds added for speed).

Have we discussed having an unsafe_iterate (used when lowering for loops) with a default fallback to iterate, which was something already suggested for unsafe_next in #15291 (and before)? The alternative seems to be asserting that all implementations of @inbounds iterate(...) shouldn't be able to crash if the user doesn't mess with the iteration state (and have lowering do this for us?). Tricky, tricky...

@mbauman
Copy link
Member

mbauman commented Jun 11, 2018

I think we can do this without an unsafe_iterate. What would that gain us beyond Harmen's proposal in #20469 (comment)?

@nalimilan
Copy link
Member

I guess the idea of unsafe_iterate is that for x in X would be lowered to it, which would allow getting rid of bounds checks? But maybe we can achieve the same result by lowering for x in X to @inbounds iterate(...)?

@mbauman
Copy link
Member

mbauman commented Jun 11, 2018

But doesn't the new iteration protocol give the iteration implementors enough information that they can safely add the @inbounds themselves without any additional context from the caller? No longer do we have to trust that the caller has called doneiterate itself does that check. The only thing it doesn't protect ourselves from is someone manually constructing and passing an invalid state that happens to pass the check but it still out of bounds.

The PR over at #27386 demonstrates (or hopefully will demonstrate once Nanosoldier is feeling better) that it's possible to check the entire domain of the state without incurring a performance penalty.

Now, it's harder (impossible?) to check the entire domain with abstract iterator implementations, but #20469 (comment) is arguing that the iterator state is sufficiently complicated that it would require the caller to specifically craft an invalid state such that it passes the "done" check but still yields an out of bounds access. I was initially inclined to agree, but as I was writing this it dawned on me that the easiest way to mess this up would be by iterating over two arrays of different lengths at the same time and inadvertently mixing up the states. Is there another approach we could take here?

I'd really love it if we could avoid either kludge (that is, either unsafe_iterate or @inbounds iterate).

@haampie
Copy link
Contributor Author

haampie commented Jun 11, 2018

inadvertently mixing up the states

I did not think of that, and it looks pretty hard to avoid unfortunately.

@andyferris
Copy link
Member

The only thing it doesn't protect ourselves from is someone manually constructing and passing an invalid state that happens to pass the check but it still out of bounds.

Exactly. AFAICT the previous policy is that if no-one calls something called unsafe_xxx, uses ccall, or adds any @inbounds, Julia will basically never crash (i.e. segfault).

I am however happy to agree to ditch that policy - but it's worth discussing.

@andyferris
Copy link
Member

andyferris commented Jun 12, 2018

(To be honest the fact that done and next are now together is only a minor mitigation of the fact that users can mess iteration up, since as mentioned they can still pass whatever state that they want).

@andyferris
Copy link
Member

I'd really love it if we could avoid either kludge (that is, either unsafe_iterate or @inbounds iterate).

Rather silly idea: if we simply renamed iterate to unsafe_iterate then we are basically done (ugly, I know, but brutally honest).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol performance Must go faster
Projects
None yet
Development

No branches or pull requests

5 participants