Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that axes start with 1 for AbstractRange operations #30950

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

timholy
Copy link
Member

@timholy timholy commented Feb 4, 2019

Now that we have Base.IdentityUnitRange and Base.Slice, we need to be careful about fallbacks that just use first, step, stop-style properties. First noticed in JuliaArrays/IdentityRanges.jl#8.

The only surprising part about this is that I needed to introduce a trait: ranges like typemin(Int):typemax(Int) trigger an OverflowError if you call axes on them.

@timholy timholy requested a review from mbauman February 4, 2019 02:58
Now that we have Base.IdentityUnitRange and Base.Slice, we need to be
careful about fallbacks that just use `first`, `step`, `stop`-style
properties.
Copy link
Member Author

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this ended up being a bigger deal than expected. Easiest to review ignoring whitespace changes due to introducing a @testset in test/offsetarray.jl.

_axistype(::AxesStart1, ::AxesStart1, a, b) = OneTo{Int}(a)
_axistype(::AxesStartAny, ::AxesStart1, a, b) = a # if we get here, b has length 1
_axistype(::AxesStart1, ::AxesStartAny, a, b) = error("mismatched axes, or specialize `Broadcast.axistype` for ", typeof(a), " and ", typeof(b), " axes")
_axistype(::AxesStartAny, ::AxesStartAny, a, b) = error("mismatched axes, or specialize `Broadcast.axistype` for ", typeof(a), " and ", typeof(b), " axes")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure about these erroring methods. Here's an interesting and troubling case:

julia> using OffsetArrays, Test

julia> x = [1]
1-element Array{Int64,1}:
 1

julia> y = OffsetArray([1], -1:-1)
1-element OffsetArray(::Array{Int64,1}, -1:-1) with eltype Int64 with indices -1:-1:
 1

julia> @inferred(x.*y)
1-element OffsetArray(::Array{Int64,1}, -1:-1) with eltype Int64 with indices -1:-1:
 1

julia> @inferred(y.*x)
ERROR: mismatched axes, or specialize `Broadcast.axistype` for Base.OneTo{Int64} and Base.IdentityUnitRange{UnitRange{Int64}} axes
Stacktrace:
 [1] error(::String, ::Type, ::String, ::Type, ::String) at ./error.jl:42
 [2] _axistype(::Base.AxesStart1, ::Base.AxesStartAny, ::Base.OneTo{Int64}, ::Base.IdentityUnitRange{UnitRange{Int64}}) at ./broadcast.jl:497
 [3] axistype(::Base.OneTo{Int64}, ::Base.IdentityUnitRange{UnitRange{Int64}}) at ./broadcast.jl:494
 [4] _bcs1 at ./broadcast.jl:485 [inlined]
 [5] _bcs at ./broadcast.jl:479 [inlined]
 [6] broadcast_shape at ./broadcast.jl:473 [inlined]
 [7] combine_axes at ./broadcast.jl:468 [inlined]
 [8] instantiate at ./broadcast.jl:256 [inlined]
 [9] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{OffsetArray{Int64,1,Array{Int64,1}},Array{Int64,1}}}) at ./broadcast.jl:802
 [10] _materialize_broadcasted(::Function, ::OffsetArray{Int64,1,Array{Int64,1}}, ::Vararg{Any,N} where N) at /home/tim/src/julia-branch/usr/share/julia/stdlib/v1.2/Test/src/Test.jl:1279
 [11] top-level scope at REPL[5]:1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fascinating. So the larger question is: which axis should be used as the "basis" for the output and which axis is being "extruded" over that basis?

If we have any non-singleton dimension then it's a no-brainer: that's our output axis and all other singleton axes are discarded as they're being "extruded". We demand that the axes values match, so multiple non-singleton dimensions are by definition identical or an error. (We do promote the axis type, though, such that an OffsetArray that happens to be 1-based will still return a similarly non-offset OffsetArray for type stability.)

But when we have multiple singleton dimensions it's not clear. On master, we just use the last-provided axis for the given dimension, which isn't great. It's not really a one-based vs. offset thing — the same happens for combinations of offset arrays:

julia> axes(OffsetArray([0], 1) .+ OffsetArray([0], 2))
(Base.IdentityUnitRange(3:3),)

julia> axes(OffsetArray([0], 2) .+ OffsetArray([0], 1))
(Base.IdentityUnitRange(2:2),)

I think the one-based vs. offset axis thing is a red herring — we don't really know which is going to matter in the output. So I say we just find some consistent (symmetric) rule that can generalize to multiple singleton offset axes, too. Some possibilities:

  • Always use a one-based output for any combination of singleton axes. I don't like this because then OffsetArray(rand(1), -1) .+ OffsetArray(rand(1), -1) would no longer be offset. So let's ax this candidate.
  • Always demand that combinations of singleton axes match. At first glance I'd think this would be massively breaking. It does help, though, that we completely ignore the pseudo-axes beyond the dimensionality of the array (e.g., we don't care what axes(fill(0), 2) returns, so scalars and zero-dimensional arrays could still broadcast with anything). It may be interesting to try out, but it could make for a really annoying trap where an expression mostly works until you end up with a one-vector instead of a two-vector.
  • Use some symmetric value-based rule that picks the result based on the axes values themselves. Possible candidates include:
    • Just pick the axis with the minimum or maximum element
    • Pick the axis with the element closest to 1 (biasing the comparison to favor either axes below or above 1 so there aren't ties). Perhaps the arc of the Julian universe bends towards 1-based arrays.
    • Pick the axis with the element farthest from 1 (again, biased to favor either above/below). The reasoning here would be that someone chose to have that offset; they should be allowed to keep it.
    • Pick the axis that matches the most other axes in the expression. In case of ties (which would be common for binary broadcasts), use one of the other rules.

@@ -140,7 +140,20 @@ abstract type AbstractRange{T} <: AbstractArray{T,1} end
RangeStepStyle(::Type{<:AbstractRange}) = RangeStepIrregular()
RangeStepStyle(::Type{<:AbstractRange{<:Integer}}) = RangeStepRegular()

convert(::Type{T}, r::AbstractRange) where {T<:AbstractRange} = r isa T ? r : T(r)
AxesStartStyle(::Type{<:AbstractRange}) = AxesStart1() # opt-out of AxesStart1 for "weird" range types
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some ways I'd prefer "opt in" on this trait, but I think it's too breaking. So I think it's better to mark AbstractRange types that violate the expectation of having unit indexing.

@@ -157,6 +170,8 @@ type can represent values smaller than `oneunit(Float64)`.
"""
abstract type OrdinalRange{T,S} <: AbstractRange{T} end

convert(::Type{OrdinalRange{T,S}}, r::OrdinalRange{T,S}) where {T,S} = r
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary for this PR, I think, but I noticed their lack and thought it might be good for disambiguating.

@@ -43,33 +44,24 @@ function reduced_indices0(inds::Indices{N}, d::Int) where N
end
end

function reduced_indices(inds::Indices{N}, region) where N
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially started modifying this code to fix a problem but then transitioned to the changes above. Still, this seemed like old code in need of some cleanup.

@timholy
Copy link
Member Author

timholy commented Jun 16, 2020

Just added a 2.0 tag here so we don't forget to do this. Ideally we'll get this done during the 1.x series, but this particular PR felt like it was careening off in a bad direction. If I or someone gets back to it, I think the best strategy is to look at this to refresh memory on where it found problems, then forget about it and start from scratch, slowly and carefully seeing whether this can be fixed in a backwards-compatible way.

This is definitely one of those bugfixes that is likely to break someone's code, and the oft-linked https://xkcd.com/1172/ is not a perfect metaphor because the old broken behavior is not nearly so crazy. This has gone unfixed for so long in part because, for a long time, being an AbstractRange meant that you had 1-based indexing, and the change had deeper implications than I appreciated at the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ranges Everything AbstractRange
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants