Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documented abs overflow behavior, added checked_abs for overflow prot… #13841

Merged
merged 3 commits into from
Nov 2, 2015

Conversation

MichaeLeroy
Copy link
Contributor

…ection

Summary

This PR is to address, in part, #13549. It adds documentation to the signed integer version of abs that explains its overflow behavior. It also adds an overflow trapped version of this function, Base.checked_abs. As yet unaddressed is the addition of an Integer overflow entry to Julia's FAQ.

abs Documentation

With this change, the documentation for abs reads:

julia> @doc abs
  abs(x::Signed)

  The absolute value of x.  When abs is applied to signed integers, overflow may occur, resulting in the
  return of a negative value.  This overflow occurs only when abs is applied to the minimum representable
  value of a signed integer.  That is when x == typemin(typeof(x)), abs(x) == x, not -x as might be
  expected.

  abs(x)

  Absolute value of x

It is interesting that the added signed integer specific abs documentation is printed prior to the already existing generic documentation. As far as I can tell, such presentation is a result of the design of the documentation system, which presents documentation in order of descending specificity,

Base.checked_abs

This function's documentation tells most of the story:

julia> @doc Base.checked_abs
  checked_abs(x::Signed)

  The absolute value of x, with signed integer overflow error trapping. checked_abs will throw an
  OverflowError when x == typemin(typeof(x)).  Otherwise checked_abs behaves as abs, though the overflow
  protection may impose a perceptible performance penalty.

I chose to implement this function only for Signed types, rather than (trivially) making it more generic. I did so, because it struck me as a sort of false advertising to provide this function for other types when it useful only for Signed types. That said, I have no strong feelings against making it more generic.

@ScottPJones
Copy link
Contributor

The following might be a bit faster: (I have to run, Halloween for the kids)

function checked_abs{T<:Signed}(v::T)
    (v = abs(v)) < 0 && throw(InexactError())
    v
end

@ScottPJones
Copy link
Contributor

Btw, very nice PR!:+1:

@MichaeLeroy
Copy link
Contributor Author

@ScottPJones, interesting suggestion for the implementation of checked_abs. I'm too ignorant of compilers and machine architecture to have any idea which alternative is better, though I think mine is prettier 😜. I suppose that an argument for your approach is that it is easier to check whether a single register's MSB is set than to do a register to register comparison (assuming that the integers we're talking about are register width). I'm sure that benchmarking abs versus checked_abs was easy compared to the task of comparing the two implementations of checked_abs, so I'll not go there unless someone insists.

As for OverflowError versus InexactError, I guess that I can see an argument for the latter:

julia> Int8(2^7+1)
ERROR: InexactError()
 in call at essentials.jl:58
 in eval at ./boot.jl:264

but

julia> Int64(2^63+1)
-9223372036854775807

Still, I like OverflowError, which seems more intuitive in this context. But I'm plenty willing to be schooled in the canonical use of the error codes.

@MichaeLeroy MichaeLeroy reopened this Oct 31, 2015
@KristofferC
Copy link
Member

For this to appear in the manual I believe you have to add

.. function:: abs(x::Signed)

and

.. function::  checked_abs(x::Signed)

to the correct places in the .rst files. Then run make and then julia doc/genstdlib.jl and then the new docstrings should be spliced into where you put the .. function.

@MichaeLeroy
Copy link
Contributor Author

@KristofferC thanks for the pointer on properly linking in-code documentation to the manual. I'll investigate that soon on my fork. Right now, my Julia working day is close to its end...

@mikewl
Copy link
Contributor

mikewl commented Nov 1, 2015

This is on Julia nightly 49a01db. Copied the test code from the issue.
Scott's method is slower with the simple comparison.

julia> function checked_abs{T<:Signed}(x::T)
                  x == typemin(T) && throw(OverflowError())
                  abs(x)
              end
checked_abs (generic function with 1 method)

julia> function checked_abs2{T<:Signed}(v::T)
           (v = abs(v)) < 0 && throw(InexactError())
           v
       end
checked_abs2 (generic function with 1 method)

julia> @time (for i in 1:10^7; abs(rand(-10^8:10^8)) end)
  0.450079 seconds

julia> @time (for i in 1:10^7; abs(rand(-10^8:10^8)) end)
  0.467534 seconds

julia> @time (for i in 1:10^7; checked_abs(rand(-10^8:10^8)) end)
  0.453054 seconds

julia> @time (for i in 1:10^7; checked_abs(rand(-10^8:10^8)) end)
  0.459850 seconds

julia> @time (for i in 1:10^7; checked_abs2(rand(-10^8:10^8)) end)
  0.480046 seconds

julia> @time (for i in 1:10^7; checked_abs2(rand(-10^8:10^8)) end)
  0.494326 seconds

julia> checked_abs(typemin(Int))
ERROR: OverflowError()
 in checked_abs at none:2

julia> checked_abs2(typemin(Int))
ERROR: InexactError()
 in checked_abs2 at none:2

@yuyichao
Copy link
Contributor

yuyichao commented Nov 1, 2015

You are almost certainly benchmarking mainly the overhead.

julia> using Benchmarks

julia> function checked_abs{T<:Signed}(x::T)
           x == typemin(T) && throw(OverflowError())
           abs(x)
       end
checked_abs (generic function with 1 method)

julia> function checked_abs2{T<:Signed}(v::T)
           (v = abs(v)) < 0 && throw(InexactError())
           v
       end
checked_abs2 (generic function with 1 method)

julia> @benchmark checked_abs(1)
================ Benchmark Results ========================
     Time per evaluation: 2.13 ns [2.11 ns, 2.15 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 11601
   Number of evaluations: 126646601
         R² of OLS model: 0.961
 Time spent benchmarking: 0.51 s


julia> @benchmark checked_abs2(1)
================ Benchmark Results ========================
     Time per evaluation: 3.40 ns [3.37 ns, 3.44 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 11401
   Number of evaluations: 104667101
         R² of OLS model: 0.971
 Time spent benchmarking: 0.51 s


julia> @benchmark abs2(1)
================ Benchmark Results ========================
     Time per evaluation: 1.80 ns [1.78 ns, 1.82 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 11401
   Number of evaluations: 104667101
         R² of OLS model: 0.958
 Time spent benchmarking: 0.50 s


julia> @benchmark checked_abs(-1)
================ Benchmark Results ========================
     Time per evaluation: 2.03 ns [2.01 ns, 2.05 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 11901
   Number of evaluations: 168565701
         R² of OLS model: 0.964
 Time spent benchmarking: 0.51 s


julia> @benchmark checked_abs2(-1)
================ Benchmark Results ========================
     Time per evaluation: 3.38 ns [3.34 ns, 3.42 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 11801
   Number of evaluations: 153241801
         R² of OLS model: 0.954
 Time spent benchmarking: 0.67 s


julia> @benchmark abs(-1)
================ Benchmark Results ========================
     Time per evaluation: 1.90 ns [1.88 ns, 1.92 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 10201
   Number of evaluations: 33352001
         R² of OLS model: 0.979
 Time spent benchmarking: 0.50 s

@MichaeLeroy
Copy link
Contributor Author

@Mike43110 I'm afraid that you copied my first and admittedly bad attempt at benchmarking abs versus checked_abs. Later on in #13549, after a couple of pointers from @ScottPJones, I had come up with improved, though still simplistic, tests. If that early effort lead to confusion, I apologise. @yuyichao impressive. And thanks, I'll learn from your example.

@ScottPJones
Copy link
Contributor

@MichaeLeroy I'll try benchmarking it myself tomorrow, I had just had time to look at the output of @code_native, and thought it should be faster than checking up front against typemin. About InexactError vs. OverflowError, I just put InexactError because I'm used to getting that in conversions where the result can't fit - and this is a case where the result needs the next larger integer size type, or an unsigned type. I.e. if the result of abs(x) were UInt instead of Int, this wouldn't even be a problem.
The code generated isn't really optimal anyway, not for abs() or checked_abs(), has some extra unnecessary instructions (some of which go away when this is in-lined, as it usually would be)

@MichaeLeroy
Copy link
Contributor Author

@ScottPJones, you might want to have a look at @yuyichao's contribution to the discussion. He's already done some good looking benchmarks of the two versions of checked_abs. I'm unfamiliar with the Benchmarks package and am a novice at benchmarking, so I'll refrain from comments on those results.

@ScottPJones
Copy link
Contributor

@MichaeLeroy 1) I haven't found the macro yet that @yuyichao is using (I did Pkg.add("Benchmark"), which doesn't seem to be updated for 0.4.0 yet), but it only has functions, not a macro. 2) His results go against what I know from benchmarking Intel chips for >35 years 3) Now that I've had time to do my own benchmarking, I get very different results - in fact, the numbers from the 3 methods are so close, that I think abs() should simply be replaced by checked_abs(). 4) The code generated seems very different between v0.4.0 and v0.5 (freshly built from master), and depending on whether it is LLVM 3.3 (the default), LLVM 3.7 or LLVM-svn.

Here are some results, on 0.5 (built today), LLVM 3.3, 64-bit Mac OS X:
v = i
0.013148 seconds (2.00 M allocations: 30.502 MB, 5.21% gc time)
v = abs(i)
0.013958 seconds (2.00 M allocations: 30.502 MB, 4.75% gc time)
v = scott_abs(i)
0.013871 seconds (2.00 M allocations: 30.502 MB, 4.81% gc time)
v = checked_abs(i)
0.013952 seconds (2.00 M allocations: 30.502 MB, 4.47% gc time)

As you can see, the abs() function takes about the same time as doing nothing at all,
and the scott_abs() function is the fastest (but to be fair, running this multiple times, it's pretty random which of the 3 abs functions is fastest, which is why I think abs should simply become one of the checked versions).

@ScottPJones
Copy link
Contributor

Here is a gist with my benchmarking code: I'd love to see the results on your machine!
https://gist.github.com/ScottPJones/6f15de663e310faa5e2e

@MichaeLeroy
Copy link
Contributor Author

@ScottPJones sorry but as I said above, I'm not terribly interested in getting deep into the weeds with benchmarking to look for marginal differences between checked_abs and scott_abs. I certainly do not say this to be dismissive of benchmarking, rather it is an acknowledgement of my novice level of expertise with this art. Also, there are many aspects of issue #13549 beyond the provision of a checked_abs function, and I do not want to see these aspects eclipsed by arguments over checked_abs versus scott_abs. Indeed, a more interesting question than the exact form that checked_abs should take is whether Julia should include such a function. For the record, I support inclusion, given the existence of the other checked arithmetic functions, but I'm not going to be heartbroken if the community consensus is that checked_abs not be added. What I most hope for is that this PR will be an occasion to work through questions of good documentation by tackling a rather thorny problem of describing the computational versus mathematical behavior of a function.

So far, I've seen nothing in the arguments by benchmark that convince me to revisit the version of checked_abs put forward in this PR. Arguments that would stand a better chance of doing so are those based on: code clarity, adherence to Julia idioms, and what the code should and does accomplish. Of course evidence is more persuasive than assertion. And in general any suggestions that help me to become a better Julia user and community member are most welcome. In this light, what I'd really like to see along with benchmarks that clearly support an alternative are the reasons why the alternative performs better. It's the reasoning, not benchmarks alone, that will help me become better at Julia (and better able to contribute to its development).

OK, I've disarmed myself by indicating a lack of competence and immediate interest in deep-dive benchmarking. I've also declined to take a look at the proffered gist. Still I can't resist a couple of comments on the latest benchmark numbers that you present. The large allocations and significant gc activity strike me as inconsistent with testing focused on the performance of the flavors of abs functions under discussion. Also, I find it astonishing that your numbers indicate that scott_abs is dead on par with abs. How can that be? I invite you to reconsider or clarify your claim (as I read it) that there are no performance penalties to checked arithmetic.

@simonster
Copy link
Member

@ScottPJones Since v is not initialized before the loop, it is getting boxed, which is why your benchmarks are allocating O(n) memory. @benchmark is defined in the unregistered package Benchmarks.jl.

@ScottPJones
Copy link
Contributor

@simonster Thanks! I'd looked at the native code of the 3 different abs versions, as well as with the test functions, and didn't understand why there was so much stuff there. Could the compiler figure that out, and not do all of that boxing and memory allocation?

@MichaeLeroy A full explanation would take a while, and the version in the PR is just fine, due to the way modern processors operate (I'm not sure what would happen on an ARM chip though!) (which is also why adding the check doesn't really slow things down, because the CPU has multiple ALUs, and can also handle speculative execution). I was just surprised that @yuyichao's benchmarks showed such a difference - and wanted to understand just what was causing that effect in Julia.
I thought you might have wanted to look at the gist, for the benchmarking techniques, but if you are not interested in that sort of thing, fine. (I love benchmarking, and optimizing code! 😀)

I do think this is a very fine PR, and hopefully will be included soon.

@simonster
Copy link
Member

@ScottPJones #6914

FWIW, I think the implementation in this PR looks good and I'm on board with the general idea, but it should have tests.

@MichaeLeroy
Copy link
Contributor Author

@simonster thanks for your contributions to the discussion. I should be clear that my expressed scepticism of benchmarking is limited to the sort of argument by benchmarking (alone) over alternative implementations of checked_abs that I saw as arising over this PR (which I found unilluminating and potentially counterproductive as described in my previous comment).

I'm all for having good benchmarks of abs versus checked_abs, as my work on #13549 leading up to this PR indicates. Do you have any advice on what constitutes adequate testing? Can you point me to any well executed, closed, PRs that would provide good models of adequate testing? I apologise if I'm asking too many questions. Keep in mind that this PR is a result of an "Intro Issue" and that this is my first PR to Julia proper, so it is a learning exercise for me.

@simonster
Copy link
Member

Agreed that further benchmarking is unnecessary at this point. You just need to add a couple lines somewhere in test/int.jl that check that the function does what it's supposed to, something like:

@test checked_abs(-1) == 1
@test_throws OverflowError checked_abs(typemin(Int))

These tests are run automatically before we merge pull requests or make releases, or you can run them manually by running make from the test directory (or make int, if you don't want to run the full test suite).

Usually you want to make sure the tests cover all lines and all branches that could be hit. There's documentation on Base.Test, the module that we use for unit testing, here, although @test and @test_throws should be the only things you need to use for this PR.

Thank you for your contribution!

@yuyichao
Copy link
Contributor

yuyichao commented Nov 2, 2015

The RST doc still needs to be updated.

However, given that this is basically ready to merge and there's some updates in the inline doc that are not sync to the RST doc yet, I'd prefer to just merge this without the sync (so please just add the test @simonster suggested) and I'll include the doc update in #13837 once the pending issue in that PR is sorted out.

@MichaeLeroy
Copy link
Contributor Author

@simonster and @yuyichao thanks for your inputs. I have only a bit of work time left in my day (and am frazzled from the checked_abs versus scott_abs discussion), but I'll attempt to get the appropriate tests implemented and checked in the time I have. If not, finalization will have to wait for my morning some 12 hours hence.

@yuyichao
Copy link
Contributor

yuyichao commented Nov 2, 2015

@MichaeLeroy take your time.

@yuyichao yuyichao mentioned this pull request Nov 2, 2015
@MichaeLeroy
Copy link
Contributor Author

@yuyichao and @simonster adding and running the suggested tests was not hard. (I did loop through the Signed types for thoroughness.) That and I managed the mechanics of github. So I think this PR is good to go. Thanks again for your help. Oops AppVeyor failed....

@MichaeLeroy
Copy link
Contributor Author

...trailing white space in test file. I've tried a fix.

@MichaeLeroy
Copy link
Contributor Author

OK, looks good now.

@StefanKarpinski
Copy link
Member

@MichaeLeroy, I just wanted to say that this is an exemplary first pull request. Please disregard @ScottPJones's comments and benchmarks since they are noise.

@MichaeLeroy
Copy link
Contributor Author

@StefanKarpinski thanks for your kind words. I've found Julia to be a remarkable language with great promise, certainly the coolest I've had the pleasure to use. I'll try to contribute what I can as time and talent allow. It's fun.

tkelman added a commit that referenced this pull request Nov 2, 2015
Documented abs overflow behavior, added checked_abs for overflow prot…
@tkelman tkelman merged commit 59cabb6 into JuliaLang:master Nov 2, 2015
@tkelman
Copy link
Contributor

tkelman commented Nov 2, 2015

Yep, @MichaeLeroy you're doing everything right. Welcome, thanks for the contribution, and I hope we'll have the chance to see more from you in the future!

@nalimilan
Copy link
Member

I'm coming a little late to the party, but wouldn't it make sense to define checked_abs(x) = abs(x) as a fallback for all non-Signed types? This allows writing fully generic code while still being certain that overflows will be trapped for types where it may happen.

@MichaeLeroy
Copy link
Contributor Author

@nalimilan I certainly considered, and vacillated if not agonized, over that question. And I did mention it in opening the PR. While generally speaking, being as generic as reasonably possible is desirable, I envision checked_abs as being used only rarely when overflow is both unavoidable and dangerous. In those cases, I expect the user to be well aware of and carefully choose types, not to be constructing highly generic functions. In other words, generic functions are useful for constructing other generic functions and I did not see this as the role of checked_abs. But as I said in the PR, I have no strong objection to making checked_abs more generic.

@MichaeLeroy
Copy link
Contributor Author

@nalimilan BTW, if you are importing checked_abs it's not that much extra effort to make it more generic:

julia> import Base.checked_abs

julia> checked_abs(0x23)
ERROR: MethodError: `checked_abs` has no method matching checked_abs(::UInt8)
 in eval at ./boot.jl:264

julia> checked_abs(x) = abs(x)
checked_abs (generic function with 2 methods)

julia> checked_abs(0x23)
0x23

julia> checked_abs(typemin(Int))
ERROR: OverflowError()
 in checked_abs at int.jl:70
 in eval at ./boot.jl:264

julia> checked_abs(typemin(UInt8))
0x00

So maybe it's not that much of a restriction that, as implemented, checked_abs is narrowly defined and calls attention to itself when applied to anything other than Signed.

@nalimilan
Copy link
Member

@MichaeLeroy I agree it's easy to define manually when you need it, but my point was rather about doing the right thing from the start, which is what makes Julia great to use: people don't need to spend time on each method for each type to get things to work. For example, abs is defined for unsigned types currently, even if it's a no-op and could easily be defined by hand when needed; and checked_add works for BigInts even if no overflow can occur with them.

@toivoh
Copy link
Contributor

toivoh commented Nov 2, 2015

Are we really sure that

checked_abs(x) = abs(x)

is safe as a fallback for all non-Signed types? We really don't want to provide a fallback that could silently provide the wrong behavior.

@MichaeLeroy
Copy link
Contributor Author

@nalimilan I guess that we have a bit of a philosophical difference on this issue. (Not, on generics in general but on this function in particular.) That's not a big deal from my perspective, certainly not a line in the sand.

In fairness, I did attempt to solicit a discussion on this issue by explicitly calling attention to it in the PR and acknowledging the validity of the fully generic approach. I had hoped to see some discussion of it to get the community's consensus. So I'm certainly interested in seeing this issue aired. But after all of the work that I put into responding to the weedy discussions of alternative implementations and of benchmarking, I don't feel that I have the time and energy just now to take a leading role in this further discussion. I don't know what the appropriate protocol is for re-opening/continuing a discussion over a closed PR, but feel free to do what's right. BTW, #13549, which lead to this PR remains open pending the creation of a FAQ entry on Integer overflow, and I'm happy to get comments, criticism and advice on this work. If you do open a new issue or PR on this question, please do ping me, so I'll have an opportunity to follow the discussion.

@MichaeLeroy
Copy link
Contributor Author

@toivoh I think that you are right that it would be improper merely to fall back to checked_abs(x) = abs if there were some sort of checking that would be natural for abs applied to a type other than Signed. That is what I was hinting at in the original PR when I called a fully generic checked_abs "a sort of false advertising". I did try to think of what checks might be needed for other types and could not come up with anything. But it may be that I lack sufficient imagination 😄.

@nalimilan
Copy link
Member

@MichaeLeroy No worries, your PR is arguably an improvement, and the changes I propose can be made at any point if they are deemed useful. At least the current situation is completely safe.

You're both right that the fallback wouldn't be 100% safe for all non-Signed types. It should only be defined for Unsigned types, for which abs is a no-op (according to the method in int.jl).

If other people support this idea, I can make another PR (unless Michael wants to do it).

@MichaeLeroy
Copy link
Contributor Author

@nalimilan I'd be happy to have you create another PR to expand the domain of checked_abs if that is appropriate. And I am more comfortable with having that domain grow to include Unsigned or to encompass Integer rather than its being broadly applicable to all of the types for which abs(x) is defined.

@nalimilan
Copy link
Member

I've made a PR: #13912

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants