Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unsafe number ops #7226

Merged
merged 7 commits into from
Jan 3, 2019

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented Dec 28, 2018

This PR adds some new methods to be able to keep current wrapping/unsafe behaviour once the overflow is activated.

New methods/constructs:

  • value.to_iX!/value.to_uX!/value.to_fX! value.unsafe_to_iX/value.unsafe_to_uX/value.unsafe_to_fX
  • Int32.new!(value) Int32.unsafe_new(value).
  • Int#&** is also added, and Int#** was refactored to avoid possible overflow in last iteration.

It prepares some new primitives to be able to have cleaner compiler code in the future. That is, drop the :cast internal in favor of :convert and :unchecked_convert :unsafe_convert to reflect better the nature of to_X methods.

This new methods should be used to be ready for upcoming breaking changes.

Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The only thing I'm not sure about is the to_unsafe_... name. It's not really unsafe, converting -1 to UInt8 gives 255, the program wont' crash or behave incorrectly. That said, I can't come up with a better name, and the unsafe there probably means you'll want to double check whether you really want to use those methods instead of the stronger variants. So 👍

@ysbaddaden
Copy link
Contributor

I agree with @asterite there is nothing unsafe in those conversions. It's unchecked for over/underflow, or it's maybe wrapping, as in "be careful, it may not be the value you meant", but it's never unsafe as in "be careful, this can segfault".

Maybe just as_x as in -1.as_u8 for example?

@asterite
Copy link
Member

@ysbaddaden I think as_... and to_... can be easily confused. I like that you said "unchecked", maybe unchecked_to_i? But it's quite long, so unsafe might still be a good alternative.

@Sija
Copy link
Contributor

Sija commented Dec 28, 2018

@asterite unchecked_to_... is long, but IMO most descriptive of all the given options so far. as_... OTOH is short and nice but as you said could be easily confused with to_....

@RX14
Copy link
Contributor

RX14 commented Dec 28, 2018

Int#trunc_to_* and T.new_trunc(). Since the number is truncated to fit the new size (instead of raising)

@Sija
Copy link
Contributor

Sija commented Dec 28, 2018

@RX14 but they're not, truncating is way different then overflowing:

truncate
verb | trʌŋˈkeɪt, ˈtrʌŋkeɪt | [with object]

  • (often as adjective truncated) shorten (something) by cutting off the top or the end: a truncated cone shape | discussion was truncated by the arrival of tea.

with truncating behavior I'd expect that (Int32::MAX + 1 == Int32::MAX).should be_true

@RX14
Copy link
Contributor

RX14 commented Dec 28, 2018

@Sija you're cutting off the top bits when you cast to a smaller type. It's exactly truncation.

src/big/big_int.cr Outdated Show resolved Hide resolved
src/big/big_int.cr Outdated Show resolved Hide resolved
@j8r
Copy link
Contributor

j8r commented Dec 28, 2018

There are already {String|Array|Bool...}#to_unsafe, we can have #to_unsafe_u (instead of unsafe_to_u).
This reads also better.

@straight-shoota
Copy link
Member

@RX14 What about when the size is kept? Converting Int32 to UInt32 doesn't truncate. It simply changes the interpretation of the byte values.

This re-interpretation is similar to Object#unsafe_as, except that it's not unsafe. So #as_x as suggested by @ysbaddaden would fit pretty well.
Yes, it could be difficult to distinguish from #to_x. But I'd prefer that over any of the mentioned alternatives.

Maybe as_unchecked_x would be a compromise but it's really long :/

@Sija
Copy link
Contributor

Sija commented Dec 29, 2018

@RX14 so it seems that's a semantic issue as well, for me the result of -256.to_u8 # => 0_u8 expression exhibits truncation + overflowing, but maybe it's because I'm lacking proper low-level IT knowledge in this regard.

@j8r
Copy link
Contributor

j8r commented Dec 29, 2018

Got an idea, what about appending a exclamation mark: #to_i32!.
This syntax is already used for alternate methods like p!, pp!, not_nil!, sort!, reject!.
This can say: "Hey, beware of overflows!"

@bcardiff
Copy link
Member Author

  • to_unsafe_u vs unsafe_to_u, what is unsafe is the conversion, not the "u". So I think it make more sense the current proposal: unsafe_to_u

  • unsafe_ vs unchecked_/trunc_, the unsafe means that the result might not be semantically correct. Not necessary that will segfault. There are other "unsafe" methods with that express things that the risk is not to segfault. Also the unsafe comes from keeping the semantic with C, where to_unsafe is used.

Maybe the bang alternative is not bad, to_i32! is like convert it to i32 no matter what, but then unsafe_as should be also renamed?

Since it's not a method I would expect to be used that much by the user, using a more verbose name should be fine.

@asterite
Copy link
Member

@bcardiff I liked your explanation. Now I think keeping unsafe in the names is fine. Plus you would most likely use these methods when interfacing with C.

src/big/big_decimal.cr Show resolved Hide resolved
@straight-shoota
Copy link
Member

straight-shoota commented Dec 29, 2018

To throw another option into the ring: #as_x!. It's similarly distinctive as #to_x! but it signals clearer that it's not converting the value but the interpretation.

I don't follow why #unsafe_as would need to be renamed. It is unsafe because it can result in invalid values, whereas the unchecked conversions are always valid integers.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Dec 29, 2018

@asterite it won't be used just when interacting with C but anytime you must implement a binary protocol where overflow is expected: random number generators, HTTP/2 HPACK headers (huffman compression), probably in decoding DWARF sections, ...

I'm fine with a longer name, but would prefer a rust-like #wrapping_to_x over the #unsafe_to_x proposal. It's more explicit as to what happens (wrapping integer conversion, which can be an expected conversion), and keeps unsafe for raw pointers, interacting with external functions (C, C++), #unsafe_as that casts a type to whatever other type, ... things that Crystal can't assert to be safe operations. I don't get why a wrapping integer conversion would be unsafe.

@bcardiff
Copy link
Member Author

I would not got with as_x. I kind of think as should be used for casting and not converting.

It's true that wrapping is more explicit, but the behavior was not settle for example in BigInt. Calling BigInt#to_iX before this PR might throw exception.

The wrapping works fine in integer but for floats the semantic it's truncation actually.
We would need to have the codegen emit diffrent methods here if we don't have a common prefix. Which we can, but it's adding some more logic.

@bcardiff
Copy link
Member Author

@straight-shoota regarding

I don't follow why #unsafe_as would need to be renamed. It is unsafe because it can result in invalid values, whereas the unchecked conversions are always valid integers.

I meant for consistency. #unsafe_as(T) will always read the portion of memory as a T no matter what. to_iX! could be thought in the same way.

@RX14
Copy link
Contributor

RX14 commented Dec 29, 2018

I'm with with unsafe_ but I still think truncation is a bit better way to describe it.

@bcardiff
Copy link
Member Author

bcardiff commented Jan 1, 2019

I would like some more final comments or proposals in the light of my last comments regarding why #unsafe_to_X is still what I would choose as the conversions to go out of the crystal land.

We could have a method for truncation as public API and still have a #unsafe_to_X mainly for libs, effectively an alias. But that could be a separate PR.

@asterite
Copy link
Member

asterite commented Jan 2, 2019

I think trunc might be a good prefix. In LLVM there's a trunc operation that we use when converting an integer to a smaller size. The only "problem" is that the Crystal operation can also (sign) extend the value, and also allows converting from signed to unsigned and floating point to integer so it's not really truncation in those cases.

Then, there's nothing unsafe about these conversion, they just might return a number that's not very intuitive.

I also thought about reintepret but maybe it's too long and not that good.

I'm thinking to_i32! and alternatives might be good. Like @bcardiff said, it reads as "convert this to an Int32 no matter what, even if it overflows. But that doesn't mean we have to rename unsafe_as, which is unsafe (could very easily lead to segmentation fault if used incorrectly, while these number conversions can never lead to one). But we'd need to have the equivalent Int32.new!(...).

@bcardiff
Copy link
Member Author

bcardiff commented Jan 2, 2019

Ok so the api can be #to_X!, .new! and the primitives could be convert and unchecked_convert. 👍 before rewriting?

@bcardiff bcardiff force-pushed the feature/unsafe-number-ops branch from c1fdc8f to fbcc327 Compare January 2, 2019 14:14
@bcardiff
Copy link
Member Author

bcardiff commented Jan 2, 2019

ready to_review!

@bcardiff bcardiff requested a review from RX14 January 2, 2019 15:38
@j8r
Copy link
Contributor

j8r commented Jan 2, 2019

@bcardiff Have you seen #7226 (comment) and #7226 (comment) ? That's minor style improvement, still it reads better :)

@bcardiff bcardiff force-pushed the feature/unsafe-number-ops branch from fbcc327 to 14d3175 Compare January 2, 2019 20:01
@bcardiff
Copy link
Member Author

bcardiff commented Jan 2, 2019

Sorry, @j8r done!

I would like one more core team member review before merging and go to the next partial PR for overflow.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor improvements.

spec/std/big/big_int_spec.cr Show resolved Hide resolved
spec/std/int_spec.cr Outdated Show resolved Hide resolved
spec/std/int_spec.cr Outdated Show resolved Hide resolved
spec/std/int_spec.cr Outdated Show resolved Hide resolved
spec/std/number_spec.cr Outdated Show resolved Hide resolved
src/float.cr Outdated Show resolved Hide resolved
src/float.cr Outdated Show resolved Hide resolved
cast will be used for .as(T) expressions only.
new! initializers use #to_*! to avoid overflow if the value is out of range of the target type
Adding noted for future raising of OverflowError
Add BigDecimal#to_big_i, BigDecimal#to_big_u 
Note: BigRational does not have to_i / to_u methods
@bcardiff bcardiff merged commit 04c1d0c into crystal-lang:master Jan 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants