unsigned conversion regression #12554

mratsim · 2019-10-29T16:55:13Z

The following used to give 44 on 0.20.2.

It now raises an error range error on 1.0.

let x = 300'u
echo byte(x)

This broke our upcoming networking code: vacp2p/nim-libp2p@541f0f2. In general, I don't think people expect range checking on unsigned integers, especially on converting to raw byte.

This may have a large impact on low-level code for example: graphics, crypto, networking, emulation ...

The text was updated successfully, but these errors were encountered:

kraptor · 2019-10-30T11:01:02Z

For low-level code, what I always do is using proper casting to avoid this:

let x = 300'u
echo cast[byte](x)

Araq · 2019-10-30T11:50:56Z

Here is what the spec says about it:

`Automatic type conversion`:idx: is performed in expressions where different
kinds of integer types are used: the smaller type is converted to the larger.

A `narrowing type conversion`:idx: converts a larger to a smaller type (for
example ``int32 -> int16``. A `widening type conversion`:idx: converts a
smaller type to a larger type (for example ``int16 -> int32``). In Nim only
widening type conversions are *implicit*:

Type conversions
----------------
Syntactically a `type conversion` is like a procedure call, but a
type name replaces the procedure name. A type conversion is always
safe in the sense that a failure to convert a type to another
results in an exception (if it cannot be determined statically).

So it now behaves as the spec requires and previously it didn't. But I agree the wording is not explicit about unsigned types and worse there was no changelog entry for this change.

Araq · 2019-10-30T11:57:00Z

This may have a large impact on low-level code for example: graphics, crypto, networking, emulation ...

We will add a changelog entry and we might add a switch to turn it off again, but looking at it without the history, it seems to be an improvement, use bit masking or cast to convert to bytes. Nim got more consistent in its behaviour.

dryajov · 2019-10-30T14:56:56Z

I generally agree with the behavior - it is safer, but the problem is that it fails at runtime, it should be caught at compile time and warn at the very least, tho failing compilation is much more preferable.

Araq · 2019-10-30T15:01:15Z

Alright, good. Let's make this create a new warning.

dryajov · 2019-10-30T15:12:19Z

Dope!

krux02 · 2019-10-30T15:28:14Z

@Araq what kind of warning do you have in mind?

Araq · 2019-10-30T15:32:24Z

Warning: This type conversion now produces a runtime check that can raise

dryajov · 2019-10-30T15:33:57Z

How about:

Warning: narrowing type conversions are [deprecated/not allowed] 🚲 🏠

krux02 · 2019-10-30T15:36:09Z

What if this runtime check is exactly what I need?

dryajov · 2019-10-30T15:38:33Z

I think this isn't for checks, its for assignments specifically, thats where I would expect it to warn.

let myByte: byte = 300 should warn

krux02 · 2019-10-30T15:39:23Z

that one isn't allowed, it is checked at compile time.

dryajov · 2019-10-30T15:43:14Z

You're right, that one is caught, this one isn't (https://play.nim-lang.org/#ix=20lj):

let val: uint = 300
let myByte: byte = byte(val)

Araq · 2019-10-30T15:43:38Z

Then you turn off this warning? or rewrite your code to have a raise statement and a range check.

krux02 · 2019-10-30T15:44:28Z

How about: "Everything is fine as it is."? When the network code crashes because of an unhandled case, that is a bug that should be fixed. It is a good crash. Don't hide it.

dryajov · 2019-10-30T15:47:43Z

I don't see how this is network specific, any sort of bit shifting will land you this error, moreover, I bet that there's a huge amount of code out there that (unknowingly) relies on this behavior that will now start to randomly crash all over. The intent of the warning is to uncover all those places that need fixing before they crash your app.

Clyybber · 2019-10-30T15:56:05Z

Another case for a third integer type, the safe uint. (a uint with range and overflow checks)

mratsim · 2019-10-30T16:05:47Z

unsigned is not supposed to have overflow checks, when a number goes over the range it should roll over, including when narrowing the type.

Assuming we want to add all of that on the Nim unsigned integer, I would need new memory int types that behaves like the current unsigned int for the purposes of implementing VM and cryptography. Just like pointers and ptr UncheckedArray they would mean "I know what I'm doing".

Otherwise unsigned would be the first Nim feature with no low-level escape hatch, this also has:

runtime costs
code auditing cost
code size cost

Unlike signed integers which are general purpose, unsigned main use-case is having a tight control over the memory layout and the code generated.

krux02 · 2019-10-30T16:46:09Z

@mratsim I am entirely sure what you want to say. Especially with this memory int types.

But I can ensure you, that only the conversion between the types has a range check.

Otherwise unsigned would be the first Nim feature with no low-level escape hatch

Remember there is also always the conversion with cast that never has a range check.

... including when narrowing the type

Are you proposing that cast[byte](x) and byte(x) should be the same thing? What should I do when I want the old behavior of byte(x) back?

mratsim · 2019-10-30T19:37:16Z

I was under the impression that casting was for reinterpreting the raw memory i.e. it would only be valid between types of the exact same size.

The old behaviour of byte in 0.20.2 and before was the one I was describing.
The current behaviour is a regression introduced in 1.0 that has the potential of breaking all the past networking, graphics, VM, emulators, crypto and probably kernel in Nim projects codes written.

What is even worse is that it's a change that will not be caught at compile-time but at runtime, potentially only after a long time until we have an overflow.

And the ugly is that someone who wants to not use this behavior has no recourse. I don't have alternative types that behave like a machine integer anymore. This is not expected from a system programming language, you need an escape hatch for low-level programming.

Lastly, in many cases and especially for cryptography, you might want not to throw any exceptions at all on certain part of the codebase. Now as soon as you use int or uint, you have an OverflowError tag that will pollute the effect system which makes one of the most promising usage of the effect system useless.

If you want the current 1.0 behavior you can use:

proc safe_byte(x: int): byte =
  if x > int(high(int8)):
    raise newError(RangeError, "Not in the 0 .. 255 range: " & $x)
  byte(x)

I want to emphasize that the issue is not just byte, it's all unsigned integer types as well.

Unsigned -> unsigned

Unsigned -> unsigned conversion should be runtime check free

Works in 0.20.2 // Fails in 1.0.x

let x = 1'u64 shl 32 + 1
echo uint32(x)

Signed -> unsigned conversion

Signed -> unsigned should have range check because signed integers are checked.

This was the case also in the past

let x = -1'i64
echo uint32(x)

Signed -> signed conversion

Signed -> signed should have range check because signed integers are checked.

This was not the case in 0.20.2 and has been fixed in 1.0 (and introduced the current regression)

let x = 1'u64 shl 32 + 1
echo int32(x)

Araq · 2019-10-30T20:08:54Z

I'm changing my mind about this issue and agree with @mratsim more and more.

dryajov · 2019-10-30T21:39:08Z

I'm on the same boat, after some cursory literature review, e.g. https://en.wikipedia.org/wiki/Integer_overflow (citation 1) - a computation involving unsigned operands can never overflow, as per C11 standard, I tend to agree with this being at the very least surprising behavior. Nim is ofcourse free to make its own choices in this regard, but this seems to be a well established practice.

My suggestion with regards to the warning where mostly to enable early detection of issues that this change introduced, other than that I now agree with @mratsim assessment.

krux02 · 2019-10-30T21:42:03Z

I don't think that conversions to unsigned integers were not checked in the past. They were half ass range checked with special cases where range checking could not be implemented. Finally we have the int128 type in the compiler and we could implement all range checks correctly. uint64 was such an exception type that was left out for range checks.

My suggestion is that you at least write an RFC for the behavior that you want, because you want to change the spec.

@dryajov we are just talking about conversations into unsigned integer types. Not about computations within unsigned types. These already have the expected mod2 behavior.

krux02 · 2019-10-30T21:47:23Z

Btw, I really think we should not touch this thing anymore. It was a big pain in the ass to get it all working correctly to the spec. It all works according to the spec. Using cast[byet](x) if you don't want a really dosn't seem to be a big deal. and btw, I made sure that casting between integer types of different sizes works correctly on the VM as well.

dryajov · 2019-10-30T22:22:08Z

@dryajov we are just talking about conversations into unsigned integer types. Not about computations within unsigned types. These already have the expected mod2 behavior.

Then I would say that this is even less consistent, if computation already behaves in this way I would expect conversion to work the same.

dryajov · 2019-10-30T22:27:02Z

Btw, I really think we should not touch this thing anymore. It was a big pain in the ass to get it all working correctly to the spec. It all works according to the spec. Using castbyet if you don't want a really dosn't seem to be a big deal. and btw, I made sure that casting between integer types of different sizes works correctly on the VM as well.

If we don't do something about it, existing code is going to trip on this pretty badly. At the very least this would require everybody to audit their code or expect random crashes, I'm not sure this is the best experience for end users.

To avoid this we either a) revert to the previous behavior and allow conversion to behave just like computation does b) at the very least introduce a warning that allows people to find and fix their code.

krux02 · 2019-10-31T01:39:50Z

@dryajov you want the old behavior where var x: uint64 = -1 is fine, but var y: uint8 = 300 is not?

Just a reminder let x = byte(300) was always illegal.

dryajov · 2019-10-31T02:34:26Z

@krux02 I would go with what @mratsim has suggested so far, I.e. making sure that it is at least consistent with 0.20.2.

On a broader note, maybe arithmetic types need some revising, but I'm certainly not an expert on Nim nor type systems to have a super strong opinion. At this point tho, anything that touches this would probably be a breaking change and won't come before the next major release.

Btw, thanks for the awesome job on getting to 1.0. Nim is truly a special language and you've all done an amazing job!

Araq · 2019-10-31T06:33:33Z

My suggestion is that you at least write an RFC for the behavior that you want, because you want to change the spec.

Correct, the bugfixes requires a spec update. I'm willing to do that because people have to come to rely on the old behaviour, so the spec should reflect the behavior of 0.20.

mratsim · 2019-10-31T09:19:48Z

@dryajov you want the old behavior where var x: uint64 = -1 is fine, but var y: uint8 = 300 is not?

Just a reminder let x = byte(300) was always illegal.

This is about conversion between unsigned types. Your second example is using signed int.

This used to work

let x = 300'u64
let y = byte(x)

Note that the fact that this doesn't let x = byte(300'u64) is due to the VM representing everything as an int internally for constant folding.

krux02 · 2019-10-31T09:53:04Z

let x = byte(300'u64) isn't computed in the VM. It is computed in semfold.nim.

arnetheduck · 2019-10-31T13:36:08Z

I would generally expect all conversions to unsigned not to raise - that's why I use unsigned in the first place, to get wrapping behaviour and I don't see why the function that converts to unsigned should be any different. the mirror operation, uint->int should raise however - int is a checked type

By this logic, byte(-1) is a totally permissible expression that resolves to the byte 0xff both at runtime and at compile time (on a two's complement machine).

generally, I'd also think that byte(300'u64) should be allowed - why should runtime and compile time have different behaviour? it is useful when wanting to find the bit pattern of some large number without calculating it in your head, and it's quite odd that var x = 300'u64; byte(x) would be handled differently.

zah · 2019-10-31T13:40:31Z

There is a way to follow the spec and to address @mratsim's concerns at the same time.

The narrowing conversion can be modelled to have a pre-condition that the value must be in range. When the compiler is not sure that this pre-condition is satisfied, it must insert a range check before the narrowing. But if the control flow demonstrates that the pre-condition is satisfied, the range check can be omitted.

How does the compiler know that the pre-condition is satisfied? Well, this is exactly the type of data-flow analysis that will become possible with the planned integration of Z3. It boils down to noticing that expressions such as intVal and 0xff are within a certain range and the same is also ensured by branches in the code such as if value < 256: ....

This will allow us to have a simple non-casting template looking like this:

template truncate*(val: SomeUnsigned, TargetType: typedesc): TargetType =
  TargetType(val and high(TargetType))

The bit-masking should be included in the peephole optimisation patterns of the back-end compiler and it produces exactly the same code as the "machine truncation" byte(int_val). Check out the compiled code here:

https://godbolt.org/z/DD-uJc

Unguarded narrowing can become a warning for now and an error in the future. truncate can use a cast until the more sophisticated data-flow analysis is in place.

mratsim · 2019-10-31T14:25:31Z

It is fine to implement such an optimization for signed integers.
However unsigned integers should not have any check at all, and this should be guaranteed.

This is an escape hatch required to write any low-level code were the developer knows better than the default language.

In particular, crypto developers already have to jump through some hoops to ensure that the C compiler never produce an "if" instruction see: https://github.com/veorq/cryptocoding.
Nim shouldn't add its own difficulties on top.
It needs a family of integer types that guarantees never to branch or throw an exception to be usable for cryptography.

Additionally, I would argue that any change that adds or remove thrown exceptions should be considered a breaking API change (in that case it was announced for signed int, and is a regression for unsigned).

Note that this is not only true for crypto, if we want to add GPU code generation in the future we will need to not generate such checks/exceptions as well.

Araq · 2019-10-31T15:01:15Z

Here you go, nim-lang/RFCs#175

Araq · 2019-11-16T09:04:15Z

This one works for me fwiw:

let t: uint = 3
assert 0 < t

Same for:

let t: uint32 = 0
echo t xor 0o777 # fine
echo 0o777 xor t # nope

disruptek · 2019-11-16T16:36:14Z

Never mind, I must have had 245a954 in my compiler, which means this shouldn't work in 1.0. Duh.

Araq · 2019-11-20T16:35:25Z

Fixed, 1.0.4 will have it too.

mratsim added the Regression label Oct 29, 2019

Araq added the Showstopper label Oct 31, 2019

Araq mentioned this issue Oct 31, 2019

Conversion to unsigned integers shall not produce range checks nim-lang/RFCs#175

Closed

disruptek mentioned this issue Nov 16, 2019

Nimble Installation Fails disruptek/golden#15

Closed

Araq closed this as completed Nov 20, 2019

tersec mentioned this issue Aug 21, 2020

Inconsistent unsigned -> signed RangeDefect usage across integer sizes #15210

Closed

unsigned conversion regression #12554

unsigned conversion regression #12554

Comments

mratsim commented Oct 29, 2019

kraptor commented Oct 30, 2019

Araq commented Oct 30, 2019 • edited Loading

Araq commented Oct 30, 2019

dryajov commented Oct 30, 2019

Araq commented Oct 30, 2019

dryajov commented Oct 30, 2019

krux02 commented Oct 30, 2019

Araq commented Oct 30, 2019

dryajov commented Oct 30, 2019 • edited Loading

krux02 commented Oct 30, 2019

dryajov commented Oct 30, 2019

krux02 commented Oct 30, 2019

dryajov commented Oct 30, 2019 • edited Loading

Araq commented Oct 30, 2019 • edited Loading

krux02 commented Oct 30, 2019

dryajov commented Oct 30, 2019

Clyybber commented Oct 30, 2019

mratsim commented Oct 30, 2019 • edited Loading

krux02 commented Oct 30, 2019

mratsim commented Oct 30, 2019 • edited Loading

Unsigned -> unsigned

Signed -> unsigned conversion

Signed -> signed conversion

Araq commented Oct 30, 2019

dryajov commented Oct 30, 2019

krux02 commented Oct 30, 2019

krux02 commented Oct 30, 2019

dryajov commented Oct 30, 2019

dryajov commented Oct 30, 2019

krux02 commented Oct 31, 2019 • edited Loading

dryajov commented Oct 31, 2019

Araq commented Oct 31, 2019

mratsim commented Oct 31, 2019

krux02 commented Oct 31, 2019

arnetheduck commented Oct 31, 2019

zah commented Oct 31, 2019 • edited Loading

mratsim commented Oct 31, 2019 • edited Loading

Araq commented Oct 31, 2019

Araq commented Nov 16, 2019 • edited Loading

disruptek commented Nov 16, 2019

Araq commented Nov 20, 2019

Araq commented Oct 30, 2019 •

edited

Loading

dryajov commented Oct 30, 2019 •

edited

Loading

dryajov commented Oct 30, 2019 •

edited

Loading

Araq commented Oct 30, 2019 •

edited

Loading

mratsim commented Oct 30, 2019 •

edited

Loading

mratsim commented Oct 30, 2019 •

edited

Loading

krux02 commented Oct 31, 2019 •

edited

Loading

zah commented Oct 31, 2019 •

edited

Loading

mratsim commented Oct 31, 2019 •

edited

Loading

Araq commented Nov 16, 2019 •

edited

Loading