enhancement(remap): add integer division #5353

StephenWakely · 2020-12-03T12:51:42Z

Closes #3729

Discussion in that issue stalled, so I have moved forward and tried to address all the concerns raised there.

This PR makes two changes.

1. Ordinary division is now always float division. Before, if the first parameter was an Integer it would be Integer division, so 9 / 12 == 0. Now it will be 9 / 12 == 0.75. It does mean that even if the result is a whole number it will be a float - 4 / 2 == 2.0
2. There is a new integer division operator, //, so 9 // 12 == 0. Float parameters are cast to an integer.
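
The two behaviours described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `Value` enum is a simplified stand-in with only numeric variants, and `as_f64`, `as_i64`, `div`, and `int_div` are hypothetical names. Returning `None` on division by zero is also an assumption made for the sketch.

```rust
/// Simplified stand-in for the remap `Value` type (numeric variants only).
#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
}

impl Value {
    /// Widen either variant to f64 (hypothetical helper).
    fn as_f64(&self) -> f64 {
        match self {
            Value::Integer(v) => *v as f64,
            Value::Float(v) => *v,
        }
    }

    /// Narrow either variant to i64; floats are truncated toward zero.
    fn as_i64(&self) -> i64 {
        match self {
            Value::Integer(v) => *v,
            Value::Float(v) => *v as i64,
        }
    }

    /// `/`: always float division, whatever the operand types.
    fn div(&self, rhs: &Value) -> Value {
        Value::Float(self.as_f64() / rhs.as_f64())
    }

    /// `//`: integer division; `None` on division by zero or overflow.
    fn int_div(&self, rhs: &Value) -> Option<Value> {
        self.as_i64().checked_div(rhs.as_i64()).map(Value::Integer)
    }
}

fn main() {
    let (nine, twelve) = (Value::Integer(9), Value::Integer(12));
    println!("{:?}", nine.div(&twelve)); // Float(0.75)
    println!("{:?}", Value::Integer(4).div(&Value::Integer(2))); // Float(2.0)
    println!("{:?}", nine.int_div(&twelve)); // Some(Integer(0))
}
```

Note that the result type of `div` depends only on the operator, never on the operand values.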

Signed-off-by: Stephen Wakely <[email protected]>

JeanMertz

LGTM.

> It does mean that even if the result is a whole number it will be a float - 4 / 2 == 2.0

Does it make sense to check if we have a fraction and return an integer if we don't (if both lhs and rhs are integers)?

e.g. v == (v as i64) as f64
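
A minimal sketch of that suggestion, assuming both operands are already known to be integers. The `Number` enum and `div_collapsing` are hypothetical names used only for illustration, and the sketch ignores overflow near the limits of `i64`.

```rust
/// Result whose variant depends on the computed value (the alternative being proposed).
#[derive(Debug, PartialEq)]
enum Number {
    Integer(i64),
    Float(f64),
}

/// Divide two integers as floats, then collapse to an Integer when
/// the quotient has no fractional part.
fn div_collapsing(lhs: i64, rhs: i64) -> Number {
    let v = lhs as f64 / rhs as f64;
    // The check from the comment above: does v survive a trip through i64?
    if v == (v as i64) as f64 {
        Number::Integer(v as i64)
    } else {
        Number::Float(v)
    }
}

fn main() {
    println!("{:?}", div_collapsing(4, 2)); // Integer(2)
    println!("{:?}", div_collapsing(5, 2)); // Float(2.5)
}
```

With this approach the variant returned for `.foo / 2` is only known at runtime, once the value of `.foo` is known.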

StephenWakely · 2020-12-03T14:08:26Z

> Does it make sense to check if we have a fraction and return an integer if we don't (if both lhs and rhs are integers)?
>
> e.g. v == (v as i64) as f64

This is how the JSON parser works: if it can be an integer it becomes an integer, otherwise a float. But I personally prefer to return a consistent type. I am open to objections if anyone prefers otherwise.

JeanMertz · 2020-12-03T14:14:08Z

Personally, having 4 / 2 return 2.0 isn't "consistent" to me. I'd prefer it if we return an integer if both sides are integers. I am not advocating to return an integer if someone does 4.0 / 2, that should still return 2.0 (same for 4 / 2.0).

StephenWakely · 2020-12-03T14:18:55Z

> Personally, having 4 / 2 return 2.0 isn't "consistent" to me.

The inconsistency I mean is that 4 / 2 would return an Integer, but 5 / 2 would return a Float. In a real example, where at least one of those sides would be a variable, sometimes .foo / 2 would be an Integer and other times it would be a Float - the type would only be determined at runtime.

JeanMertz · 2020-12-03T14:55:26Z

> The inconsistency I mean is 4 / 2 would return an Integer, but 5 / 2 would return a Float.

Right. To me, I would describe this not as inconsistency, but as "the most useful result". I get that from a type-level perspective, it's inconsistent, but as a user, I wouldn't want 5 / 2 to resolve to 3.

We could have / do "what you most likely want" and have // do what is "technically correct", maybe?

> In a real example, where at least one of those sides would be a variable, sometimes .foo / 2 would be an Integer and other times it would be a Float - the type would only be determined at runtime.

I don't see how that's a problem in this case. If, at runtime, .foo is a float, then the result will be a float as well.

StephenWakely · 2020-12-03T16:25:11Z

> If, at runtime, .foo is a float, then the result will be a float as well.

If I've understood your suggestion, then if .foo == 4 then the result would be an int.

All told though, I don't think it will have a negative impact on how things work, so I am happy to make the change. The fact that it is an Int will not prevent you from doing anything that you could do if it was a Float. Plus it does remain consistent with the JSON parsing.

bruceg · 2020-12-03T17:02:35Z

Unless we have a pattern of converting a Float into an Integer when possible (note that this must include both the rounding discussed here and overflow detection), then I would dissent. Operations should have a consistent result dependent on the types of the arguments but not on the values. So, A + B has one result type. A * B has one result type. So also A / B should have a single result type, particularly if A is an event field. I usually work exclusively with integers and so have normalized the silent truncation, but mathematically a Float result is much less surprising.

Parsing JSON where a number may be parsed as either an Integer or a Float depending on the value is a different case. There is a textual difference between floats and integers that makes such an automatic conversion much more obvious and less surprising.

Of course, if everything downstream for this treats Integers and Floats equivalently, this is all pretty much moot, and the only reasonable thing is to do whatever is easiest. I don't think that is actually the case, though.

StephenWakely · 2020-12-04T00:21:10Z

@binarylogic We have reached a bit of a stalemate. Do you have an opinion on this?

5 / 2 == 2.5 (always a float)

So should:

4 / 2 == 2 (an integer)

or should

4 / 2 == 2.0 (a float)

?

binarylogic · 2020-12-04T00:50:56Z

🧑‍⚖️

I lean towards consistent types, especially since Vector will be writing to some storages that require a strict schema. For example, Elasticsearch might see an integer as a field's first value and set the field type as integer, and then subsequent float values would be unexpectedly rounded.

Therefore, I vote for:

5 / 2 = 2.5
4 / 2 = 2.0

My rationale is:

5 / 2 = 2.5 is the least surprising.
4 / 2 = 2.0 fits into my consistent types argument above.

bruceg

Code looks ok except for conversion issues. I'm not sure if that's in scope for this issue.

bruceg · 2020-12-05T00:50:22Z

lib/remap-lang/src/value.rs

@@ -315,14 +318,26 @@ impl Value {
        let err = || Error::Div(self.kind(), rhs.kind());

        let value = match self {
-            Value::Integer(lhv) => (lhv / i64::try_from(&rhs).map_err(|_| err())?).into(),
+            Value::Integer(lhv) => (lhv as f64 / f64::try_from(&rhs).map_err(|_| err())?).into(),


This is one of those silent data corruption problems that Rust doesn't warn about. i64 as f64 is not a lossless conversion (it will lose precision for values > 2^53, see https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d9425e2eef14bb569653638ab3076ff5), so try_from may be appropriate on both sides.

I doubt this is the only place this shows up, so it might not be in scope for this issue.
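
The precision loss can be shown with a small standalone sketch. The `roundtrip` helper is a hypothetical name, not something in the codebase; it only relies on plain `as` casts.

```rust
/// Round-trip an i64 through f64 using plain `as` casts.
fn roundtrip(v: i64) -> i64 {
    (v as f64) as i64
}

fn main() {
    let limit: i64 = 1 << 53; // 9_007_199_254_740_992

    // 2^53 itself survives the round trip.
    assert_eq!(roundtrip(limit), limit);

    // 2^53 + 1 has no exact f64 representation and is rounded to 2^53,
    // with no warning or error.
    assert_eq!(roundtrip(limit + 1), limit);

    println!("{} -> {}", limit + 1, roundtrip(limit + 1));
}
```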

StephenWakely · 2020-12-06

Is there such a try_from function between float and int? I can't find any.

The f64::try_from(&Value) that we are using here just does this under the hood:

    match value {
        Value::Integer(v) => Ok(*v as f64),
        Value::Float(v) => Ok(*v),
        _ => Err(Error::Coerce(value.kind(), Kind::Float)),
    }

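bruceg · 2020-12-07

I thought there was, but I can't find it now so I must be mistaken. It would have to be emulated:

    let result = value as f64;
    match result as i64 {
        // A bare `value` pattern would bind a new variable and always match,
        // so compare against the original with a guard.
        v if v == value => Ok(result),
        _ => Err(Error::Coerce(…)),
    }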

Again, it might be out of scope for this particular PR, but something to consider making an issue of.

StephenWakely · 2020-12-12

I can't see how that emulation would work. Rust won't let you compare an i64 with an f64, so you would have to cast result back to an f64 - by which point the comparison is pointless.

I can't quite see a good way around this, so I think I will close this and raise an issue.

bruceg · 2020-12-12

That's exactly what this is doing: by casting value to f64 and then back to i64, any conversion failure will cause that value to be different from the original.

I agree, though, that should probably go in a new issue.

StephenWakely · 2020-12-12

Oh, yes. Good point.

I'm not sure that helps though if the number is a power of 2. Take 36028797018963968 as i64. Converted to an f64, that is 36028797018963970. But that number converted back to i64 is back to the original 36028797018963968.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1b03fa41be91569e49e7446e2b87ecff
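
A sketch of the round-trip check discussed in this thread, under stated assumptions. Both function names are hypothetical. 36028797018963968 is 2^55, which an f64 represents exactly, so passing the check is the correct outcome there; the 36028797018963970 shown appears to be the shortened decimal Rust prints rather than a changed value. The naive check does have one blind spot at `i64::MAX`, where the saturating cast back to `i64` hides the rounding. Comparing in `i128` is one way to avoid that.

```rust
/// Naive round trip: cast to f64 and back, then compare as i64.
fn i64_to_f64_naive(value: i64) -> Option<f64> {
    let result = value as f64;
    if result as i64 == value { Some(result) } else { None }
}

/// Same idea, but compare in i128 so that 2^63 (the rounded i64::MAX)
/// is not saturated back to i64::MAX before the comparison.
fn i64_to_f64_exact(value: i64) -> Option<f64> {
    let result = value as f64;
    if result as i128 == value as i128 { Some(result) } else { None }
}

fn main() {
    // 2^55 is exactly representable, so both checks accept it.
    assert!(i64_to_f64_naive(1 << 55).is_some());
    assert!(i64_to_f64_exact(1 << 55).is_some());

    // 2^53 + 1 is not representable; both checks reject it.
    assert!(i64_to_f64_naive((1 << 53) + 1).is_none());
    assert!(i64_to_f64_exact((1 << 53) + 1).is_none());

    // i64::MAX rounds up to 2^63; only the i128 comparison notices.
    assert!(i64_to_f64_naive(i64::MAX).is_some());
    assert!(i64_to_f64_exact(i64::MAX).is_none());
}
```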

bruceg · 2020-12-05T00:59:54Z

lib/remap-lang/src/value.rs

+
+        let value = match &self {
+            Value::Integer(lhv) => (lhv / i64::try_from(&rhs).map_err(|_| err())?).into(),
+            Value::Float(lhv) => (*lhv as i64/ i64::try_from(&rhs).map_err(|_| err())?).into(),


Here too, values larger than 2^63 silently turn into gibberish: 50000000000000000000.0 as i64 == 9223372036854775807.
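
One possible shape for a checked conversion in this direction, as a sketch only. `f64_to_i64_checked` is a hypothetical helper, and truncating the fractional part is an assumption carried over from the "cast to an integer" behaviour in the PR description.

```rust
/// Convert an f64 to i64, truncating toward zero, or return `None`
/// when the value is NaN, infinite, or outside the i64 range.
fn f64_to_i64_checked(value: f64) -> Option<i64> {
    // Both bounds are powers of two and therefore exact as f64.
    const LOWER: f64 = -9_223_372_036_854_775_808.0; // -2^63, representable as i64
    const UPPER: f64 = 9_223_372_036_854_775_808.0; // 2^63, one past i64::MAX

    if value.is_finite() && value >= LOWER && value < UPPER {
        Some(value.trunc() as i64)
    } else {
        None
    }
}

fn main() {
    // The plain cast saturates silently, as described above.
    assert_eq!(50000000000000000000.0_f64 as i64, i64::MAX);

    // The checked version reports the problem instead.
    assert_eq!(f64_to_i64_checked(50000000000000000000.0), None);
    assert_eq!(f64_to_i64_checked(f64::NAN), None);
    assert_eq!(f64_to_i64_checked(-2.7), Some(-2));
}
```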

StephenWakely · 2020-12-13T09:54:31Z

Have raised #5530 to look into the issues of converting between floats and ints.

StephenWakely added 2 commits December 3, 2020 12:31

Add integer division

d11073f

Signed-off-by: Stephen Wakely <[email protected]>

Integer division should cast floats to integers

0c1d24e

Signed-off-by: Stephen Wakely <[email protected]>

StephenWakely requested review from JeanMertz and bruceg December 3, 2020 12:51

JeanMertz approved these changes Dec 3, 2020

jamtur01 assigned StephenWakely Dec 3, 2020

jamtur01 added the domain: vrl and transform: remap labels Dec 3, 2020

jamtur01 added this to the 2020-11-23: Pseudo-chitin armor milestone Dec 3, 2020

bruceg reviewed Dec 5, 2020

JeanMertz mentioned this pull request Dec 12, 2020

VRL - support for comments #5443

Closed

Formatting

a42be77

Signed-off-by: Stephen Wakely <[email protected]>

StephenWakely merged commit 565bc52 into master Dec 13, 2020

StephenWakely deleted the 3729_integer_division branch December 13, 2020 09:54
