-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"-NUM.method()" is parsed in surprising ways #117155
Comments
This also made me question my sanity some time ago, a lint would have been very helpful. The code in question was assert_eq!(1_i32, -1_i32.rem_euclid(12)) // should be (-i_i32) |
I would personally be very surprised if This matches similar precedence rules in maths, like how -2x is -(2x). Another good test is if adding 0 to the beginning would change the result: |
In math I would insist on parentheses since -2^x looks ambiguous.
That's not at all how think about negative integer literals. It's great that your intuition aligns with the current parser, but that doesn't mean that everybody's intuition aligns with the parser. I literally thought the |
I too would be very surprised if it was not parsed the way it currently is. If To my knowledge, every language with both unary minus and a I also don't think it's fair to say that the compiler is disassociating the |
It doesn't matter what the lexer does internally. For many humans reading and writing code, "-5" is an atom that cannot be split meaningfully. If the lexer doesn't work like that then it is up to us to minimize the fallout and confusion stemming from that decision. This is true even if the alternative parsing would be just as surprising in other contexts. Thay just means we should steer the user towards code that is unambiguous.
|
Even ignoring what the lexer does, my point is that for many other humans, especially those who work in mathematics or computing, As a side note, |
The way it's currently being parsed feels very natural and correct to me as well. It not only matches similar rules in math as pointed out earlier but is also how it works in every language that allows an equivalent syntax. Adding extra parentheses in situations like this would just feel like visual noise to me, personally. |
Note that there are cases where a negative literal is treated as a fused literal, in particular in cases with the I don't really think that any of the proposed suggestions for negation really work well, though. Adding an extra space looks like a typo, and adding extra parentheses also makes rewriting code more obnoxious since it's a non-local change (adding a negative sign requires adding a closing parenthesis at the end of the expression). Even though the precedence rules are well-motivated, parentheses exist regardless and Rust even has lints that encourage using them even in unambiguous situations, like converting So, I do think that at minimum there could be an allow-by-default lint suggesting one or more of these methods, and/or style rules for |
As a data point, I work in computing. Programming languages are literally my job. I guess we'd need a study to see how many "many" is. 🤷
It's not realistic to change the parser so we agree on the outcome here, if not on the reasons.
I'm not aware of math having well-defined operator precedence for cases like this. Do you have a source for this claim?
Yeah, that's why I was thinking a lint would make sense here. FWIW those cases are ambiguous if you can't remember whether "and" or "or" binds stronger. And in languages like bash, they actually bind equally strong (and are left-associative), so this is inconsistent across languages. I don't think it is realistic for everyone to remember all the precedence rules, so I agree with the compiler that the parentheses should be added here to avoid ambiguity. |
I doubt that many mathematicians would consider "-5" anything but a single unit, the negative integer that is the additive inverse of 5, just like they wouldn't consider "½" a division but the rational number that's the multiplicative inverse of 2. |
Interestingly, As someone who can't imagine this expression being parsed any other way, I'm curious how everyone else views whitespace. My assumption has always been that I can add arbitrary whitespace around any operator and get the same results. So if you find the way this expression is parsed confusing, how would you mentally parse the following expressions?
|
Even though this wasn't directed at me, I figured I'd say that my primary argument is based upon the exponent rules (which always bind exponents before unary minus in mathematics), and the fact that function calls are generally put at the highest precedence above even that. Note that these rules are variable, for example, some programs like Excel will bind the unary minus before exponents, but I consider this unnatural since it goes against standard mathematics. In primary school, we were explicitly taught that expressions like -2x were a potentially confusing case under these rules, although if you consider unary minus the same as any other arithmetic operator, it makes sense that it would have lower precedence. Function calls having the highest precedence has also been the standard since C, and I would imagine that methods would have similarly high precedence. Since unary minus doesn't have to apply to just numeric literals, I would always imagine it to be parsed separately, since as shown, the precedence of |
From the peanut gallery: there's an additional complication for minimum signed integer values:
|
The parser parses everything according to standard precedence rules, treating
How does your brain parse
You've added another operator ( |
While this makes sense, it's certainly surprising and I never thought of this case. I definitely agree that situations like this should have a warn-by-default lint at minimum, and maybe there can be a specific list of methods where this particular parsing rule has "weird" results, like this one. |
To clarify, the method being called doesn't matter here. It's a compile-time error because the compiler can't create an |
My intuition parses
Yeah you mentioned that above. I wasn't aware of this rule, but Wikipedia confirms. Wikipedia also lists some exceptions where the rule does not apply, though. Anyway, while we can go on discussing which version is "more intuitive", that's not quite the point of this issue. The point of this issue is that the current parsing is ambiguous and surprising to some people. There's two outcomes here:
I prefer outcome 2, but I don't have the data to prove that it's ambiguous for "sufficiently many" people. In this issue we seem to have a fairly even split, but of course this is far from representative.
C doesn't have method calls so the issue can't even arise there? But I take it this means Java and C++ parse this like Rust. That's a good argument to keep the parser as-is, but it's not an argument against a lint IMO. |
C actually does have method calls since you can put function pointers in structs, although since function application has a higher precedence than field access, you need to add parentheses anyway. Although I should clarify my position here: I'm fully in support of adding rules to both rustfmt and rust, it's mostly whether they'd be enabled by default that is my particular concern. |
Thanks for clarifying. I do think the lint should be warn-by-default, since otherwise it won't actually help people that run into this without realizing the possibility of running into this (like my past self). |
How languages handle exponentiation is a close analogy. If a language were to treat However, in Fortran (1957-), Perl, Python, and Ruby, In Haskell, And in Rust, Linting may seem somewhat opinionated given the precedent and history. We should also think about symmetry with the other unary operator, #[test]
fn test() {
assert_eq!(127, !2u8.pow(7));
assert_eq!(false, !2u8.pow(7).is_power_of_two());
} (I would not expect linting here. But I would expect consistency with |
Mathematically, adding a negative sign is taught as being exactly equivalent to a multiplication by negative one. That's at least one justification for why |
And how may people programming in these languages know this rule? In a Python code review I would definitely insist on adding parentheses to
No. These are clearly not literals.
|
As @chfogelman described, the |
I definitely support warn-by-default. Parsing is invisible to the user, so if you don't know that operator precedence is the issue, it'll take a very long time to figure out. I don't think
Yep, and so does |
It's not just overflow checks. It's part of turning the tokens into an AST. |
At the level of AST generation, the literal is always ProofWe can see this with fn main() {
_ = -128i8;
} Becomes: Expr {
kind: Unary(
Neg,
Expr {
kind: Lit(
Lit {
kind: Integer,
symbol: "128",
suffix: Some(
"i8",
),
},
),
attrs: [],
tokens: None,
},
),
attrs: [],
tokens: None,
}, And this: fn main() {
_ = !128i8;
} Becomes: Expr {
kind: Unary(
Not,
Expr {
kind: Lit(
Lit {
kind: Integer,
symbol: "128",
suffix: Some(
"i8",
),
},
),
attrs: [],
tokens: None,
},
),
},
),
attrs: [],
attrs: [],
tokens: None,
}, Here's one place where there is special handling in the compiler: compiler/rustc_lint/src/types.rs:414 fn lint_int_literal<'tcx>(
...
) {
...
if (negative && v > max + 1) || (!negative && v > max) {
...
}
} Looking at: fn main() {
dbg!(!127i8);
dbg!(-128i8);
} In the MIR, these both end up as |
@traviscross rust/compiler/rustc_mir_build/src/thir/cx/expr.rs Lines 503 to 512 in 2f1bd07
Basically, nothing in the entire compiler treats I can get behind linting |
@chfogelman: Interesting. Thanks for pointing to where that happens during MIR lowering. Agreed |
I think that no matter what here, the solution shouldn't be motivated by what the parser actually does here, and instead by what people expect the parser should do. The reason why I personally would rather not format However, my expectations are not everyone else's expectations, and that's reason to look into this. Explaining what the parser does here ultimately doesn't make a difference because we're not talking about the current implementation, but how we expect the parser to act regardless of implementation. |
fwiw, Ruby parses it the way that's expected in the original issue (different from Rust): -2.abs
#=> 4
x = 2
-x.abs
#=> -4 |
Rollup merge of rust-lang#121364 - Urgau:unary_precedence, r=compiler-errors Implement lint against ambiguous negative literals This PR implements a lint against ambiguous negative literals with a literal and method calls right after it. ## `ambiguous_negative_literals` (deny-by-default) The `ambiguous_negative_literals` lint checks for cases that are confusing between a negative literal and a negation that's not part of the literal. ### Example ```rust,compile_fail -1i32.abs(); // equals -1, while `(-1i32).abs()` equals 1 ``` ### Explanation Method calls take precedence over unary precedence. Setting the precedence explicitly makes the code clearer and avoid potential bugs. <details> <summary>Old proposed lint</summary> ## `ambiguous_unary_precedence` (deny-by-default) The `ambiguous_unary_precedence` lint checks for use the negative unary operator with a literal and method calls. ### Example ```rust -1i32.abs(); // equals -1, while `(-1i32).abs()` equals 1 ``` ### Explanation Unary operations take precedence on binary operations and method calls take precedence over unary precedence. Setting the precedence explicitly makes the code clearer and avoid potential bugs. </details> ----- Note: This is a strip down version of rust-lang#117161, without the binary op precedence. Fixes rust-lang#117155 `@rustbot` labels +I-lang-nominated cc `@scottmcm` r? compiler
Implement lint against ambiguous negative literals This PR implements a lint against ambiguous negative literals with a literal and method calls right after it. ## `ambiguous_negative_literals` (deny-by-default) The `ambiguous_negative_literals` lint checks for cases that are confusing between a negative literal and a negation that's not part of the literal. ### Example ```rust,compile_fail -1i32.abs(); // equals -1, while `(-1i32).abs()` equals 1 ``` ### Explanation Method calls take precedence over unary precedence. Setting the precedence explicitly makes the code clearer and avoid potential bugs. <details> <summary>Old proposed lint</summary> ## `ambiguous_unary_precedence` (deny-by-default) The `ambiguous_unary_precedence` lint checks for use the negative unary operator with a literal and method calls. ### Example ```rust -1i32.abs(); // equals -1, while `(-1i32).abs()` equals 1 ``` ### Explanation Unary operations take precedence on binary operations and method calls take precedence over unary precedence. Setting the precedence explicitly makes the code clearer and avoid potential bugs. </details> ----- Note: This is a strip down version of rust-lang/rust#117161, without the binary op precedence. Fixes rust-lang/rust#117155 `@rustbot` labels +I-lang-nominated cc `@scottmcm` r? compiler
This came up here but I am pretty sure I ran into this before: Rust parses
-5.5f32.method()
as-(5.5f32.method())
. I don't know if that is the parsing result that anyone expects, I certainly would never expect it-5.5f32
is a float literals, and that is what I want to apply the method to, I never even considered that the compiler might interpret this as a unary minus applied to the method result.Assuming that we cannot change the parser, I think we should spend some effort defending against the possible confusion here. For instance:
-5.5f32.method()
(suggesting to write-(5.5f32.method())
if that's really what one meant)The text was updated successfully, but these errors were encountered: