Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add noIrregularWhitespace rule #3333

Merged
merged 19 commits into from
Jul 11, 2024

Conversation

michellocana
Copy link
Contributor

@michellocana michellocana commented Jul 1, 2024

Summary

Implementation of the no-irregular-whitespace rule from ESLint in Biome.
I also had to do a change in the biome_diagnostics crate to validate the range of irregular whitespaces properly and show them correctly in the output.

Closes #53

Valid

const foo = '\\\u2028';
const foo = '\\\u2029';
/* \u{b}    */ const foo = '�';
/* \u{c}    */ const foo = '';
/* \u{20}   */ const foo = ' ';
/* \u{85}   */ const foo = '�';

Invalid

/* \u{b}    */ const�foo�='thing';
/* \u{c}    */ constfoo='thing';
/* \u{85}   */ const�foo�='thing';
/* \u{feff} */ constfoo='thing';
/* \u{a0}   */ const foo = 'thing';
/* \u{1680} */ constfoo='thing';

Test Plan

Added snapshots, for valid and invalid cases according to the original rule tests.

@github-actions github-actions bot added A-Project Area: project A-Linter Area: linter A-Parser Area: parser L-JavaScript Language: JavaScript and super languages L-JSON Language: JSON and super languages A-Diagnostic Area: diagnostocis labels Jul 1, 2024
@michellocana michellocana changed the title feat: add noIrregularWhitespace rule feat: add noIrregularWhitespace rule Jul 1, 2024
Copy link

codspeed-hq bot commented Jul 1, 2024

CodSpeed Performance Report

Merging #3333 will degrade performances by 7.73%

Comparing michellocana:feat/no-irregular-whitespace (1229ea2) with main (8eb0e8a)

Summary

⚡ 1 improvements
❌ 2 regressions
✅ 105 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main michellocana:feat/no-irregular-whitespace Change
js_analyzer[lint_11654203871763747189.ts] 26.4 ms 28.6 ms -7.6%
react.production.min_3378072959512366797.js[cached] 2 ms 1.9 ms +6.82%
db_2930068967297060348.json[cached] 12.5 ms 13.5 ms -7.73%

Comment on lines 77 to 79
let syntax = node.syntax();
let node_text = syntax.text_trimmed();
let range_start: u32 = node.range().start().into();
Copy link
Member

@ematipico ematipico Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can evaluate a different approach. As you might notice in this example, the irregular whitespaces are inside trivia.

This information is already present inside the nodes, inside the trivia attached to those nodes. You can retrieve trivia from tokens. You have multiple ways to retrieve the tokens: you directly query all tokens via Ast<JsSyntaxToken>, use some methods from syntax (there are many depending on the situation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't know we can query all tokens like that, that definitely makes more sense, I will look into that :)

@ematipico ematipico force-pushed the feat/no-irregular-whitespace branch from f0f1ee1 to 17b1b42 Compare July 4, 2024 07:20
}

impl Rule for NoIrregularWhitespace {
type Query = Ast<AnyJsStatement>;
Copy link
Member

@ematipico ematipico Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the thing about this query, and why it doesn't play well with the logic of the rule. This returns any statement, and in this example, you have multiple statements:

const f = function f() {
	if (true) {} else {}
}

The query will trigger for all the statements of this snippet. However, for each statement, you retrieve ALL tokens, which means that, for example, the else token is checked multiple times because it is returned by the majority of the statements you're querying.

That's why you have this massive regression in performance; the rule is doing repeated work, A LOT. I believe the best strategy is to query the root node of the CST, so you're sure that you aren't doing repeated work on the same tokens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I can make the change to query only the root node (I assume it's the JsModule module one?), but that would mean that we can only have one error report about irregular whitespaces per file, right? As far as I know there's no way to report multiple error ranges on a lint rule

Copy link
Member

@ematipico ematipico Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I can make the change to query only the root node (I assume it's the JsModule module one?), but that would mean that we can only have one error report about irregular whitespaces per file, right? As far as I know there's no way to report multiple error ranges on a lint rule

Yes, there is: https://github.com/biomejs/biome/blob/main/crates/biome_analyze/CONTRIBUTING.md#multiple-signals

It gets more challenging to work because you can't do things like ok()?, but it's possible and many rules this approach already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh, okay, I'm gonna dig into that, thanks!

Copy link
Contributor Author

@michellocana michellocana Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ematipico just did this change, worked like a charm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, the benchmarks still failed, I will keep working on that

@michellocana michellocana force-pushed the feat/no-irregular-whitespace branch 2 times, most recently from 9582f47 to 46ce89c Compare July 5, 2024 17:53
@github-actions github-actions bot added the A-CLI Area: CLI label Jul 5, 2024
@michellocana michellocana requested a review from ematipico July 5, 2024 18:40
@ematipico ematipico requested a review from a team July 9, 2024 21:03
@michellocana michellocana force-pushed the feat/no-irregular-whitespace branch from 63f8ab1 to 0f1b7d5 Compare July 9, 2024 21:14
@unvalley
Copy link
Member

@michellocana Thank you, could you resolve the conflicts please?

@michellocana michellocana force-pushed the feat/no-irregular-whitespace branch from 0f1b7d5 to 1229ea2 Compare July 11, 2024 19:46
@michellocana
Copy link
Contributor Author

michellocana commented Jul 11, 2024

@michellocana Thank you, could you resolve the conflicts please?

@unvalley done!

@ematipico ematipico merged commit 4284b19 into biomejs:main Jul 11, 2024
11 of 12 checks passed
@ematipico
Copy link
Member

Thank you @michellocana for this rule. If you have the chance, can you send a PR to add a changelog entry?

@michellocana
Copy link
Contributor Author

Thank you @michellocana for this rule. If you have the chance, can you send a PR to add a changelog entry?

sure, here it is @ematipico :) #3418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-CLI Area: CLI A-Diagnostic Area: diagnostocis A-Linter Area: linter A-Parser Area: parser A-Project Area: project L-JavaScript Language: JavaScript and super languages L-JSON Language: JSON and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

📎 Implement nursery/noIrregularWhitespace - eslint/no-irregular-whitespace
4 participants