Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_css_parser): CSS lexer number and ident #4682 #4712

Merged
merged 1 commit into from
Jul 21, 2023

Conversation

@netlify
Copy link

netlify bot commented Jul 19, 2023

Deploy Preview for docs-rometools canceled.

Name Link
🔨 Latest commit f5e66b5
🔍 Latest deploy log https://app.netlify.com/sites/docs-rometools/deploys/64ba50265afa650008f172fa

@github-actions github-actions bot added the A-Tooling Area: our own build, development, and release tooling label Jul 19, 2023
@denbezrukov denbezrukov force-pushed the feat/css-lexer-number-ident branch from 120bc9f to 28d4c33 Compare July 19, 2023 08:24
@github-actions
Copy link

Parser conformance results on ubuntu-latest

js/262

Test result main count This PR count Difference
Total 48863 48863 0
Passed 47810 47810 0
Failed 1053 1053 0
Panics 0 0 0
Coverage 97.84% 97.84% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 40 40 0
Passed 37 37 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.50% 92.50% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 6212 6212 0
Passed 1764 1764 0
Failed 4448 4448 0
Panics 0 0 0
Coverage 28.40% 28.40% 0.00%

ts/babel

Test result main count This PR count Difference
Total 639 639 0
Passed 573 573 0
Failed 66 66 0
Panics 0 0 0
Coverage 89.67% 89.67% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 17224 17224 0
Passed 13121 13121 0
Failed 4103 4103 0
Panics 0 0 0
Coverage 76.18% 76.18% 0.00%

@denbezrukov denbezrukov force-pushed the feat/css-lexer-number-ident branch 2 times, most recently from d822c40 to 7022e59 Compare July 19, 2023 09:52
@denbezrukov denbezrukov requested a review from ematipico July 20, 2023 13:05
crates/rome_css_parser/src/lexer/mod.rs Show resolved Hide resolved
Comment on lines +219 to +221
fn assert_at_char_boundary(&self, offset: usize) {
debug_assert!(self.source.is_char_boundary(self.position + offset));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the compiler smart enough to remove this function in production builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check it.
We have the same function in other parsers:

/// Asserts that the lexer is at a UTF8 char boundary
#[inline]
fn assert_at_char_boundary(&self) {
debug_assert!(self.source.is_char_boundary(self.position));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rust compiler and its LLVM backend are smart enough to recognize when functions are empty or when their output isn't used, and can often inline or eliminate such calls during the optimization phase. This process is sometimes referred to as "dead code elimination".

So, in a release build, the assert_at_char_boundary function should in theory have no runtime cost because it is empty, and the call to it in functions should be removed.

Comment on lines +244 to +246
unsafe {
core::hint::unreachable_unchecked();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't unreachable!() enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting.
Actually, I don't know :D
I've just copied it from:

fn current_char_unchecked(&self) -> char {
// Precautionary measure for making sure the unsafe code below does not read over memory boundary
debug_assert!(!self.is_eof());
self.assert_at_char_boundary();
// Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char
let string = unsafe {
std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..))
};
let chr = if let Some(chr) = string.chars().next() {
chr
} else {
// Safety: we always call this when we are at a valid char, so this branch is completely unreachable
unsafe {
core::hint::unreachable_unchecked();
}
};
chr
}

fn current_char_unchecked(&self) -> char {
// Precautionary measure for making sure the unsafe code below does not read over memory boundary
debug_assert!(!self.is_eof());
self.assert_at_char_boundary();
// Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char
let string = unsafe {
std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..))
};
let chr = if let Some(chr) = string.chars().next() {
chr
} else {
// Safety: we always call this when we are at a valid char, so this branch is completely unreachable
unsafe {
core::hint::unreachable_unchecked();
}
};
chr
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference between the two lies in their behavior and purpose:

unreachable!(): This macro is used to indicate a section of code that should never be reached under normal conditions. If the unreachable!() code path is executed, the program will panic, providing an error message and backtrace. This is mainly used for situations where you are confident that the code will never be reached, but if it does get reached due to a logic error, you want to know about it. The macro can also optionally take a custom message, like unreachable!("Custom message").

core::hint::unreachable_unchecked(): This function, on the other hand, is used for telling the compiler that a certain piece of code will never be reached, allowing it to eliminate the code. However, if this code path does get executed, it results in undefined behavior, meaning that anything can happen. It could crash, it could corrupt data, it could keep working as if nothing happened. It's a way to make a promise to the compiler. If you break that promise, all bets are off. This should be used sparingly and only when you're absolutely sure that this code will never be reached and you need every bit of performance you can get.

crates/rome_css_parser/src/lexer/mod.rs Show resolved Hide resolved
crates/rome_css_parser/src/lexer/mod.rs Show resolved Hide resolved
crates/rome_css_parser/src/lexer/mod.rs Show resolved Hide resolved
@denbezrukov denbezrukov force-pushed the feat/css-lexer-number-ident branch from 7022e59 to f5e66b5 Compare July 21, 2023 09:30
@github-actions github-actions bot added L-CSS Language: CSS A-Parser Area: parser labels Jul 21, 2023
@denbezrukov denbezrukov merged commit 9bc3630 into main Jul 21, 2023
@denbezrukov denbezrukov deleted the feat/css-lexer-number-ident branch July 21, 2023 16:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Parser Area: parser A-Tooling Area: our own build, development, and release tooling L-CSS Language: CSS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants