-
Notifications
You must be signed in to change notification settings - Fork 657
feat(rome_css_parser): CSS lexer number and ident #4682 #4712
Conversation
✅ Deploy Preview for docs-rometools canceled.
|
120bc9f
to
28d4c33
Compare
Parser conformance results on ubuntu-latestjs/262
jsx/babel
symbols/microsoft
ts/babel
ts/microsoft
|
d822c40
to
7022e59
Compare
fn assert_at_char_boundary(&self, offset: usize) { | ||
debug_assert!(self.source.is_char_boundary(self.position + offset)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the compiler smart enough to remove this function in production builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check it.
We have the same function in other parsers:
tools/crates/rome_js_parser/src/lexer/mod.rs
Lines 542 to 546 in 22bca8e
/// Asserts that the lexer is at a UTF8 char boundary | |
#[inline] | |
fn assert_at_char_boundary(&self) { | |
debug_assert!(self.source.is_char_boundary(self.position)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Rust compiler and its LLVM backend are smart enough to recognize when functions are empty or when their output isn't used, and can often inline or eliminate such calls during the optimization phase. This process is sometimes referred to as "dead code elimination".
So, in a release build, the assert_at_char_boundary
function should in theory have no runtime cost because it is empty, and the call to it in functions should be removed.
unsafe { | ||
core::hint::unreachable_unchecked(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't unreachable!()
enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting.
Actually, I don't know :D
I've just copied it from:
tools/crates/rome_json_parser/src/lexer/mod.rs
Lines 177 to 196 in 22bca8e
fn current_char_unchecked(&self) -> char { | |
// Precautionary measure for making sure the unsafe code below does not read over memory boundary | |
debug_assert!(!self.is_eof()); | |
self.assert_at_char_boundary(); | |
// Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char | |
let string = unsafe { | |
std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..)) | |
}; | |
let chr = if let Some(chr) = string.chars().next() { | |
chr | |
} else { | |
// Safety: we always call this when we are at a valid char, so this branch is completely unreachable | |
unsafe { | |
core::hint::unreachable_unchecked(); | |
} | |
}; | |
chr | |
} |
tools/crates/rome_js_parser/src/lexer/mod.rs
Lines 508 to 527 in 22bca8e
fn current_char_unchecked(&self) -> char { | |
// Precautionary measure for making sure the unsafe code below does not read over memory boundary | |
debug_assert!(!self.is_eof()); | |
self.assert_at_char_boundary(); | |
// Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char | |
let string = unsafe { | |
std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..)) | |
}; | |
let chr = if let Some(chr) = string.chars().next() { | |
chr | |
} else { | |
// Safety: we always call this when we are at a valid char, so this branch is completely unreachable | |
unsafe { | |
core::hint::unreachable_unchecked(); | |
} | |
}; | |
chr | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difference between the two lies in their behavior and purpose:
unreachable!()
: This macro is used to indicate a section of code that should never be reached under normal conditions. If the unreachable!() code path is executed, the program will panic, providing an error message and backtrace. This is mainly used for situations where you are confident that the code will never be reached, but if it does get reached due to a logic error, you want to know about it. The macro can also optionally take a custom message, like unreachable!("Custom message")
.
core::hint::unreachable_unchecked()
: This function, on the other hand, is used for telling the compiler that a certain piece of code will never be reached, allowing it to eliminate the code. However, if this code path does get executed, it results in undefined behavior, meaning that anything can happen. It could crash, it could corrupt data, it could keep working as if nothing happened. It's a way to make a promise to the compiler. If you break that promise, all bets are off. This should be used sparingly and only when you're absolutely sure that this code will never be reached and you need every bit of performance you can get.
7022e59
to
f5e66b5
Compare
Summary
Lexes number literal and ident
https://drafts.csswg.org/css-syntax/#consume-a-number
https://drafts.csswg.org/css-syntax/#consume-an-ident-sequence
https://github.com/swc-project/swc/blob/main/crates/swc_css_parser/src/lexer/mod.rs
https://github.com/servo/rust-cssparser/blob/master/src/tokenizer.rs
Test Plan
cargo test -p rome_css_parser