-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parsing performance by reducing token cloning #1587
Conversation
I've been sitting around a 10% improvement with each added commit not moving the needled all that much. Then after poking a bit harder at Instruments I realized that some super hot paths are in the token manipulation methods themselves. This adds an 18% improvement against my previous commit which gives a grand total of about 28% on both benchmarks.
Except for a single instance, every use of expect_keyword was ignoring the returned token. This adds a new `expect_keyword_is` that avoids that unnecessary clone. I nearly added a `#[must_use]` attribute to the `expect_keyword` method, but decided against it as that feels like a breaking API change even if it would nudge folks toward the correct method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @davisp -- this is a very nice PR ❤️
I ran the benchmarks as well on a linux machine and see the same improvements you did. Very nice
++ critcmp main reduce-token-cloning
group main reduce-token-cloning
----- ---- --------------------
sqlparser-rs parsing benchmark/sqlparser::select 1.46 6.4±0.03µs ? ?/sec 1.00 4.4±0.02µs ? ?/sec
sqlparser-rs parsing benchmark/sqlparser::with_select 1.30 30.7±0.09µs ? ?/sec 1.00 23.5±0.10µs ? ?/sec
++ popd
~/datafusion-benchmarking
FYI @iffyio and @Dandandan
@@ -186,6 +186,15 @@ impl std::error::Error for ParserError {} | |||
// By default, allow expressions up to this deep before erroring | |||
const DEFAULT_REMAINING_DEPTH: usize = 50; | |||
|
|||
// A constant EOF token that can be referenced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
This might be useful enough to make pub const
as well (so users of the crate can do the same thing)
@@ -3376,22 +3407,26 @@ impl<'a> Parser<'a> { | |||
matched | |||
} | |||
|
|||
pub fn next_token(&mut self) -> TokenWithSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add doc comments to these methods as well?
Maybe we can also mention "see next_token_ref
for a faster, non copying API"
/// If the current token is the `expected` keyword, consume the token. | ||
/// Otherwise, return an error. | ||
/// | ||
/// This differs from expect_keyword only in that the matched keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be really nice to add a link in the docs too:
/// This differs from expect_keyword only in that the matched keyword | |
/// This differs from [`Self::expect_keyword`] only in that the matched keyword |
&format!("one of {}", keywords.join(" or ")), | ||
self.peek_token(), | ||
self.peek_token_ref(), | ||
) | ||
} | ||
} | ||
|
||
/// If the current token is the `expected` keyword, consume the token. | ||
/// Otherwise, return an error. | ||
pub fn expect_keyword(&mut self, expected: Keyword) -> Result<TokenWithSpan, ParserError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a follow on PR we could mark this API as deprecated to locate other uses of this API as well as help downstream consumers upgrade 🤔
datafusion-sqlparser-rs/src/tokenizer.rs
Line 604 in 5de5312
#[deprecated(since = "0.53.0", note = "please use `TokenWithSpan` instead")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @davisp!
I merged this PR up to main to resolve a conflict |
This PR has another conflict now What I am hoping / planning to do is make the suggested doc improvements above and get it merged in. I also think it would be fine to merge it in and make the doc comment improvements as a follow on PR I'll try to do this if no one else beats me to it |
I am going to merge this PR up to resolve the conflicts and make some of the suggested improvements in documentation, etc as a follow on PR |
I merged up to resolve a conflict. While reviewing the code I have some ideas on how to simplify it -- I'll try and make a follow on PR later today. It will be a fun coding exercise |
Thanks again @davisp @Dandandan and @iffyio |
This basically just takes the approach proposed by @alamb in #1561 and runs with it.
Token
s as much #1558The main changes involved:
&TokenWithSpan
instead of a clonedTokenWithSpan
.Results on my M1 MacBook Pro: