-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce unsafe usage #28
Conversation
This is a neat trick. Let me work through how I think this works.
Instead:
To summarize, size of interner stays the same and much less unsafe by only changing an internal representation. Sounds great. Good catch that the docs don’t make it clear that leaking was an implementation detail and intaglio cleans up after itself. Thanks. I’d like to take this change. How would you like to proceed @CAD97? One change is like to make is to move the constructors for |
Yes, you've got it right. I've pushed a second commit to do the same improvment for the bytes interner. The implementation work is now ready to be merged, though I'd definitely suggest going through another pass to clean up documentation (internal and external) to make sure that it's not talking about leaking strings anymore. You've much more familiarity with the library to do the work than I. My quick tour of the bytes module to apply this diff suggests you might be better off just not using TL;DR: merge this PR, work from there. |
One quick note before I review: The only reason In Artichoke, while symbols are byte strings, they are conventionally UTF-8 and almost always real UTF-8 strings. Without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CAD97! This pointer wizardry is easier to understand than the wizardry it replaces.
I'm pleased that the changes are very localized in scope.
🎉 💎
@@ -818,8 +795,7 @@ impl<S> SymbolTable<S> { | |||
#[must_use] | |||
fn alloc(contents: Vec<u8>) -> Slice { | |||
let boxed_slice = contents.into_boxed_slice(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going from Vec<u8>
to BString
would require activating the std
feature.
I'd like to avoid doing that. One way we can work around this is by manually implementing Debug
and calling as_ref().as_bstr()
on the Owned
boxed slice.
@@ -79,19 +78,31 @@ use crate::{Symbol, SymbolOverflowError, DEFAULT_SYMBOL_TABLE_CAPACITY}; | |||
enum Slice { | |||
/// True `'static` references. | |||
Static(&'static BStr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll move away from storing BStr
s in the Slice
and only convert to impl Debug
. The reason to store the slices as &BStr
is that it lets me be lazy and derive debug.
If you're interested, the post is out now: https://dev.to/cad97/string-interners-in-rust-797 |
Thanks for checking out |
by storing
Box<str>
rather than&'static str
for the owned values in theSymbolTable
.This is mostly a proof-of-concept (as it does not adjust the bstr interner or any documentation that is now innacurate), and not strictly necessary for soundness or any feature addition. Rather, it's primarily because you don't need to do the
Box::leak
/Box::from_raw
dance, and this reduces room for error asSlice
no longer requires a manualDrop
implementation.I'm collecting some comparisons between some of the top string interner crates in the vein of Amos's comparison of small strings, and was surprised that an interner that says it works via leaking strings actually doesn't leak anything and properly cleans up after itself.
(Also, you're the only interner of the ones I'm checking that support interning
&'static str
special without copying the string, so that's cool!)