-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UnicodeSet L1 Work #122
UnicodeSet L1 Work #122
Conversation
Pull Request Test Coverage Report for Build c8c149675ccc01e08099abf610a5d8cee960de86-PR-122
💛 - Coveralls |
It occurs to me if we use ranges here then this can actually be made into a generic crate that has nothing to do with chars and applies to any Ord+Eq ranges. Not something we have to do, but something we can consider! |
@Manishearth @zbraniecki @nciric Thank you all for the comments! Unfortunately, I wasn't expecting to have my code reviewed so quickly, and failed to mention that most of I still took all of your feedback into consideration for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good! I left some nits/suggestions and request for benchmarks.
I see a number of tests that test for 'A'..'z'
or 'A'..
or 'a'..'z'
which I recognize from higher level uses like regexps, parsers, etc.
Some of those would work really well as "real world" benchmarks, but I don't know which.
If you think we should iron out benchmarks after we land it, I'm comfortable with an issue filed to kick off the conversation and not blocking landing of this PR any longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
One more optional nit: you include std::char::MAX
in multiple places and then reference it as MAX
.
It requires the reader to jump to the top of the file to find if MAX
is a local constant, or reference and which MAX
is it (char::MAX
, f64::MAX
, etc.).
When dealing with such generic variables (MAX
, MIN
, Error
) we usually keep its module name to make it easier to read: use std::char;
char::MAX
.
I took the liberty to mark resolved or obsolete threads from 20+ days ago from @filmil and @nciric as resolved. There are still a few open comments from @zbraniecki, like using I also agree with using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! All of my concerns are addressed.
We can have serializer/deserializer that reads
You have to continuously remember that the list has to be even length, you need to always know if you're on an even or odd number to understand in the start or end of a range etc. |
Yeah, but why complicate things when currently we can take ownership of the
Technically the list doesn't actually need to be an even length. A missing endpoint just means that the final range has no upper bound. I don't actually know why we're enforcing an even length. |
uniset.rs
Vec<u32>
to create the UnicodeSet orRangeBounds<Char>
contains(char)
andcontains_range(RangeBounds<char>)
size()
ranges()
iter()
isEmpty()