-
-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix/emojis-and-vulgar-fractions: hunspell should not check it #150
Conversation
Just checking if the changes will be in this part of the code. Still need to find something for fractions, maybe https://crates.io/crates/unicode-segmentation |
The code location is correct 👍 , it would be nice to make it a separate module, since other backends might want to allow whitlisting emojis and graphemes as well. |
0a5cc8f
to
1d3b8ee
Compare
ok, not sure if the other unicode should also be included here besides emojis and vulgar fractions. I only focused on those two. Feel free to drop any tips about it. I could not find in unic a better way to handle with vulgar fractions, maybe I am missing something? Another possibility, would be to find if the char is in the ranges of vulgar fractions but maybe not so readable. |
5257966
to
205f207
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few performance concerns, the general approach looks really good 👍
8c9e3bf
to
aeaa40f
Compare
aeaa40f
to
3671d9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small set of changes, but other than that good to go, I took the liberty to fix the botched merge commit.
Hi @drahnr, thanks for fixing the test back to working 🙏 . So if regex is a too expensive operation, and also convert to the name as it was done before, then another option would be have the VULGAR FRACTIONS* hard-coded and then simple check if a char is on that list and use unic emoji for check of the emoji. What do you think? Which approach you propose and do you mean to add a flag to enable the check of chars and emoji? |
The expensive part is the refer compilation, not applying the compiled regexp :) so if you wrap that in a lazy static it should be fine. If you have a better/faster idea, go for it. If unsure, measure with i.e. a criterion bench or a simple cputime crate. |
I tend to prefer unic since the general plan would be to move in that direction anyways |
Ping 🙃 |
70eedbf
to
cfa1a87
Compare
@laysauchoa finally got around to taking another look at this, and I'd say this is good to go as is now :) |
a7ebd0b
to
fff36dc
Compare
Closes #141 If a string is composed of only emojis then the checker should ignore them and hence not report it as `possible spelling mistake found`. Currently only enabled for hunspell.
a981138
to
69d807e
Compare
The CI issue is unrelated to these changes, it's just the first time it popped up - filed bminixhofer/nlprule#68 but is most likely a |
607395a
to
a1599f9
Compare
Revert this or bump to the release 0.6.3 as soon as it's available.
a1599f9
to
cdcbf5a
Compare
What does this PR accomplish?
Closes #141 .
Changes proposed by this PR:
Notes to reviewer:
📜 Checklist
./demo
sub directoryTest for emojis
error: spellcheck(Hunspell)
I ran again and the possible mistake is not seen.
This PR refers to #141