Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/emojis-and-vulgar-fractions: hunspell should not check it #150

Merged
merged 5 commits into from
Apr 17, 2021

Conversation

laysauchoa
Copy link
Collaborator

@laysauchoa laysauchoa commented Feb 8, 2021

What does this PR accomplish?

  • 🩹 Bug Fix
  • 🦚 Feature

Closes #141 .

Changes proposed by this PR:

Notes to reviewer:

📜 Checklist

  • Works on the ./demo sub directory
  • Test coverage is excellent and passes
  • Documentation is thorough

Test for emojis

error: spellcheck(Hunspell)

  --> /home/tmhdev/Documents/rust_samples/recursion_rust/src/fibonnacci.rs:2
   |
 2 |  😘🤗🦀
   |  ^^^
   |   Possible spelling mistake found.
error: spellcheck(Hunspell)
  --> /home/tmhdev/Documents/rust_samples/recursion_rust/src/fibonnacci.rs:2
   |
 2 |  😘🤗🦀⅔⅔
   |  ^^^^^
   |   Possible spelling mistake found.

error: spellcheck(Hunspell)
  --> /home/tmhdev/Documents/rust_samples/recursion_rust/src/fibonnacci.rs:3
   |
 3 |  ⅔⅔
   |  ^^
   |   Possible spelling mistake found.

I ran again and the possible mistake is not seen.

This PR refers to #141

@laysauchoa laysauchoa marked this pull request as draft February 8, 2021 06:28
@laysauchoa
Copy link
Collaborator Author

Just checking if the changes will be in this part of the code. Still need to find something for fractions, maybe https://crates.io/crates/unicode-segmentation

@laysauchoa laysauchoa changed the title fix/emojis-and_fractions: pass it [wip] fix/emojis-and_fractions: pass it Feb 8, 2021
@drahnr
Copy link
Owner

drahnr commented Feb 8, 2021

The code location is correct 👍 , it would be nice to make it a separate module, since other backends might want to allow whitlisting emojis and graphemes as well.
Note that graphemes are multiple unicode characters, so a more fully future lib might make more sense i.e. unic

@laysauchoa laysauchoa force-pushed the laysa-emojis-are-not-mistakes branch 4 times, most recently from 0a5cc8f to 1d3b8ee Compare February 21, 2021 08:59
@laysauchoa laysauchoa changed the title [wip] fix/emojis-and_fractions: pass it fix/emojis-and-vulgar-fractions: hunspell should not check it Feb 21, 2021
@laysauchoa laysauchoa marked this pull request as ready for review February 21, 2021 09:00
@laysauchoa
Copy link
Collaborator Author

ok, not sure if the other unicode should also be included here besides emojis and vulgar fractions. I only focused on those two. Feel free to drop any tips about it. I could not find in unic a better way to handle with vulgar fractions, maybe I am missing something? Another possibility, would be to find if the char is in the ranges of vulgar fractions but maybe not so readable.

@laysauchoa laysauchoa force-pushed the laysa-emojis-are-not-mistakes branch 2 times, most recently from 5257966 to 205f207 Compare February 21, 2021 11:26
Copy link
Owner

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few performance concerns, the general approach looks really good 👍

src/checker/hunspell.rs Outdated Show resolved Hide resolved
src/checker/hunspell.rs Show resolved Hide resolved
src/checker/hunspell.rs Outdated Show resolved Hide resolved
src/checker/hunspell.rs Outdated Show resolved Hide resolved
src/checker/hunspell.rs Outdated Show resolved Hide resolved
@laysauchoa laysauchoa force-pushed the laysa-emojis-are-not-mistakes branch 2 times, most recently from 8c9e3bf to aeaa40f Compare February 25, 2021 00:08
@laysauchoa laysauchoa requested a review from drahnr February 25, 2021 00:09
@laysauchoa laysauchoa force-pushed the laysa-emojis-are-not-mistakes branch from aeaa40f to 3671d9a Compare February 25, 2021 00:10
src/checker/hunspell.rs Outdated Show resolved Hide resolved
src/checker/hunspell.rs Outdated Show resolved Hide resolved
src/checker/hunspell.rs Outdated Show resolved Hide resolved
drahnr
drahnr previously requested changes Feb 25, 2021
Copy link
Owner

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small set of changes, but other than that good to go, I took the liberty to fix the botched merge commit.

@laysauchoa
Copy link
Collaborator Author

Hi @drahnr, thanks for fixing the test back to working 🙏 . So if regex is a too expensive operation, and also convert to the name as it was done before, then another option would be have the VULGAR FRACTIONS* hard-coded and then simple check if a char is on that list and use unic emoji for check of the emoji. What do you think? Which approach you propose and do you mean to add a flag to enable the check of chars and emoji?

@drahnr
Copy link
Owner

drahnr commented Feb 27, 2021

The expensive part is the refer compilation, not applying the compiled regexp :) so if you wrap that in a lazy static it should be fine.

If you have a better/faster idea, go for it. If unsure, measure with i.e. a criterion bench or a simple cputime crate.

@drahnr
Copy link
Owner

drahnr commented Feb 27, 2021

I tend to prefer unic since the general plan would be to move in that direction anyways

@drahnr
Copy link
Owner

drahnr commented Mar 12, 2021

Ping 🙃

@drahnr drahnr force-pushed the laysa-emojis-are-not-mistakes branch from 70eedbf to cfa1a87 Compare April 17, 2021 08:04
@drahnr
Copy link
Owner

drahnr commented Apr 17, 2021

@laysauchoa finally got around to taking another look at this, and I'd say this is good to go as is now :)

@drahnr drahnr force-pushed the laysa-emojis-are-not-mistakes branch from a7ebd0b to fff36dc Compare April 17, 2021 08:19
src/checker/hunspell.rs Outdated Show resolved Hide resolved
Closes #141

If a string is composed of only emojis
then the checker should ignore them and hence
not report it as `possible spelling mistake found`.

Currently only enabled for hunspell.
@drahnr drahnr force-pushed the laysa-emojis-are-not-mistakes branch from a981138 to 69d807e Compare April 17, 2021 10:22
@drahnr
Copy link
Owner

drahnr commented Apr 17, 2021

The CI issue is unrelated to these changes, it's just the first time it popped up - filed bminixhofer/nlprule#68 but is most likely a lazycell issue.

@drahnr drahnr force-pushed the laysa-emojis-are-not-mistakes branch from 607395a to a1599f9 Compare April 17, 2021 16:36
@drahnr drahnr force-pushed the laysa-emojis-are-not-mistakes branch from a1599f9 to cdcbf5a Compare April 17, 2021 16:45
@drahnr drahnr merged commit c7c16f1 into master Apr 17, 2021
@drahnr drahnr deleted the laysa-emojis-are-not-mistakes branch April 17, 2021 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vulgar fractions and single emojis are detected as mistakes
2 participants