-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add General_Category and further predicates #40
Add General_Category and further predicates #40
Conversation
…teSpace, in order to be closer to Data.Char.
@Bodigrim would you like to review the changes to |
@wismill I appreciate the effort, but I think sub-byte operations are not worth it. There are 31 constructors, so you need 5 bits per one value. It's most likely faster just use a byte per value, but fetch all data in one go instead of conditional fetching of one or two words. |
@Bodigrim Ok, I will check that. Meanwhile, I added a benchmark and disable tests with incompatible GHC versions. |
@Bodigrim I have implemented your suggestion. It is roughly 60 to 80% faster than my previous implementation! If my benchmark is correct, we have a nice speedup of 16x compared to some functions from
|
Fix base bound for benchmark. Add default language.
I have merged latest changes from All tests successful, except Ubuntu+GHC7103, which seems to have a linker issue. Could you check it? I propose to bump the package version from 0.2 to 0.3, as there are some breaking changes. |
…egory # Conflicts: # appveyor.yml # lib/Unicode/Internal/Bits.hs
@harendra-kumar @adithyaov @Bodigrim I have merged master, added support for big-endian architectures (based on @Bodigrim work). There are some CI failures, although they seem unrelated to this PR. |
@adithyaov can you help with appveyor CI failure and GHC 9.2.1 CI, I think you fixed a similar issue elsewhere. GHC 9.2.1 CI example can be found here. |
Appveyor failure is a known issue with |
I will take a better look at the APIs, naming, structuring etc once I get some time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good. I have some minor comments.
We can add a cabal docspec
CI to check the properties in haddock. We have a similar CI in the streamly repo.
@wismill I apologize for the delayed review. The PR looks good overall. I don't have any major comments other than the minor scrutiny already mentioned. I've added in one small question along as well. You might need to rebase and resolve conflicts. I can send a PR addressing this minor scrutiny to this branch in your repo. |
The CI's are failing with this message:
You can add .editorconfig to .packcheck.ignore since we do not want to pack it in the distribution. |
Fixed |
@adithyaov can you suggest a fix for appveyor CI failure. I think you fixed a similar issue elsewhere. |
@wismill this looks good for merge. You can squash any commits if required before the merge. |
New benchmark with relative speedup:
|
The benchmarks look pretty good compared to Good stuff @wismill ! |
Ok, I think this PR is almost ready. Should I squash all the commits? Should I bump the package’s version to 0.3? |
You can either squash them into a single commit or maybe squash them into a few logically related commits, or just squash the trivial/fixup type commits and leave the rest as it is, whatever you deem fit. We can bump the package version, but let's give it a soak time of a week before we upload to hackage. We can also generate a bench-show report with a side-by-side benchmark comparison with base and put it in the readme, like we have in unicode-transforms. |
This needs to be added to the stack config.
|
Unfortunately it seems that
I will add it. |
I'll need to update |
Add benchmark result in README. Fix stack config Move ambiguous functions to compatibility module: Unicode.Char.General.Compat. Fix benchmark for old GHC Improve doc about isSpace and isWhiteSpace Improve benchmark by using bcompare. Add .editorconfig to .packcheck.ignore Fixes for review. Add .editorconfig Fixes for review appveyor: lts-18.17 → lts-18.18 Revert "Add support for big-endian architectures" This reverts commit 137f201. Update Changelog.md Simplify tests Add support for big-endian architectures
405118a
to
4260b7e
Compare
3068c75
to
6e68e5f
Compare
Ok. Apart from this and GHC 7.10.3 failing, I think this PR is ready to merge. |
Let’s merge and continue work in specific issues. |
UnicodeData.txt
lib/Unicode/Internal/Char/UnicodeData/GeneralCategory.hs
GeneralCategory
data type and correspondinggeneralCategoryAbbr, generalCategory
functions.isAlpha, isAlphaNum, isControl, isMark, isNumber, isPrint, isPunctuation, isSeparator, isSymbol
.isLetter
is renamedisAlphabetic
andisSpace
is renamedisWhiteSpace
.isLetter
andisSpace
now match the base’s Data.Char behaviour.lookupIntN
inUnicode.Internal.Bits
for low-level stuff.Data.Char
in order to makeUnicode.Char
a drop-in replacement.Unicode.Char
to ensure we get the same result thanData.Char
.Note that this true only if the compiler used have have the same Unicode version as this package.
Fixes #38, Fixes #39
Note:
ucd.sh generate
should be re-run if PR #36 is merge (before or after this PR).