Use vector operations in text decoding #272
Doing some more research: https://bitbucket.org/stgatilov/utf8lut seems quite good. It can validate and convert between UTF-8 <-> UTF-16/32, which would allow replacing a bunch of decoders/encoders in Text.
@Fuuzetsu ohai Just saw this issue. Lately I've been applying SSE2 to obvious places in encoding and decoding:
I haven't yet read the state of the art in SIMD conversions, thanks for the links. If we're to go into SSE4 and AVX, I wonder if GHC should support a
@Lysxia can you comment on why the issue was closed? Are we happy enough with the current state of vectorisation in modern text? If so, great, it would just be nice to have a couple of words.
Sure thing. Given @ethercrow's SSE2 patches in text-1.2.5.0 and @Bodigrim's switch to UTF-8 (PR #365) which bundled the C++ I realize I tend to be trigger-happy in closing issues. I won't be opposed if you or @haskell/text maintainers would prefer this issue to remain open to keep track of long-term progress on vectorization in text. But my view is that it's a given that comparisons against the state of the art and further improvements in the area are always welcome. Hence I think issues have more utility when they are driven by more focused discussion and more actionable goals.
Thank you. I don't have a problem with closing this – when I initially created the ticket, I think there was about zero vectorisation code and, as you mention, a bunch has been added. It should be much easier to continue the trend now, and we probably don't need an issue explicitly anymore.
There's a fair amount of literature about decoding UTF8 as well as conversion to UTF16. It seems like cbits.c could probably benefit massively from a modern update.
See https://github.com/cyb70289/utf8 for example, also u8u16 library/paper &c.
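For readers unfamiliar with the idea, here is a minimal sketch of the simplest vectorised step such decoders tend to start with: skipping a run of pure ASCII 16 bytes at a time with SSE2. It is only an illustration under assumptions. The function name `ascii_prefix_len` is hypothetical and is not taken from text's cbits.c or from the libraries linked above, and a real validator would still have to check multi-byte sequences after this fast path.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from text's cbits.c): returns the number of
 * leading bytes in buf that are ASCII, i.e. have the high bit clear. */
static size_t ascii_prefix_len(const uint8_t *buf, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process 16 bytes per iteration while a full chunk remains. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Bit k of mask is the high bit of byte k in the chunk. */
        int mask = _mm_movemask_epi8(chunk);
        if (mask != 0) {
            /* Locate the lowest set bit: the first non-ASCII byte. */
            size_t k = 0;
            while (((mask >> k) & 1) == 0) {
                k++;
            }
            return i + k;
        }
    }
#endif
    /* Scalar tail, and full fallback when SSE2 is unavailable. */
    for (; i < len; i++) {
        if (buf[i] >= 0x80) {
            return i;
        }
    }
    return len;
}
```

A caller could copy or widen the first `ascii_prefix_len(buf, len)` bytes directly and hand the remainder to a full UTF-8 decoder. The scalar loop keeps the sketch correct on targets without SSE2.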