You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Please, before submitting a new issue verify and check:
I tested it on latest raylib version from master branch
I checked there is no similar issue already reported
My code has no errors or misuse of raylib
Issue description
GetCodepointNext says it returns '?' when an invalid codepoint is read, however at no point is this true. '?' is only returned when '?' is the input.
When the first byte is 128..191 or 248..255 it's invalid UTF8, but the code returns a 1 byte result verbatim instead of '?'
The 10xxxxxx of multibyte encodings are also not checked
If these were fixed and the default returned, the default return size of *codepointSize = 0; which appears to be the intention would break existing code, for example LoadCodepoints would endlessly loop on any input with an invalid codepoint. If *codepointSize = 1; were the default then no existing code should break, potentially multiple ? chars would be output per invalid codepoint which seems fine.
Overlong encodings are not checked, which strictly speaking isn't allowed per the UTF-8 spec. However that's not a big deal IMO
Environment
Untested other than in below example.
Code Example
#include <stdio.h>
//function as it currently is in raylib verbatim
int GetCodepointNext(const char *text, int *codepointSize)
{
const char *ptr = text;
int codepoint = 0x3f; // Codepoint (defaults to '?')
*codepointSize = 0;
// Get current codepoint and bytes processed
if (0xf0 == (0xf8 & ptr[0]))
{
// 4 byte UTF-8 codepoint
codepoint = ((0x07 & ptr[0]) << 18) | ((0x3f & ptr[1]) << 12) | ((0x3f & ptr[2]) << 6) | (0x3f & ptr[3]);
*codepointSize = 4;
}
else if (0xe0 == (0xf0 & ptr[0]))
{
// 3 byte UTF-8 codepoint */
codepoint = ((0x0f & ptr[0]) << 12) | ((0x3f & ptr[1]) << 6) | (0x3f & ptr[2]);
*codepointSize = 3;
}
else if (0xc0 == (0xe0 & ptr[0]))
{
// 2 byte UTF-8 codepoint
codepoint = ((0x1f & ptr[0]) << 6) | (0x3f & ptr[1]);
*codepointSize = 2;
}
else
{
// 1 byte UTF-8 codepoint
codepoint = ptr[0];
*codepointSize = 1;
}
return codepoint;
}
int main(int argc, char *argv) {
char t=0xFF, t2[]={0xF0, 0x01, 0x01, 0x01}, t3[]={0xF0, 0x81, 0x81, 0x81};
int len, codepoint;
codepoint=GetCodepointNext(&t, &len);
printf("Invalid codepoint (first byte) not replaced with '?': codepoint %d length %d\n", codepoint, len);
codepoint=GetCodepointNext(t2, &len);
printf("Invalid codepoint (multibyte) not replaced with '?': character %d length %d\n", codepoint, len);
codepoint=GetCodepointNext(t3, &len);
printf("Overlong codepoint not replaced with '?': character %d length %d\n", codepoint, len);
}
Which outputs this
Invalid codepoint (first byte) not replaced with '?': codepoint -1 length 1
Invalid codepoint (multibyte) not replaced with '?': character 4161 length 4
Overlong codepoint not replaced with '?': character 4161 length 4
The text was updated successfully, but these errors were encountered:
raysan5
changed the title
[rtext] GetCodepointNext does not return '?' as intended on invalid input
[rtext] GetCodepointNext() does not return ? as intended on invalid input
Mar 31, 2023
If you don't want to validate fully then it looks like it's as simple as else if (!(0x80 & ptr[0])) or else if (0x00 == (0x80 & ptr[0])) for the last condition then fixing things like LoadCodepoints which don't check for a zero length result. I can do a PR to that effect if you like.
Please, before submitting a new issue verify and check:
Issue description
GetCodepointNext says it returns '?' when an invalid codepoint is read, however at no point is this true. '?' is only returned when '?' is the input.
*codepointSize = 0;
which appears to be the intention would break existing code, for example LoadCodepoints would endlessly loop on any input with an invalid codepoint. If*codepointSize = 1;
were the default then no existing code should break, potentially multiple ? chars would be output per invalid codepoint which seems fine.Environment
Untested other than in below example.
Code Example
Which outputs this
The text was updated successfully, but these errors were encountered: