Maniacs - String Variables - String encoding interpretation #3298

elsemieni · 2024-11-22T02:58:42Z

Tested with Player Master jenkins-1807.
I know Maniacs support is still extensive and a work in progress, but I figure reporting this is better than saying nothing.

While playing with String Variables I noticed that Maniacs RPG_RT and Player handle them differently when they contain less common characters. Specifically, if a string contains certain characters, Player reports a greater length than RPG_RT does.

For example, if I set
T[1]=ß
and then read its length, RPG_RT reports 1 but Player reports 2 (a screenshot was attached to the original report).

I believe it could be related to how Unicode characters are interpreted or converted.


Ghabry · 2024-11-22T08:19:33Z

Yeah, this is our implementation leaking through. Maniacs uses the local encoding (ß is 1 byte). We use UTF-8 (ß is 2 bytes).
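To make the byte counts concrete, here is a minimal Python sketch (not Player or Maniacs code). It assumes the "local encoding" is Windows-1252, which is only one possible legacy code page; the variable names are made up for illustration.

```python
# Illustration only: cp1252 stands in for the "local encoding" (an assumption).
text = "ß"

legacy_bytes = text.encode("cp1252")  # legacy single-byte code page
utf8_bytes = text.encode("utf-8")     # encoding used internally by Player

print(len(legacy_bytes))  # 1
print(len(utf8_bytes))    # 2
```

A length function that counts bytes therefore gives 1 on the legacy side and 2 on the UTF-8 side for the same string.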

There is also no easy fix. Keeping the strings in the legacy encoding would break things, because everything else in Player is UTF-8 (e.g. assigning an actor name from a string).

The translation feature also expects everything to be UTF-8 (so you can read a redirected text file).

Ghabry · 2024-11-24T11:59:59Z

Actually, after some further testing, just returning the byte count is incorrect. A better approximation is returning the number of codepoints.

As an example, take the string "XXひらがなXX". Converted to Shift-JIS (Japanese encoding) it is 12 bytes long (X = 1 byte, each Hiragana = 2 bytes).

When using 1252 (Western European) as the encoding, where every character is 1 byte, Maniacs GetLen returns 12. (Note that you cannot actually play the game this way, as the characters cannot be displayed.)

When running with 932 (Shift-JIS) it returns 8.

So RPG_RT seems to use a wide character set when running with Shift-JIS (= 2 bytes count as 1 character).

So just counting Unicode codepoints (one codepoint per encoded character) seems to give a pretty good approximation:

ß is one codepoint (WORKS)
XXひらがなXX has 8 codepoints (WORKS)
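The numbers above can be reproduced with Python's built-in codecs. This is only a sketch of the arithmetic, not how RPG_RT or Player compute lengths; `cp932` and `cp1252` are Python's codec names for the two code pages, and the variable names are hypothetical.

```python
# Illustration only: reproduces the counts from the test described above.
text = "XXひらがなXX"

sjis_bytes = text.encode("cp932")          # Shift-JIS: X = 1 byte, Hiragana = 2 bytes
misread = sjis_bytes.decode("cp1252")      # same bytes read as 1-byte-per-character text

print(len(sjis_bytes))                     # 12 (bytes in Shift-JIS)
print(len(misread))                        # 12 (what a 1252 run would count)
print(len(text))                           # 8  (Unicode codepoints)
print(len(text.encode("utf-8")))           # 16 (UTF-8 bytes: 4 * 1 + 4 * 3)
```

The codepoint count (8) matches what Maniacs reports under 932, while the UTF-8 byte count (16) matches neither run.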

For Russian this should also work.

Conclusion:

Fortunately there are no games using complex scripts, because RPG_RT cannot render them. So this codepoint trick should work in 99.9% of all cases :).

So just routing everything related to string index/length through our UTF-8 codepoint iterator code will fix most of it.
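As a rough idea of what codepoint-based length and indexing over UTF-8 bytes looks like, here is a small Python sketch. It is not the Player's actual iterator; `utf8_codepoint_count` and `utf8_substring` are hypothetical names, and the code assumes valid UTF-8 input and non-negative arguments.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count codepoints by skipping UTF-8 continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return `length` codepoints beginning at codepoint index `start`."""
    # Byte offset of every codepoint start, plus the end of the buffer.
    offsets = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    offsets.append(len(data))
    last = len(offsets) - 1
    begin = min(start, last)
    end = min(begin + length, last)
    return data[offsets[begin]:offsets[end]]


sample = "XXひらがなXX".encode("utf-8")
print(utf8_codepoint_count("ß".encode("utf-8")))  # 1
print(utf8_codepoint_count(sample))               # 8
print(utf8_substring(sample, 2, 4).decode("utf-8"))  # ひらがな
```

Because the string stays UTF-8 throughout and only the counting changes, nothing has to be converted back to the legacy encoding.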

Using UTF-8 codepoints also prevents any data loss from re-encoding the string (great).

Ghabry added the RPG_RT Patches label Nov 22, 2024

Ghabry added this to the 0.8.1 milestone Nov 24, 2024

fdelapena closed this as completed in 0697a61 Dec 5, 2024
