Polish diacritic characters from map files are not displayed correctly in game #1741
Comments
This is actually a limitation of the map format. The names are decoded to ANSI and stored in the map in an OEM codepage, and not all characters are available in the ANSI codepage. Where exactly are those maps from? Can you attach one of them? It might be impossible to fix this issue at all if, e.g., BlueByte delivered different maps for different languages using the local OEM codepage. There would be no way for the decoder to know which one to use. |
In the original Polish release those were displayed correctly. Back then, I believe, the files were encoded in ASCII (DOS). If the read function uses ANSI, though, it should probably use 1250 (Eastern Europe, Latin-2), not 1252 (Western, Latin-1 only). Screenshots and example maps attached. Map: The Turtle "Żółw" |
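A minimal Python sketch of the mismatch suggested above. It assumes the Polish map name is stored as Windows-1250 bytes, which is a hypothesis from this thread and not something read from the map file:

```python
# Assumption: the map stores the name "Żółw" as Windows-1250 bytes.
raw = "Żółw".encode("cp1250")

as_1250 = raw.decode("cp1250")  # decoding with the matching codepage round-trips
as_1252 = raw.decode("cp1252")  # decoding as Western Latin-1 yields wrong characters

print(raw.hex())
print(as_1250)
print(as_1252)
```

Only the ó survives the wrong decoding, because it sits at the same byte value in both codepages.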
Or we could use this one: #1638. Although I'm interested in what would happen if you created a new map with the Polish editor using those characters. |
I guess we need to check where the ANSI chars are converted to UTF-8. But when using the 1250 codepage we might run into the same issue, just with another language. See the comparison: with 1250 we'd lose some characters. Is there any hint in the map about the encoding/codepage? @Spikeone do you happen to know if the maps are the same in different languages? If only the metadata differs between, e.g., the German and Polish map releases, we can check whether there are any differences that allow us to infer the encoding. |
@Flamefire sadly, so far I wasn't aware that there are French or Polish (original) versions out there at all, although I may remember that someone once told me about the Polish version. @Hirotaro do you happen to know the source for the version? |
@Spikeone Yes, the Settlers II PL version and other Ubisoft games of that time were officially prepared and released by CD PROJEKT Sp. z o.o. (currently known as CD PROJEKT RED S.A.). In 1990-2015 CD PROJEKT (Publishing, aka 'Blue') was responsible for hundreds of official Polish, Czech and Hungarian releases of games. Right now the GOG.COM version of the game (GOG is part of the CD PROJEKT group) includes the PL files in its release of the digital Settlers 2 Gold Edition. I worked at CD PROJEKT for 10 years, taking care of localization for most of that time. |
Thanks for the information! I checked the map "turtle" which in German is "Schildkröte" and the Polish one and the only difference is indeed the Name:
According to https://settlers2.net/archives/language-packs IBM CP437 or OEM CP850 or CP852 is used, while another page on the same website states CP437.
Other German characters: ü=0x81, ß=0xe1, Ö=0x99. That matches all 3 codepages but not CP1250. So it looks like the original game did use either CP437, CP850 or CP852, but the Polish one used CP1250.
The map format is expected to be in OEM format and we convert it to Windows-1250 during reading and back during writing. Unfortunately our code for the conversion isn't documented well enough to know which of the 3 OEM codepages is actually used. And in fact I wasn't able to find any codepage for which that mapping fits completely. There are also some unmapped characters such as Ź (0x80) which would have a mapping in CP852 but not CP850 or CP437.
With all that being said: I don't see how we can reasonably implement support for the maps you posted, as those seem to use CP1250, which would break the currently supported maps. |
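The byte check described above can be reproduced with Python's built-in codecs. This is only a sketch: the byte values are the ones quoted in the comment, and the list of candidate codepages is an assumption about which ones are worth testing.

```python
# Byte values for German letters as quoted in the comment above.
GERMAN_BYTES = {0x81: "ü", 0xE1: "ß", 0x99: "Ö"}

# Candidate codepages discussed in this thread.
CANDIDATES = ["cp437", "cp850", "cp852", "cp1250"]


def matches(codec: str) -> bool:
    """Return True if every quoted byte decodes to the expected letter."""
    return all(
        # errors="replace" avoids an exception for bytes a codepage leaves undefined
        bytes([value]).decode(codec, errors="replace") == letter
        for value, letter in GERMAN_BYTES.items()
    )


results = {codec: matches(codec) for codec in CANDIDATES}
for codec, ok in results.items():
    print(f"{codec}: {'matches' if ok else 'does not match'}")
```

Three bytes cannot tell the three OEM codepages apart; they only rule out CP1250 for the German map.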
In that case it seems to be unsolvable on the map file side, as we would have to alter the format to add language or codepage data to the header. I do not know how the editor works, though, and it is probably too expensive as a new feature, but... You could think about adding an optional Lua file per map, where modders/creators could add the title and description in all the languages they want (like the campaign files). The game could then display those directly from the corresponding Lua file instead of the WLD if the Lua file exists (otherwise the WLD name would be displayed). But it may be too complex, and it is not that high a priority, as this is not a critical error or crash, more a matter of quality/polishing. P.S. In that case I could deliver translations for all RttR maps and for the Roman Campaign maps in English, Polish and Czech. As my German is not the best nowadays, I could try to gather them from the original and prepare v0.1 translations for review. |
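A rough Python sketch of the fallback idea proposed above. Everything here is hypothetical: `display_name`, the `translations` mapping and the language codes are illustration only, and in the actual proposal the data would come from a per-map Lua file rather than a Python dict.

```python
from typing import Mapping, Optional


def display_name(wld_name: str,
                 translations: Optional[Mapping[str, str]],
                 language: str) -> str:
    """Prefer a translated title; otherwise fall back to the name stored in the WLD file."""
    if translations:
        return translations.get(language, wld_name)
    return wld_name


# Hypothetical per-map translation data (stand-in for an optional Lua file).
turtle = {"de": "Schildkröte", "pl": "Żółw"}

print(display_name("Turtle", turtle, "pl"))  # translation exists
print(display_name("Turtle", turtle, "cs"))  # no Czech entry: WLD name is used
print(display_name("Turtle", None, "pl"))    # no translation file at all: WLD name is used
```

The point of the design is that existing maps keep working unchanged, since a missing file or a missing language simply falls through to the WLD name.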
Our editor doesn't seem to handle non-ASCII at all. That should be fixed.
I assume our function was intended for CP437. With some checking I found that CP850 matches better than CP852. I don't see a disadvantage in using CP850 over CP437, as it supports letters where the latter has symbols. We can still include a way to translate map names, in which case people can use the English maps, which IIRC are freely available. |
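To see where CP850 "supports letters where the latter has symbols", the two tables can be compared byte by byte. This sketch uses Python's built-in `cp437` and `cp850` codecs, not the project's own conversion function:

```python
def differing_bytes(a: str, b: str):
    """Yield (byte, char_in_a, char_in_b) for high bytes where the two codepages disagree."""
    for value in range(0x80, 0x100):
        raw = bytes([value])
        char_a = raw.decode(a, errors="replace")
        char_b = raw.decode(b, errors="replace")
        if char_a != char_b:
            yield value, char_a, char_b


diffs = list(differing_bytes("cp437", "cp850"))

# Keep only positions where CP437 has a non-letter and CP850 has a letter.
gained_letters = [
    (value, old, new) for value, old, new in diffs
    if new.isalpha() and not old.isalpha()
]

for value, old, new in gained_letters:
    print(f"0x{value:02X}: cp437 {old!r} -> cp850 {new!r}")
```

The German bytes quoted earlier in the thread (0x81, 0xE1, 0x99) do not show up in the difference list, so switching between these two codepages would not change how those names decode.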
Yes, a hardcoded hack for the original maps could be a solution. Maybe not very subtle, but certainly effective. |
The issue is connected to map files.
Map files contain map names.
For Western European languages, special characters are displayed correctly.
Polish special characters such as ĄŚĆŻŹĘŻŃŁÓ are rendered as ? when read from a map file.
It may be that the codepage used for reading names from map files is set to CP850, while the extended Latin characters that include Polish and Czech diacritics require CP852.
Note: All texts from Lua or MO/PO are displayed correctly.
Example:

Fuchsia - issue while reading from Map Files.
Green - correctly rendered from MO/PO text base.
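The codepage hypothesis in the description can be checked against Python's built-in codec tables. This is only a sketch of which codepages can represent the Polish letters at all; it says nothing about which one the game or the map files actually use.

```python
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"
CODECS = ["cp437", "cp850", "cp852", "cp1250", "cp1252"]


def unsupported(text: str, codec: str) -> str:
    """Return the characters of text that the given codepage cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)


coverage = {codec: unsupported(POLISH, codec) for codec in CODECS}
for codec, missing in coverage.items():
    print(f"{codec}: missing {missing or 'nothing'}")
```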