ScummTR: V1EN games use a \x85 character that's only available in Windows-1252 #5

Open
dwatteau opened this issue Nov 14, 2020 · 2 comments

@dwatteau
Owner

dwatteau commented Nov 14, 2020

There's a \x85 character in the Text::CT_V1EN array of ScummTR.

' ', '!', '"', '#', '$', '\x85', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', // XXX: \x85

This file contains escape sequences for some ISO-8859-1 characters, such as \xa9 for the © symbol. The original file was in Windows-1252, but I switched to escape sequences while converting it to UTF-8, which is also better for compiler portability.

The problem is that the \x85 value only makes sense in Windows-1252, not in ISO-8859-1. Nowadays, it's probably better to limit ourselves to the subset that is identical in ISO-8859-1 and Windows-1252, and this character isn't part of it.
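For reference, Windows-1252 and ISO-8859-1 assign the same characters to every byte outside the 0x80..0x9F range, so checking a table against the shared subset is a simple range test. A minimal sketch, where `isSharedLatin1Cp1252` and `firstUnsharedEntry` are hypothetical helper names and not part of ScummTR:

```cpp
#include <cstddef>

// Hypothetical helper (not part of ScummTR): true when a byte denotes the
// same character in both ISO-8859-1 and Windows-1252.
bool isSharedLatin1Cp1252(unsigned char byte) {
    // The two encodings only disagree on 0x80..0x9F: C1 control codes in
    // ISO-8859-1, mostly printable characters in Windows-1252.
    return byte < 0x80 || byte > 0x9F;
}

// Returns the index of the first table entry outside the shared subset,
// or `size` when every entry is safe.
std::size_t firstUnsharedEntry(const unsigned char *table, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        if (!isSharedLatin1Cp1252(table[i]))
            return i;
    }
    return size;
}
```

Run over the row quoted above, such a check would flag the sixth entry (index 5), which is the '\x85' slot.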

@tc-hib
Collaborator

tc-hib commented Nov 14, 2020

\x85 is the ellipsis char : …

Code pages actually start to differ after 127 ;)

Looks like I wrote that as if my locale was universal, most probably out of ignorance.

This array is only used in the Text class, through _charset and its inverse _tesrahc (reversed name :/ ).

If you replace '\x85' with 0, it won't be translated anymore, only escaped to \037.
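To illustrate the idea, here is a rough sketch of a forward table, its reversed counterpart, and the fallback to an escape when an entry is 0. This is not ScummTR's actual code: `Table`, `makeReverse` and `exportByte` are hypothetical names, and the three-digit decimal escape format is an assumption inferred from \037 (the '\x85' entry sits at index 0x25, which is 37 in decimal).

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the _charset / _tesrahc pair: 256 entries,
// where 0 means "no translation for this byte".
using Table = std::array<unsigned char, 256>;

// Build the reverse table (text character -> game byte) from the forward one.
Table makeReverse(const Table &forward) {
    Table reverse{}; // zero-initialised, so unmapped characters stay at 0
    for (std::size_t i = 0; i < forward.size(); ++i) {
        if (forward[i] != 0)
            reverse[forward[i]] = static_cast<unsigned char>(i);
    }
    return reverse;
}

// Export one game byte: translate it when the table has an entry,
// otherwise emit a zero-padded decimal escape (assumed format).
std::string exportByte(const Table &forward, unsigned char gameByte) {
    if (forward[gameByte] != 0)
        return std::string(1, static_cast<char>(forward[gameByte]));
    std::string digits = std::to_string(static_cast<unsigned>(gameByte));
    return "\\" + std::string(3 - digits.size(), '0') + digits;
}
```

With the entry at index 0x25 set to 0, `exportByte` falls through to the escape branch for that byte, and the reverse table has no entry for 0x85, so nothing outside the shared subset ends up in the exported text.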

You could make scummtr work with Unicode, but it is a little more work.

@dwatteau
Owner Author

dwatteau commented Nov 14, 2020

Oh, OK, thank you! (EDIT: my original comment got a lot of things wrong 😅)

The problem with Windows-1252 is that it's incompatible with ISO-8859-1 for some codepoints:
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

The ellipsis is nice, but ISO-8859-1 doesn't have one. I think that, nowadays, restricting ourselves to the ISO-8859-1/Windows-1252 compatible subset is probably the safest, most interoperable choice.

I'll test with a V1-EN game, since this array is only for V1-EN games, anyway. Maybe I could borrow the ^-for-ellipsis convention if the V1-EN games don't already use a real ^ character for this. Or I could let it remain untranslated as you suggest, so that the output remains compatible with ISO-8859-1 without breaking anything.

As for Unicode, I could maybe use iconv to accept UTF-8 files for imports/exports, as long as every character in them translates properly to the internal codepage of the game (i.e. the game can actually display it).
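A minimal sketch of what a strict conversion could look like with the POSIX iconv API, assuming ISO-8859-1 as the target for illustration (the real internal codepage depends on the game). `utf8ToLatin1` is a hypothetical helper name, and the `char **` input argument matches the glibc prototype, which may differ on other platforms:

```cpp
#include <iconv.h>
#include <string>
#include <vector>

// Hypothetical helper: convert UTF-8 to ISO-8859-1 and report failure when
// any character has no exact equivalent in the target encoding.
bool utf8ToLatin1(const std::string &utf8, std::string &latin1) {
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1)
        return false;

    std::vector<char> in(utf8.begin(), utf8.end());
    // Latin-1 output never needs more bytes than the UTF-8 input.
    std::vector<char> out(utf8.size() + 1);
    char *inPtr = in.data();
    char *outPtr = out.data();
    size_t inLeft = in.size();
    size_t outLeft = out.size();

    size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);

    // (size_t)-1 signals an error such as EILSEQ; a positive count means the
    // implementation substituted characters. Both are rejected here.
    if (rc != 0 || inLeft != 0)
        return false;

    latin1.assign(out.data(), out.size() - outLeft);
    return true;
}
```

The point of rejecting anything but a clean conversion is that an import file containing, say, a real ellipsis would be refused up front instead of silently producing a byte the game cannot display.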

dwatteau added a commit that referenced this issue Mar 29, 2021
It's an incompatibility between Windows-1252 and ISO-8859-1, and it looks
like no V1-EN game uses it anyway, to the best of my knowledge.

It's still possible to use the associated escape sequence, if necessary.

Issue #5.
dwatteau changed the title from "ScummTR: make sure that the strange \x85 CT_V1EN character is not a mistake" to "ScummTR: V1EN games use a \x85 character that's only available in Windows-1252" on Apr 8, 2021