Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to UTF-8 prior to setting Tkinter path #425

Merged
merged 1 commit into from
Dec 17, 2024
Merged

Conversation

charliermarsh
Copy link
Member

Summary

In #421, @carljm pointed out that path was uninitialized in setenv("TCL_LIBRARY", path, 1);. It turns out we need to set it via PyUnicode_AsUTF8.

I confirmed that the existing code was running; it's just thatsetenv returns an error (rather than crashing), and we weren't checking the result (though the same is true in the Windows path). If you, e.g., try to printf path just before, you get a segmentation fault.

This code isn't necessary to fix the motivating Tkinter limitation, so that's another reason it wasn't caught.

@charliermarsh charliermarsh merged commit 4446f7d into main Dec 17, 2024
280 checks passed
@charliermarsh charliermarsh deleted the charlie/tkinter branch December 17, 2024 22:46
Copy link
Collaborator

@indygreg indygreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, the path encoding here is subtly bugged. But you may get lucky and nobody will complain.

Strictly speaking, all paths in the context of Python should be represented using the currently defined filesystem encoding, as exposed by sys.getfilesystemencoding(). Or in the C API, Py_EncodeLocale() and friends.

It is strictly wrong to assume that filesystem paths are UTF-8.

On Linux, a filesystem path (on most filesystems) is any byte sequence ending in \0. Linux has not concept of path encodings.

On Windows, there's likely an active encoding. That may or may not be something compatible with UTF-8.

What these patches should be doing is operating on the paths in the domain of the Python filesystem encoding, which isn't necessarily UTF-8.

Again, assumption of UTF-8 likely just works. But don't be surprised if someone comes out of the woodwork to complain about emojibake with their non-UTF-8 home directory or something along those lines.

@carljm
Copy link

carljm commented Dec 18, 2024

Strictly speaking, all paths in the context of Python should be represented using the currently defined filesystem encoding, as exposed by sys.getfilesystemencoding(). Or in the C API, Py_EncodeLocale() and friends.

Yes, great catch.

On Windows, there's likely an active encoding. That may or may not be something compatible with UTF-8.

The Windows code is already upstream and predates this PR. It just unconditionally uses PyUnicode_AsWideCharString and doesn't look up any system locale or encoding. I don't know enough about string encodings on Windows to know whether that's the right thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants