Convert to UTF-8 prior to setting Tkinter path #425

charliermarsh · 2024-12-17T15:57:26Z

Summary

In #421, @carljm pointed out that path was uninitialized in the call setenv("TCL_LIBRARY", path, 1). It turns out we need to initialize it from the Python string via PyUnicode_AsUTF8.

I confirmed that the existing code was running; it's just that setenv returns an error (rather than crashing), and we weren't checking the result (though the same is true in the Windows path). If you, e.g., try to printf path just before the call, you get a segmentation fault.

This code isn't necessary to fix the motivating Tkinter limitation, so that's another reason it wasn't caught.
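
To illustrate the unchecked-result problem described above, here is a minimal sketch that calls the C library's setenv through ctypes and raises instead of ignoring a failure. It is not the patch itself: the helper name checked_setenv and the path value are hypothetical, and the sketch assumes a POSIX system where ctypes.CDLL(None) exposes libc.

```python
import ctypes
import errno
import os

# Assumption: POSIX, where CDLL(None) gives access to the process's libc.
libc = ctypes.CDLL(None, use_errno=True)
libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
libc.setenv.restype = ctypes.c_int
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p


def checked_setenv(name: bytes, value: bytes) -> None:
    """Call setenv(3) with overwrite=1 and raise OSError on failure."""
    if libc.setenv(name, value, 1) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# Hypothetical path, used only for illustration.
checked_setenv(b"TCL_LIBRARY", b"/hypothetical/lib/tcl8.6")

failure = None
try:
    # POSIX requires setenv to reject a name containing '='.
    checked_setenv(b"BAD=NAME", b"x")
except OSError as exc:
    failure = exc.errno
```

Without the return-value check, the second call would fail silently, which matches how the original bug went unnoticed.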

indygreg

Strictly speaking, the path encoding here is subtly bugged. But you may get lucky and nobody will complain.

Strictly speaking, all paths in the context of Python should be represented using the currently defined filesystem encoding, as exposed by sys.getfilesystemencoding(). Or in the C API, Py_EncodeLocale() and friends.

It is strictly wrong to assume that filesystem paths are UTF-8.
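
A small sketch of the distinction, in Python rather than C: os.fsencode applies the interpreter's filesystem encoding and error handler (the C API counterpart is PyUnicode_EncodeFSDefault), while an explicit UTF-8 encode ignores both. The path below is hypothetical, and Latin-1 stands in for "some non-UTF-8 filesystem encoding" purely for illustration.

```python
import os
import sys

# Hypothetical install path containing a non-ASCII character.
path = "/home/zo\u00eb/lib/tcl8.6"

# What Python itself would hand to the OS for this path.
fs_bytes = os.fsencode(path)

# os.fsencode is equivalent to encoding with the filesystem encoding
# and its associated error handler.
manual = path.encode(sys.getfilesystemencoding(),
                     sys.getfilesystemencodeerrors())

# What a hard-coded UTF-8 conversion produces, regardless of locale.
utf8_bytes = path.encode("utf-8")

# What a Latin-1 filesystem encoding would expect instead.
latin1_bytes = path.encode("latin-1")
```

When the filesystem encoding is UTF-8, fs_bytes and utf8_bytes coincide, which is why the assumption usually goes unnoticed; latin1_bytes shows the byte sequence that differs when it is not.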

On Linux, a filesystem path (on most filesystems) is any byte sequence ending in \0. Linux has no concept of path encodings.
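
As a rough illustration of why that matters, the sketch below takes a byte path that is not valid UTF-8 and compares Python's filesystem round trip with a strict UTF-8 conversion. The path and the helper strict_utf8 are hypothetical, and the sketch assumes a POSIX system, where the filesystem error handler is surrogateescape.

```python
import os


def strict_utf8(text: str):
    """Return UTF-8 bytes for text, or None if it cannot be encoded."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# Hypothetical directory name stored as Latin-1: a lone 0xE9 byte
# followed by '/' is not valid UTF-8.
raw = b"/home/caf\xe9/lib/tcl8.6"

text = os.fsdecode(raw)        # the str Python holds for this path
roundtrip = os.fsencode(text)  # the bytes Python hands back to the OS
forced = strict_utf8(text)     # the result of a UTF-8-only conversion
```

The filesystem round trip returns the original bytes, while the UTF-8-only conversion either fails or yields different bytes, depending on the active locale.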

On Windows, there's likely an active encoding. That may or may not be something compatible with UTF-8.

What these patches should be doing is operating on the paths in the domain of the Python filesystem encoding, which isn't necessarily UTF-8.

Again, assuming UTF-8 likely just works. But don't be surprised if someone comes out of the woodwork to complain about mojibake with their non-UTF-8 home directory or something along those lines.

carljm · 2024-12-18T16:19:25Z

> Strictly speaking, all paths in the context of Python should be represented using the currently defined filesystem encoding, as exposed by sys.getfilesystemencoding(). Or in the C API, Py_EncodeLocale() and friends.

Yes, great catch.

> On Windows, there's likely an active encoding. That may or may not be something compatible with UTF-8.

The Windows code is already upstream and predates this PR. It just unconditionally uses PyUnicode_AsWideCharString and doesn't look up any system locale or encoding. I don't know enough about string encodings on Windows to know whether that's the right thing to do.

charliermarsh force-pushed the charlie/tkinter branch from 2a4a118 to ae0fa22 on December 17, 2024 15:57

zanieb approved these changes Dec 17, 2024

charliermarsh force-pushed the charlie/tkinter branch from ae0fa22 to 81429fa on December 17, 2024 16:00

carljm approved these changes Dec 17, 2024

Convert to UTF-8 prior to setting Tkinter path

4486631

zanieb force-pushed the charlie/tkinter branch from 81429fa to 4486631 on December 17, 2024 21:50

charliermarsh merged commit 4446f7d into main Dec 17, 2024
280 checks passed

charliermarsh deleted the charlie/tkinter branch December 17, 2024 22:46

indygreg reviewed Dec 18, 2024
