-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepare for removing the legacy Unicode C API #80527
Comments
The legacy Unicode C API was deprecated in 3.3. Its support consumes resources: more memory usage by Unicode objects, additional code for handling Unicode objects created with the legacy C API. Currently every Unicode object has a cache for the wchar_t representation. The proposed PR adds two compile time options: HAVE_UNICODE_WCHAR_CACHE and USE_UNICODE_WCHAR_CACHE. Both are set to 1 by default. If USE_UNICODE_WCHAR_CACHE is set to 0, CPython will not use the wchar_t cache internally. The new wchar_t based C API will be used instead of the Py_UNICODE based C API. This can add small performance penalty for creating a temporary buffer for the wchar_t representation. On other hand, this will decrease the long-term memory usage. This build is binary compatible with the standard build and third-party extensions can use the legacy Unicode C API. If HAVE_UNICODE_WCHAR_CACHE is set to 0, the wchar_t cache will be completely removed. The legacy Unicode C API will be not available, and functions that need it (e.g. PyArg_ParseTuple() with the "u" format unit) will always fail. This build is binary incompatible with the standard build if you use the legacy or non-stable Unicode C API. I hope that these options will help third-party projects to prepare for removing the legacy Unicode C API in future. |
Thanks for implementing this, Serhiy. |
I think this is a good preparation that makes it clear what code will eventually be removed, and allows testing without it. No idea how happy Windows users will be about all of this, but I consider it quite an overall improvement for the Unicode implementation. Once this gets removed, that is. Removing the "unicode_internal" codec entirely (which is changed by this PR) is discussed in bpo-36297. |
I'd change the title of this bpo item to "Prepare for removing the whcar_t caching in the Unicode C API". Note that the wchar_t caching was put in place to allow for external applications and C code to easily and efficiently interface with Python. By removing it you will slow down such code significantly, esp. on Linux and Windows where wchar_t code is fairly common (one of the reasons we added UCS4 in Python was to make the interaction with Linux wchar_t code more efficient). This should be clearly mentioned as part of the change and the compile time flags. BTW: You have a few other changes in the PR which don't have anything to do with the intended removal: - envsize = PySequence_Fast_GET_SIZE(keys);
- if (PySequence_Fast_GET_SIZE(values) != envsize) {
+ envsize = PyList_GET_SIZE(keys);
+ if (PyList_GET_SIZE(values) != envsize) { |
I had also looked through the unrelated changes, and while, yes, they are unrelated, they seemed to be correct and reasonable modernisations of the code base while touching it. They could be moved to a separate PR, but there is a relatively high risk of conflicts, so I'm ok with keeping them in here for now. |
On 18.03.2019 22:33, Stefan Behnel wrote:
I don't think changing sequence iteration to list iteration only My guess is that these changes have made it into the PR by mistake. |
I'm not sure we need two options. |
I wrote this PR just to see how much code should be changed after removing the wchar_t cache, and what be performance impact. Get it, experiment with it, run tests and benchmarks. I think we could set USE_UNICODE_WCHAR_CACHE to 0 by default. If this will cause significant troubles, it is easy to set it to 1. I am going to add configure options for switching these options. On Windows you will still need to edit the config file manually.
Currently some of the legacy functions are not decorated with Py_DEPRECATED, because this would cause compiler warnings in the code that uses these functions. If USE_UNICODE_WCHAR_CACHE is 0, these functions will no longer used, so we can add compiler warnings for them.
getenvironment() is the function that has been rewritten to the new API without preserving the old variant. Since the code was rewritten so much, I performed some code clean up. PyMapping_Keys() and PyMapping_Values() always return a list now, so that using the PySequence_Fast API is superfluous. They could return a tuple in the past, but this provoked bugs because the user code used PyList API for it. I'll open a separate issue for this.
CPython configuration macros (like HAVE_ACOSH or USE_COMPUTED_GOTOS) do not have the PY_ prefix. |
FYI, I had created PR 12340 which removes use of deprecated API in ctypes. |
One thing to keep in mind: HAVE_UNICODE_WCHAR_CACHE == 1 and HAVE_UNICODE_WCHAR_CACHE == 0 have a different ABI due to a different struct layout. This should probably affect the ABI tag for extension modules. |
I don't think this is a good approach. Most projects and developers don't recompile Python. It's especially a chore when you have many dependencies with C extensions, because you'll have to recompile them all as well. I would recommend simply removing that cache. |
I think these ABI incompatible options are used many people. I had found ujson and MarkupSafe used legacy APIs. I fixed MarkupSafe. I suppose there are some other packages in PyPI, but I'm not sure. |
I closed bpo-38604 as a duplicate. Copy of my messages. msg355475 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-27 16:02 Python 3.3 deprecated the C API functions using Py_UNICODE type. Examples in the doc:
Currently, functions removal is scheduled for Python 4.0 but I would prefer that Python 4.0 doesn't have a long list of removed features, but no more than usual. So I'm trying to remove a few functions from Python 3.9, and try to prepare removal for others. Py_UNICODE C API was mostly kept for backward compatibility with Python 2. Since Python 2 support ends at the end of the year, can we start to organize Py_UNICODE C API removal? There are multiple questions:
I propose to:
Honestly, if the removal is causing too much issues, I'm fine to make slowdown the removal. It's just a matter of clearly communicating our intent. Maybe we should also announce the scheduled removal in What's in Python 3.9 and in the capi-sig mailing list. msg355478 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-27 16:15
I searched "4.0" in the documentation:
msg355524 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-28 11:06 A preleminary step was to modify PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() to remove the internal caching: it has been done in Python 3.8.0 with bpo-30863. |
Thanks INADA-san! Another nail into Py_UNICODE coffin! |
This change broke test_distutils on multiple buildbots. Examples: |
Oh, why I can not use C99?
|
PEP-7 requires C99 to build Python, but I think that we can try to keep C89 compatibility for the public header files (Python C API). |
There is no need to deprecate _PyUnicode_AsUnicode. It is a private function. Undeprecating it will make the code clearer. |
Is there anything left to do here? |
No. I just waiting Python 3.11 become Bata. |
Deprecate functions: * PyUnicode_AS_DATA() * PyUnicode_AS_UNICODE() * PyUnicode_GET_DATA_SIZE() * PyUnicode_GET_SIZE() Previously, these functions were macros and so it wasn't possible to decorate them with Py_DEPRECATED().
The decorator now requires to be called: @support.requires_legacy_unicode_capi() instead of: @support.requires_legacy_unicode_capi The implementation now only imports _testcapi when the decorator is called, so "import test.support" no longer imports the _testcapi extension.
The decorator now requires to be called with parenthesis: @support.requires_legacy_unicode_capi() instead of: @support.requires_legacy_unicode_capi The implementation now only imports _testcapi when the decorator is called, so "import test.support" no longer imports the _testcapi extension.
The decorator now requires to be called with parenthesis: @support.requires_legacy_unicode_capi() instead of: @support.requires_legacy_unicode_capi The implementation now only imports _testcapi when the decorator is called, so "import test.support" no longer imports the _testcapi extension.
…GH-108438) The decorator now requires to be called with parenthesis: @support.requires_legacy_unicode_capi() instead of: @support.requires_legacy_unicode_capi The implementation now only imports _testcapi when the decorator is called, so "import test.support" no longer imports the _testcapi extension. (cherry picked from commit 995f4c4) Co-authored-by: Victor Stinner <[email protected]>
…8438) (#108446) gh-80527: Change support.requires_legacy_unicode_capi() (GH-108438) The decorator now requires to be called with parenthesis: @support.requires_legacy_unicode_capi() instead of: @support.requires_legacy_unicode_capi The implementation now only imports _testcapi when the decorator is called, so "import test.support" no longer imports the _testcapi extension. (cherry picked from commit 995f4c4) Co-authored-by: Victor Stinner <[email protected]>
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: