Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.13: test_xml_etree fails on expat 2.2.5 (RHEL8 / Rocky linux 8 / Ubuntu 22.04) #125067

Open
JacobHenner opened this issue Oct 7, 2024 · 43 comments
Labels
3.13 bugs and security fixes build The build process and cross-build tests Tests in the Lib/test dir topic-XML type-bug An unexpected behavior, bug, or error

Comments

@JacobHenner
Copy link

JacobHenner commented Oct 7, 2024

Bug report

Bug description:

I'm attempting to build Python 3.13.0 for Rocky Linux 8, and several tests are failing:

LD_LIBRARY_PATH=/usr/src/python ./python -m test --pgo --timeout=
Using random seed: 1341012311
0:00:00 load avg: 15.81 Run 44 tests sequentially in a single process
0:00:00 load avg: 15.81 [ 1/44] test_array
0:00:02 load avg: 14.70 [ 2/44] test_base64
0:00:02 load avg: 14.70 [ 3/44] test_binascii
0:00:03 load avg: 14.70 [ 4/44] test_binop
0:00:03 load avg: 14.70 [ 5/44] test_bisect
0:00:03 load avg: 14.70 [ 6/44] test_bytes
0:00:13 load avg: 12.83 [ 7/44] test_bz2
0:00:14 load avg: 12.83 [ 8/44] test_cmath
0:00:14 load avg: 12.83 [ 9/44] test_codecs
0:00:17 load avg: 12.04 [10/44] test_collections
0:00:21 load avg: 12.04 [11/44] test_complex
0:00:22 load avg: 11.32 [12/44] test_dataclasses
0:00:23 load avg: 11.32 [13/44] test_datetime
0:00:31 load avg: 10.57 [14/44] test_decimal
0:00:41 load avg: 9.25 [15/44] test_difflib
0:00:44 load avg: 8.67 [16/44] test_embed
0:01:01 load avg: 7.35 [17/44] test_float
0:01:01 load avg: 7.35 [18/44] test_fstring
0:01:06 load avg: 7.00 [19/44] test_functools
0:01:08 load avg: 6.68 [20/44] test_generators
test test_generators failed
0:01:08 load avg: 6.68 [21/44] test_hashlib -- test_generators failed (1 failure)
0:01:10 load avg: 6.68 [22/44] test_heapq
0:01:11 load avg: 6.68 [23/44] test_int
0:01:14 load avg: 6.47 [24/44] test_itertools
0:01:25 load avg: 6.16 [25/44] test_json
0:01:33 load avg: 5.52 [26/44] test_long
0:01:40 load avg: 5.31 [27/44] test_lzma
0:01:41 load avg: 5.31 [28/44] test_math
0:01:49 load avg: 5.04 [29/44] test_memoryview
0:01:49 load avg: 5.04 [30/44] test_operator
0:01:50 load avg: 5.04 [31/44] test_ordered_dict
0:01:52 load avg: 5.04 [32/44] test_patma
0:01:52 load avg: 4.79 [33/44] test_pickle
0:02:07 load avg: 4.10 [34/44] test_pprint
0:02:08 load avg: 4.10 [35/44] test_re
0:02:10 load avg: 4.10 [36/44] test_set
0:02:24 load avg: 4.68 [37/44] test_sqlite3
0:02:26 load avg: 4.68 [38/44] test_statistics
0:02:45 load avg: 6.26 [39/44] test_str
0:02:50 load avg: 6.32 [40/44] test_struct
0:02:52 load avg: 6.32 [41/44] test_tabnanny
0:02:53 load avg: 6.45 [42/44] test_time
0:02:55 load avg: 6.45 [43/44] test_xml_etree
test test_xml_etree failed
0:02:57 load avg: 6.45 [44/44] test_xml_etree_c -- test_xml_etree failed (3 failures)
test test_xml_etree_c failed
test_xml_etree_c failed (3 failures)
Total duration: 2 min 59 sec
Total tests: run=9,361 failures=7 skipped=204
Total test files: run=44/44 failed=3

These tests are required to succeed in order for --enable-optimizations to work as expected.

I am using the following build options:

./configure \
  --enable-loadable-sqlite-extensions \
  --enable-optimizations \
  --enable-shared \
  --with-lto \
  --with-system-expat \
  --with-ensurepip \
  LDFLAGS=-Wl,-rpath=/usr/local/lib

3.9-3.12.6 do not exhibit this behavior when built with the same commands (aside from the Python version).

I'm curious whether there might be a regression for certain versions of expat, similar to #117187.

expat-devel version is expat-2.2.5-15.el8.

The specific test failures are:

======================================================================
FAIL: test_flush_reparse_deferral_disabled (test.test_xml_etree.XMLPullParserTest.test_flush_reparse_deferral_disabled)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1752, in test_flush_reparse_deferral_disabled
    self.assert_event_tags(parser, [('start', 'doc')])
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1483, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [] != [('start', 'doc')]

Second list contains 1 additional elements.
First extra element 0:
('start', 'doc')

- []
+ [('start', 'doc')]

======================================================================
FAIL: test_simple_xml_chunk_1 (test.test_xml_etree.XMLPullParserTest.test_simple_xml_chunk_1)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1508, in test_simple_xml_chunk_1
    self.test_simple_xml(chunk_size=1, flush=True)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1496, in test_simple_xml
    self.assert_event_tags(parser, [('end', 'element')])
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1483, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [] != [('end', 'element')]

Second list contains 1 additional elements.
First extra element 0:
('end', 'element')

- []
+ [('end', 'element')]

======================================================================
FAIL: test_simple_xml_chunk_5 (test.test_xml_etree.XMLPullParserTest.test_simple_xml_chunk_5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1511, in test_simple_xml_chunk_5
    self.test_simple_xml(chunk_size=5, flush=True)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1496, in test_simple_xml
    self.assert_event_tags(parser, [('end', 'element')])
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/test/test_xml_etree.py", line 1483, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [] != [('end', 'element')]

Second list contains 1 additional elements.
First extra element 0:
('end', 'element')

- []
+ [('end', 'element')]

----------------------------------------------------------------------
Ran 198 tests in 0.423s

FAILED (failures=3, skipped=6)
test test_xml_etree failed
test_xml_etree failed (3 failures)

== Tests result: FAILURE ==

1 test failed:
    test_xml_etree

Total duration: 640 ms
Total tests: run=198 failures=3 skipped=6
Total test files: run=1/1 failed=1
Result: FAILURE

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@JacobHenner JacobHenner added the type-bug An unexpected behavior, bug, or error label Oct 7, 2024
@ZeroIntensity ZeroIntensity added build The build process and cross-build 3.13 bugs and security fixes labels Oct 7, 2024
@picnixz picnixz added topic-XML tests Tests in the Lib/test dir labels Oct 7, 2024
@AA-Turner AA-Turner changed the title Python 3.13 Python 3.13: tests fail on rocky linux 8 Oct 7, 2024
@AA-Turner
Copy link
Member

Are you able to test with pre releases, or attempt git bisect to help us narrow down the issue?

A

@JacobHenner
Copy link
Author

Are you able to test with pre releases, or attempt git bisect to help us narrow down the issue?

Yes - for the pre-releases, are there specific versions I should attempt?

@AA-Turner
Copy link
Member

I would attempt a manual bisection -- start with eg eg beta 1, then iterate through depending on if the failures are present.

@JacobHenner
Copy link
Author

I would attempt a manual bisection -- start with eg eg beta 1, then iterate through depending on if the failures are present.

I've reproduced the test_xml_etree error on all rc releases, and also a1. I'm testing some beta releases now.

Oddly, the test_hashlib error only occurs within our CI build environment, but not when building locally. The reason this is odd is because the same Dockerfile is used in both cases, with no known difference in build context that'd explain why it'd fail in one case and succeed in the other. I'll follow up on this one next.

@JacobHenner
Copy link
Author

JacobHenner commented Oct 7, 2024

The failing tests (test_xml_etree*) were all touched in https://github.com/python/cpython/pull/115623/files#diff-cd2315ff25ab49e653d363de2d213e390ecb1b01bbc86e2be3b700db4b0bf454. Perhaps the changes in 9f74e86, intended to address test failures for expat < 2.6, were incomplete?

@AA-Turner
Copy link
Member

Have you tried with a more recent version of expat (2.6+)? Do the tests succeed?

@JacobHenner
Copy link
Author

Have you tried with a more recent version of expat (2.6+)? Do the tests succeed?

Just did - test_xml_etree* succeeds with expat 2.6.3 built from source on Rocky 8, with all else remaining equal.

@vkurilin
Copy link

vkurilin commented Oct 8, 2024

Oddly, the test_hashlib error only occurs within our CI build environment, but not when building locally.

+1 to @JacobHenner – I have the exact same issue with test_hashlib. For 3.13.0 in CI, tests are failing both on rhel and debian. However, for all versions from 3.8 to 3.12, the tests pass without errors.
Locally, the Dockerfile builds without errors.

@JacobHenner
Copy link
Author

Oddly, the test_hashlib error only occurs within our CI build environment, but not when building locally.

+1 to @JacobHenner – I have the exact same issue with test_hashlib. For 3.13.0 in CI, tests are failing both on rhel and debian. However, for all versions from 3.8 to 3.12, the tests pass without errors.
Locally, the Dockerfile builds without errors.

What are you using to build your containers? Docker, podman, Kaniko?

@vkurilin
Copy link

vkurilin commented Oct 8, 2024

I'm building with Kaniko.
All versions from 3.13.0a1 to 3.13.0rc3 are failing.

However, there is an important difference from your case: only test_hashlib fails. test_xml_etree_c works fine.

@JacobHenner
Copy link
Author

I'm building with Kaniko.

I suspect this is related. We're also building with Kaniko in the cases where it's failing. I'm analyzing this further now.

However, there is an important difference from your case: only test_hashlib fails. test_xml_etree_c works fine.

Which distro are you using, and with which version of expat?

@isuftin
Copy link

isuftin commented Oct 8, 2024

We're seeing test_set fail in Docker building on Alpine 3.19 w/ expat 2.6.3

test_memoryview is also failing

@JacobHenner
Copy link
Author

JacobHenner commented Oct 8, 2024

I think my previous comments about test_hashlib were misguided because of how the error lines are displayed. The actual error appears to be in test_generators (FAIL: test_raise_and_yield_from (test.test_generators.SignalAndYieldFromTest.test_raise_and_yield_from)). I wonder if this is Kaniko-specific, however Kaniko builds of 3.9-3.12 do not exhibit this issue.

For comparison, the test_xml_etree and test_xml_etree_c failures are not Kaniko-specific, and are instead associated with the version of expat installed.

We're seeing test_set fail in Docker building on Alpine 3.19 w/ expat 2.6.3

test_memoryview is also failing

Can you share the specific cases that are failing?

@hroncok
Copy link
Contributor

hroncok commented Oct 8, 2024

RHEL 8 (and so I assume Rocky Linux 8) has not updated expat, but it has patched it instead. The failing tests are only failing because they check for expat version:

if expat.version_info >= (2, 6, 0):

if pyexpat.version_info >= (2, 6, 0):

if pyexpat.version_info >= (2, 6, 0):

@vkurilin
Copy link

vkurilin commented Oct 8, 2024

Which distro are you using, and with which version of expat?

I'm gradually trimming the Dockerfile down to the minimal example. I've stopped at the latest version of debian:bookworm, but the problem also reproduces on ubi8 and rockylinux, so it seems that it's not dependent on the bistro.

The presence or absence of expat also does not affect the reproducibility of the error with test_hashlib (and primarily errors in test generation, as you correctly pointed out)

@isuftin
Copy link

isuftin commented Oct 9, 2024

Here's my Dockerfile in case anyone wants to verify the failing tests:

FROM alpine:3.19

ARG INSTALL_VERSION=3.13.0
ENV PATH="/usr/local/bin:${PATH}"
ENV LANG=C.UTF-8

# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
# https://pypi.org/project/pip/
ARG PYTHON_PIP_VERSION="24.2"
ARG SETUPTOOLS_VERSION="75.1.0"
ARG PYTHONDONTWRITEBYTECODE=1

SHELL ["/bin/ash", "-eo", "pipefail", "-c"]

# hadolint ignore=DL3003
RUN --mount=type=cache,target=/var/cache/apk \
	--mount=type=cache,target=/tmp \
	--mount=type=cache,target=/usr/src/python \
	# Install fetch dependencies
	apk add --no-cache --virtual .fetch-deps \
		tar~=1.35 \
		xz~=5.4.5 \
	\
	&& mkdir -p /usr/src/python \
	# Fetch installation
	&& wget -q -O /tmp/python.tar.xz "https://www.python.org/ftp/python/${INSTALL_VERSION%%[a-z]*}/Python-$INSTALL_VERSION.tar.xz" \
	&& tar -xJC /usr/src/python --strip-components=1 -f /tmp/python.tar.xz \
	\
	# Delete fetch dependencies
	&& apk del --no-network .fetch-deps && \
	\
	# Install build dependencies
	apk add --no-cache --virtual .build-deps \
		bluez-dev~=5.70 \
		bzip2-dev~=1.0.8 \
		coreutils~=9.4 \
		dpkg-dev~=1.22.1 \
		dpkg~=1.22 \
		expat-dev~=2.6.3 \
		findutils~=4.9.0 \
		gcc~=13.2.1 \
		gdbm-dev~=1.23 \
		gnupg~=2.4.4 \
		libc-dev~=0.7.2 \
		libffi-dev~=3.4.4 \
		libnsl-dev~=2.0.1 \
		make~=4.4 \
		ncurses-dev~=6.4 \
		openssl-dev~=~=3.1.7 \
		pax-utils~=1.3.7 \
		readline-dev~=8.2 \
		sqlite-dev~=3.44 \
		tcl-dev~=8.6.13 \
		tk-dev~=8.6.13 \
		tk~=8.6.13 \
		xz-dev~=5.4.5 && \
	\
	# Install dependencies
	apk add --no-cache \
		expat~=2.6.3 \
		# CVE-2022-1304
		libcom_err~=1.47 \
		libuuid~=2.39 \
		openssl~=3.1.7 \
		ssl_client~=1.36 \
	\
	# Build Python
	&& cd /usr/src/python \
	&& gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
	&& ./configure \
		--build="$gnuArch" \
		--enable-loadable-sqlite-extensions \
		--enable-optimizations \
		--with-lto \
		--enable-option-checking=fatal \
		--enable-shared \
		--with-system-expat \
		--without-ensurepip \
	&& make -j "$(nproc)" \
		# set thread stack size to 1MB so we don't segfault before we hit sys.getrecursionlimit()
		# https://github.com/alpinelinux/aports/commit/2026e1259422d4e0cf92391ca2d3844356c649d0
		EXTRA_CFLAGS="-DTHREAD_STACK_SIZE=0x100000" \
		LDFLAGS="-Wl,--strip-all" \
	&& make install \
	\
	# Install run dependencies
	&& find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec scanelf --needed --nobanner --format '%n#p' '{}' ';' \
		| tr ',' '\n' \
		| sort -u \
		| awk 'system("[ -e /usr/local/lib/" $1 " ]") == 0 { next } { print "so:" $1 }' \
		| xargs -rt apk add --no-cache --virtual .python-rundeps \
	\
	# Delete build dependencies
	&& apk del --no-network .build-deps \
	\
	# Clean up installation
	&& find /usr/local -depth \
		\( \
		\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
		-o \
		\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
		\) -exec rm -rf '{}' + \
	\
	# Test for proper python installation
	&& python3 --version | grep "${INSTALL_VERSION}" \
	\
	# Perform linking
	&& ln -s /usr/local/bin/idle3 /usr/local/bin/idle \
	&& ln -s /usr/local/bin/pydoc3 /usr/local/bin/pydoc \
	&& ln -s /usr/local/bin/python3 /usr/local/bin/python \
	&& ln -s /usr/local/bin/python3-config /usr/local/bin/python-config \
	\
	# Install PIP
	&& wget -q -O /tmp/get-pip.py "https://bootstrap.pypa.io/get-pip.py" \
	&& python /tmp/get-pip.py --disable-pip-version-check --no-cache-dir --no-compile \
		"setuptools==${SETUPTOOLS_VERSION}" \
		"pip==${PYTHON_PIP_VERSION}"

CMD ["python3"]

HEALTHCHECK NONE

@JacobHenner
Copy link
Author

JacobHenner commented Oct 9, 2024

It turns out that the test_xml_etree* and test_generators failures are not new, they're just no longer ignored in PROFILE_TASK when they fail (as of Python 3.13). I looked at a build of 3.12.7, and test_xml_etree, test_xml_etree_c, and test_generators are all failing on RockyLinux 8, but this was not noticed earlier because the results were ignored during build.

@edmorley
Copy link

edmorley commented Oct 9, 2024

I'm seeing test_xml_etree + test_xml_etree_c failures during the PGO profile task on Ubuntu 22.04 with Python 3.13 too. (Plus like the above, I see those failures also occurred on older Python versions - the difference now is that they aren't being silently ignored during PGO.)

These tests pass for us on Ubuntu 20.04 and 24.04, just not 22.04. We build with --with-system-expat, so presuming (based on earlier comments) that expat is the issue, that means:

  • Works:
    • Ubuntu 20.04 -> libexpat v2.2.9-1ubuntu0.7
    • Ubuntu 24.04 -> libexpat v2.6.1-2ubuntu0.1
  • Fails:
    • Ubuntu 22.04 -> libexpat v2.4.7-1ubuntu0.4

Example failing build log:
https://github.com/heroku/heroku-buildpack-python/actions/runs/11259777638/job/31310276989#step:4:1374

The Dockerfile/scripts used to build:
https://github.com/heroku/heroku-buildpack-python/blob/ebd222f304ed315e84b7c8c6b8a89898c05e88ca/builds/Dockerfile
https://github.com/heroku/heroku-buildpack-python/blob/ebd222f304ed315e84b7c8c6b8a89898c05e88ca/builds/build_python_runtime.sh

xref:
heroku/heroku-buildpack-python#1661 (comment)

@JacobHenner
Copy link
Author

JacobHenner commented Oct 9, 2024

I now believe the title of this issue is misleading. I've just confirmed that test_xml_etree, test_xml_etree_c, and test_generators all fail in the latest versions of Python 3.9-3.13. The reason this was noticed in 3.13 is because PROFILE_TASK failures are no longer ignored during build as of 3.13.

The specific test cases failures should probably be tracked separately. It's possible that users will see new build failures in other environments where the tests had always been failing, but not breaking the build.

@vstinner vstinner changed the title Python 3.13: tests fail on rocky linux 8 Python 3.13: test_xml_etree fails on rocky linux 8 Oct 10, 2024
@vstinner
Copy link
Member

@vkurilin:

For 3.13.0 in CI, tests are failing both on rhel and debian.

Strange, we have Debian and RHEL buildbots and the whole test suite pass there.

@edmorley
Copy link

edmorley commented Oct 10, 2024

RHEL 8 (and so I assume Rocky Linux 8) has not updated expat, but it has patched it instead. The failing tests are only failing because they check for expat version:

The version checks not matching Ubuntu's backported versions appear to be the cause in one case (the failure in test_flush_reparse_deferral_disabled), however, two other tests are failing (test_simple_xml_chunk_1 and test_simple_xml_chunk_5) which don't have any expat version checks?

For failure output and links to the full logs, see:
heroku/heroku-buildpack-python#1661 (comment)

Also, cross-linking some prior history in this area:

@edmorley
Copy link

In the meantime we've had to resort to disabling the three affecting tests on Ubuntu 22.04 with Python 3.13.0:
heroku/heroku-buildpack-python@47c9f3a

@hroncok
Copy link
Contributor

hroncok commented Oct 11, 2024

test_simple_xml_chunk_1 and test_simple_xml_chunk_5 used to have expat version check, I am not entirely sure why it was removed in e5e4033 (#115623)

@edmorley
Copy link

edmorley commented Nov 5, 2024

What are the next steps here? It would be great to have a fix for this for Python 3.13.1 so we don't need custom patches to build from source with PGO enabled :-)

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

Can someone try if reverting e5e4033 fix the issue?

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

Strange, we have Debian and RHEL buildbots and the whole test suite pass there.

Ah, RHEL8 buildbots don't use --with-system-expat but the embedded copy of libexpat: expat 2.6.3. In this case, test_xml_etree pass.

When using --with-system-expat on RHEL 8.10 (expat version 2.2.5), I get 3 test failures in test_xml_etree:

  • test_flush_reparse_deferral_disabled()
  • test_simple_xml_chunk_1()
  • test_simple_xml_chunk_5()

Can someone try if reverting e5e4033 fix the issue?

Reverting the change doesn't change anything for expat 2.2.5.

Before this change, test_simple_xml_chunk_1() and test_simple_xml_chunk_5() were skipped on expat 2.6 and newer.

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

Please open a separated issue for test_generators and test_hashlib.

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

@serhiy-storchaka: Would you mind to have a look at this expat issue? Tests fail on old expat 2.2.5.

@vstinner vstinner changed the title Python 3.13: test_xml_etree fails on rocky linux 8 Python 3.13: test_xml_etree fails on expat 2.2.5 (RHEL8 / Rocky linux 8) Nov 6, 2024
@serhiy-storchaka
Copy link
Member

There is no way to know whether reparse deferral was enabled in Expat. pyexpat maintains its own flag which is set by default to True if and only if Expat version is >= 2.6.0, and then update it after calling XML_SetReparseDeferralEnabled(). If reparse deferral was backported to Expat < 2.6.0, pyexpat still think that it is disabled. xmlparser.SetReparseDeferralEnabled is no-op because XML_SetReparseDeferralEnabled does not exist in Expat < 2.6.0. Therefore, we cannot disable it and test for disabled reparse deferral.

I do not see how we can fix this on our side without skipping some tests on vanilla Expat < 2.6.0. I think that if you use a patched Expat, you should also patch Python (at least tests) and all user code which depends on the old Expat behavior.

@hartwork, I afraid this issue will haunt us for years.

@vstinner
Copy link
Member

vstinner commented Nov 8, 2024

expat 2.2.5 was released at November 1, 2017. expat 2.6.0 was released at February 6, 2024.

What do you think of skipping the 3 failing tests on expat < 2.6.0?

@serhiy-storchaka
Copy link
Member

As far as I understand, this is not a vanilla Expat 2.2.5, but Expat 2.2.5 patched with some changes from 2.6. We do not want to skip tests on platforms that use Expat < 2.6 which does not include such changes (it may be a majority of computer for now).

I suggest to only skip them on platforms where the patched old Expat is used. We cannot know what changes were backported in the particular distribution, so we should leave this for maintainers of these distributions.

@hartwork
Copy link
Contributor

hartwork commented Nov 8, 2024

We cannot know what changes were backported in the particular distribution, so we should leave this for maintainers of these distributions.

Full ack, with both my upstream and my downstream hat on.

@vstinner
Copy link
Member

I suggest to close this issue as "WONT FIX".

@edmorley
Copy link

edmorley commented Dec 4, 2024

Please could someone edit the title of this issue to add "Ubuntu 22.04" to the affected distros list?

We cannot know what changes were backported in the particular distribution, so we should leave this for maintainers of these distributions.

I suggest to close this issue as "WONT FIX".

If that's the outcome, would you recommend we (independent maintainers of Linux builds of Python that need to run on Linux, but that are not a distro package, so we really don't want to be in the patching-packages game) stop using --with-system-expat?

It sounds like at the moment that --with-system-expat is not really tested/supported in a way that makes it safe for anyone other than the distro maintainers themselves to use? If so, should a warning be added to:
https://docs.python.org/3.13/using/configure.html#cmdoption-with-system-expat

@vstinner vstinner changed the title Python 3.13: test_xml_etree fails on expat 2.2.5 (RHEL8 / Rocky linux 8) Python 3.13: test_xml_etree fails on expat 2.2.5 (RHEL8 / Rocky linux 8 / Ubuntu 22.04) Dec 4, 2024
@vstinner
Copy link
Member

vstinner commented Dec 4, 2024

Please could someone edit the title of this issue to add "Ubuntu 22.04" to the affected distros list?

Done.

@hartwork
Copy link
Contributor

hartwork commented Dec 4, 2024

It sounds like at the moment that --with-system-expat is not really tested/supported in a way that makes it safe for anyone other than the distro maintainers themselves to use? If so, should a warning be added to: https://docs.python.org/3.13/using/configure.html#cmdoption-with-system-expat

@edmorley --with-system-expat is the right choice with regard to security. I'm saying that with both my upstream and downstream hat on. So I see more damage than help in making people re-consider that option through new warnings: please don't, security first. Please either accept the rightfully broken test or patch it on your level of distribution.

@edmorley
Copy link

edmorley commented Dec 4, 2024

@hartwork The problem is that we don't really have another choice though :-)

I'm not a distro maintainer (edit: that is, a "maintainer of a distribution of the Linux OS"), I've never personally used the xml.parsers.expat API and I'm not familiar with the expat library. I just build binaries for use by end-users on our platform. It's not practical (or sensible) for me to be patching anything expat that's non-test related. Disabling PGO is also not an option.

As a temporary workaround to our Python 3.13 PGO builds failing I had to disable the affected tests:
heroku/heroku-buildpack-python@47c9f3a

My concerns are now:

  1. I'd prefer not to have to maintain any patches
  2. (More significantly) Is there an actual functionality regression when using xml.parsers.expat with Ubuntu 22.04's expat (that has partial backports), or is this a false-positive-test-only issue? Plus even if it's only a false positive now, how will we know there won't be a real regression in the future?

Given that:

  • CPython now bundles a fairly up to date expat
  • CPython's CI (afaik) tests the bundled expat and not distro versions
  • --with-system-expat is disabled by default

...then stopping using --with-system-expat seems like the only viable option to me?

The Docker Hub official Python image maintainers seem to have come to the same conclusion:
https://github.com/docker-library/python/pull/980/files#diff-b3c0fcf605d397bd3fce8ea61588d2c9da8b24a9fe6886b2e640d97914f1d01fL191

@hartwork
Copy link
Contributor

hartwork commented Dec 4, 2024

@edmorley from what I read you're not "just building binaries": to me it sounds like you're a distributor and you're facing the cost of the backports approach (that saves you cost elsewhere, a tradeoff) and of being a distributor. I think you can pick from:

  • a) use the CPython bundled Expat (and be vulnerable until that bundle is updated and your CPython is updated)
  • b) use the distro Expat (and be vulnerable until backports are available and you update) and use distro CPython also (to match the distro's backported Expat)
  • c) use the distro Expat (and be vulnerable until backports are available and you update) and patch non-distro CPython yourself (to match the distro's backported Expat)
  • d) install the latest upstream Expat release system-wide and update that pinning yourself for every release (and be vulnerable until you update)

All other approaches I can think of put the effort to CPython upstream where it does not belong.

@mcepl
Copy link
Contributor

mcepl commented Dec 6, 2024

I'm not a distro maintainer, …

@edmorley , I have to agree with @hartwork , yes, you are a distributor. If you deal with the problem of backporting to different version of libexpat, welcome on the board.

And yes, I could add openSUSE Leap 15.6 to the list of failing distros, for the same reason as what’s described in #125067 (comment) : our libexpat has been modified so heavily, that even though it gets some version number, it absolutely doesn’t correspond to the functionality provided by the upstream clean library of that version.

We were dealing with these errors to the various degrees of success by this patch https://build.opensuse.org/projects/devel:languages:python:Factory/packages/python314/files/CVE-2023-52425-libexpat-2.6.0-backport-15.6.patch?expand=1 (basically skipping the problematic tests).

@edmorley
Copy link

edmorley commented Dec 6, 2024

Thank you for the replies.

However, there's been a bit of a misunderstanding. By "distro maintainer" I was referring to my not being a "maintainer of a distribution of the Linux OS", and not anything about being a "distributor of Python binaries".

Ideally I wouldn't have to be building Python binaries at all, however:

Perhaps things will improve in the future, given recent news about python-build-standalone, but for now we're stuck with building Python binaries.

Given our use-case, it seems likely we'll pick:

a) use the CPython bundled Expat (and be vulnerable until that bundle is updated and your CPython is updated)

...especially given that this is the configuration that CPython defaults to, and tests in its own CI.

@serhiy-storchaka
Copy link
Member

Do you also distribute Python tests? If yes, then you should distribute modified tests which ignore such failure (on selected platforms or always). If no, then you should not worry at all.

@edmorley
Copy link

edmorley commented Dec 6, 2024

Do you also distribute Python tests? If yes, then you should distribute modified tests which ignore such failure (on selected platforms or always).

No we don't distribute tests to reduce resultant container image size (we build with --disable-test-modules).

If no, then you should not worry at all.

The failing tests with Ubuntu expat are still something that affects us, since:

  1. We need PGO, and PGO now requires the tests to pass (quite reasonably so!)
  2. If we patch the tests so that the failing tests are skipped, then it could just be masking a current or future expat incompatibility, per:

(More significantly) Is there an actual functionality regression when using xml.parsers.expat with Ubuntu 22.04's expat (that has partial backports), or is this a false-positive-test-only issue? Plus even if it's only a false positive now, how will we know there won't be a real regression in the future?

(from #125067 (comment))

@hartwork
Copy link
Contributor

hartwork commented Dec 6, 2024

Do you also distribute Python tests? [..]. If no, then you should not worry at all.

I believe this picture is missing that CPython compiled against backported system Expat will use less of the recent Expat API internally (because the CPython build system assumes that API to be missing based on its version, which is legit) and so then even outside of tests pull requests like #115623 may not work as expected any more because system Expat is behaving differently than what CPython is rightfully expecting from the given Expat version number. So if I'm not missing something it would need patching where binaries are built against backported Expat, and not just the tests but also C code (or even just C code). A cheap trick that may work is to fake XML_COMBINED_VERSION to a different value but it only works when the given backport did not omit any behavior-changing patches in between. It will need close evaluation (and then patching) in any case. Compared to the pile of patches to CPython that other distributors need to apply, one patch is not too bad for a distribution of CPython with regard to cost. On a personal note, I think it's interesting that 99% of the time backporting appears to work just fine and then that other 1% shows how flawed the backporting approach is by design and that most of the time it just doesn't show.

CC @serhiy-storchaka @edmorley

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes build The build process and cross-build tests Tests in the Lib/test dir topic-XML type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests