[CI][Python] test_arrow_cogrouped_map and test_arrow_grouped_map from pyspark fail on integration job #44986

Open

raulcd opened this issue Dec 10, 2024 · 0 comments

raulcd (Member) commented Dec 10, 2024

Describe the bug, including details regarding any error messages, version, and platform.

As seen here:
#44981 (comment)
When I tried to run pyspark.sql.tests.arrow.test_arrow_grouped_map and pyspark.sql.tests.arrow.test_arrow_cogrouped_map, they failed due to a missing pandas installation:

Traceback (most recent call last):
  File "/spark/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py", line 264, in test_self_join
    df2 = df.groupby("k").applyInArrow(arrow_func, schema="x long, y long")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/group_ops.py", line 809, in applyInArrow
    udf = pandas_udf(
          ^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/functions.py", line 372, in pandas_udf
    require_minimum_pandas_version()
  File "/spark/python/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
    raise PySparkImportError(
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
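The error message itself points at the likely fix for the integration job's image; a minimal sketch of installing the dependency the check requires, assuming pip is the installer used in that environment:

```shell
# Install pandas at or above the minimum version PySpark's
# require_minimum_pandas_version() check demands (per the error above).
pip install 'pandas>=2.0.0'
```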

Those tests were never executed in the past, but it might be worth including them in the job since they are Arrow-related.
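If the job should run these suites even on images without pandas, one option is to guard pandas-dependent tests so they are skipped rather than erroring. A minimal sketch, assuming a hypothetical test class (PySpark's own test suites use a similar `have_pandas` flag):

```python
import importlib.util
import unittest

# Detect whether pandas is importable without actually importing it.
have_pandas = importlib.util.find_spec("pandas") is not None

# Skip the whole suite when pandas is absent instead of failing with
# PySparkImportError, as in the traceback above.
@unittest.skipUnless(have_pandas, "pandas is required for applyInArrow tests")
class ArrowGroupedMapTests(unittest.TestCase):
    def test_placeholder(self):
        # Only runs when pandas is installed.
        self.assertTrue(have_pandas)
```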

Component(s)

Continuous Integration, Python
