[CI][Python] test_arrow_cogrouped_map and test_arrow_grouped_map from pyspark fail on integration job #44986

Open

raulcd opened this issue Dec 10, 2024 · 0 comments

raulcd (Member) commented Dec 10, 2024

Describe the bug, including details regarding any error messages, version, and platform.

As seen here:
#44981 (comment)
When I tried to run pyspark.sql.tests.arrow.test_arrow_grouped_map and pyspark.sql.tests.arrow.test_arrow_cogrouped_map, they failed due to a missing pandas installation:

Traceback (most recent call last):
  File "/spark/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py", line 264, in test_self_join
    df2 = df.groupby("k").applyInArrow(arrow_func, schema="x long, y long")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/group_ops.py", line 809, in applyInArrow
    udf = pandas_udf(
          ^^^^^^^^^^^
  File "/spark/python/pyspark/sql/pandas/functions.py", line 372, in pandas_udf
    require_minimum_pandas_version()
  File "/spark/python/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
    raise PySparkImportError(
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
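The error message itself points at the likely fix for the integration job's image; a minimal sketch of installing the dependency the check requires, assuming pip is the installer used in that environment:

```shell
# Install pandas at or above the minimum version PySpark's
# require_minimum_pandas_version() check demands (per the error above).
pip install 'pandas>=2.0.0'
```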

Those tests were never executed in the past, but it might be worth including them in the job since they are Arrow-related.
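If the job should run these suites even on images without pandas, one option is to guard pandas-dependent tests so they are skipped rather than erroring. A minimal sketch, assuming a hypothetical test class (PySpark's own test suites use a similar `have_pandas` flag):

```python
import importlib.util
import unittest

# Detect whether pandas is importable without actually importing it.
have_pandas = importlib.util.find_spec("pandas") is not None

# Skip the whole suite when pandas is absent instead of failing with
# PySparkImportError, as in the traceback above.
@unittest.skipUnless(have_pandas, "pandas is required for applyInArrow tests")
class ArrowGroupedMapTests(unittest.TestCase):
    def test_placeholder(self):
        # Only runs when pandas is installed.
        self.assertTrue(have_pandas)
```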

Component(s)

Continuous Integration, Python
