Enhancement Request: Custom Operator Support for PyArrow Extension Types in Compute Functions
Hello!
I have been using the PyArrow extension capability to define custom types, which is extremely useful for extending Arrow's functionality. However, a significant limitation arises when using these custom types with compute functions.
For example, FixedShapeTensorType, an extension type designed for ndarrays, triggers an error when two arrays of this type are compared with the pc.equal function:
return func.call(args, None, memory_pool)
File "pyarrow\_compute.pyx", line 385, in pyarrow._compute.Function.call
File "pyarrow\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow\error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>, extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>)
In this example, the PythonObjectArrowScalar class defines an __eq__ method, enabling custom equality comparisons for the scalar elements. Similarly, the PythonObjectArrowArray class can provide custom implementations for data conversion and manipulation.
Challenges
While defining __eq__ in the scalar class is straightforward, I am uncertain how this would integrate into compute functions like pc.equal. It may require exposing additional hooks or mechanisms in PyArrow to allow users to register their operator implementations.
Please let me know if additional details or examples are needed.
Best,
Logan Lang
Component(s)
C++, Python
I think this issue mentions two different things. One is compute kernels and their support for extension types; the other is the Python comparison operators for extension arrays.
I am quite sure that defining an __eq__ method on the Scalar object will not solve the fact that some kernels (equal, in this example) are not supported for extension types in C++.
There is an open issue covering kernel support for extension types, #22304, and I think it would be worth moving it forward. #33452 is also connected to the kernels.
On the other hand, it would be worth investigating further how the Python equality operators could be improved for extension arrays. Currently, the tests check type equality and the storage values separately.
I think the related issue might be #24348.
Proposed Solution
I believe it would be highly useful for PyArrow to allow users to define custom operator support for extension types, similar to how pandas enables operator support for ExtensionArray.
Suggested Implementation
Here’s an example for the interface: