Miles Granger / @milesgranger:
I think this makes good sense, although I'm not sure about the implementation details. I think many (all?) kernels specify their allowed input types before runtime, but perhaps there is a way to match based on the storage type as well?
cc @jorisvandenbossche
Currently, compute kernels don't recognize extension types. If you define a semantic type to indicate something like "this string column is an image label", you then cannot run operations such as an equality comparison on it.
For example, take the LabelType from https://github.com/apache/arrow/blob/c3824db8530075e0f8fd26974c193a310003c17a/python/pyarrow/tests/test_extension_type.py
For query systems that push some of the compute down to Arrow (e.g., DuckDB), it also means that it's much harder for users to work with datasets that contain extension types, because they can't tell which functions will actually work.
Instead, if we can make the compute kernels default to the storage type, it would make the extension system a lot easier to work with in Arrow.
Reporter: Chang She / @changhiskhan
Note: This issue was originally created as ARROW-18273. Please see the migration documentation for further details.