Panic when trying to use List(Categorical) set_intersection with concat_list of other column with nulls or empty frame #16405

Closed
2 tasks done
Object905 opened this issue May 22, 2024 · 2 comments · Fixed by #16730
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@Object905
Contributor

Object905 commented May 22, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.enable_string_cache()

df = pl.DataFrame(
    [(None, "foo")],  # an empty list ([]) instead of None fails too, with a different error
    schema={
        "match": pl.List(pl.Categorical),
        "what": pl.Categorical,
    },
)

# checking for not disjoint with concat_list on one of the columns
q = pl.col("match").list.set_intersection(pl.concat_list(pl.col("what"))).list.len() != 0

print(df.filter(q))

Log output

thread '<unnamed>' panicked at crates/polars-core/src/series/from.rs:92:37:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:     0x704bf2ec3108 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc65a86809eb3aa65
   1:     0x704bf080394b - core::fmt::write::hcd5b8dd8febb96a0
   2:     0x704bf2e91c4e - std::io::Write::write_fmt::hc422b42d0849f877
   3:     0x704bf2ec8169 - std::sys_common::backtrace::print::h286fd4354e2ba39e
   4:     0x704bf2ec7a79 - std::panicking::default_hook::{{closure}}::hf9d4b516f8220f92
   5:     0x704bf2ec8c25 - std::panicking::rust_panic_with_hook::h124d9722759d43e1
   6:     0x704bf2ec84ba - std::panicking::begin_panic_handler::{{closure}}::h1123a3c792c1da95
   7:     0x704bf2ec8449 - std::sys_common::backtrace::__rust_end_short_backtrace::h33bd6640824974d0
   8:     0x704bf2ec8436 - rust_begin_unwind
   9:     0x704bef6c90b2 - core::panicking::panic_fmt::hea6c49867823d75c
  10:     0x704bef6c9184 - core::panicking::panic::hfd7eccb65c6169e0
  11:     0x704bef6c9548 - core::option::unwrap_failed::h86dc8dafdfc76144
  12:     0x704bf0c29d10 - polars_core::series::from::<impl polars_core::series::Series>::from_chunks_and_dtype_unchecked::h6569a4bf891363c5
  13:     0x704bf134c250 - polars_core::chunked_array::list::<impl polars_core::chunked_array::ChunkedArray<polars_core::datatypes::ListType>>::get_inner::h032f121255d414ae
  14:     0x704bf274af16 - <F as polars_plan::dsl::expr_dyn_fn::SeriesUdf>::call_udf::h74ee36510b316339
  15:     0x704bf14ed0db - polars_expr::expressions::apply::ApplyExpr::eval_and_flatten::h98176396bb121518
  16:     0x704bf14eca5c - <polars_expr::expressions::apply::ApplyExpr as polars_expr::expressions::PhysicalExpr>::evaluate::hb68873540056f1c4
  17:     0x704bf14ec627 - <polars_expr::expressions::apply::ApplyExpr as polars_expr::expressions::PhysicalExpr>::evaluate::hb68873540056f1c4
  18:     0x704bf1504f98 - <polars_expr::expressions::cast::CastExpr as polars_expr::expressions::PhysicalExpr>::evaluate::h1fe022b10016f305
  19:     0x704bf14fe7f2 - <polars_expr::expressions::binary::BinaryExpr as polars_expr::expressions::PhysicalExpr>::evaluate::h7b8c0275fc33c361
  20:     0x704bf1b2bedc - polars_lazy::physical_plan::executors::filter::FilterExec::execute_hor::h931d6b25035513b4
  21:     0x704bf1b2bc46 - <polars_lazy::physical_plan::executors::filter::FilterExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::{{closure}}::h5ddfcebc32137cd8
  22:     0x704bf1b2b865 - <polars_lazy::physical_plan::executors::filter::FilterExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::hadf594c4bc2cb118
  23:     0x704bf1986166 - polars_lazy::frame::LazyFrame::collect::h4511692174470228
  24:     0x704bf0698064 - polars::lazyframe::PyLazyFrame::__pymethod_collect__::h04bde37260916fbd
  25:     0x704beff06321 - pyo3::impl_::trampoline::trampoline::h16e18577987bc07b
  26:     0x704bf06adac1 - polars::lazyframe::_::__INVENTORY::trampoline::h22e0fc0ea1f9163f
  27:     0x704bf89a8ea2 - PyWeakref_GET_OBJECT
                               at /usr/src/debug/python312/Python-3.12.2/./Include/cpython/weakrefobject.h:51:8
  28:     0x704bf89a8ea2 - proxy_checkref
                               at /usr/src/debug/python312/Python-3.12.2/Objects/weakrefobject.c:394:9
  29:     0x704bf89a8ea2 - proxy_divmod
                               at /usr/src/debug/python312/Python-3.12.2/Objects/weakrefobject.c:512:1
  30:     0x704bf899a844 - _Py_bytes_lower
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytes_methods.c:307:19
  31:     0x704bf899a844 - _Py_bytes_capitalize
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytes_methods.c:369:9
  32:     0x704bf899a844 - _Py_bytes_capitalize
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytes_methods.c:365:1
  33:     0x704bf899a844 - stringlib_capitalize
                               at /usr/src/debug/python312/Python-3.12.2/Objects/stringlib/ctype.h:101:5
  34:     0x704bf888bdfa - <unknown>
  35:     0x704bf89d53ac - _PyLong_Size_t_Converter
                               at /usr/src/debug/python312/Python-3.12.2/Objects/longobject.c:1501:12
  36:     0x704bf89d4e7e - format_float_short
                               at /usr/src/debug/python312/Python-3.12.2/Python/pystrtod.c:1118:21
  37:     0x704bf89b7dd0 - <unknown>
                               at /usr/src/debug/python312/Python-3.12.2/Python/ceval.c:2287:1
  38:     0x704bf888cb8e - <unknown>
  39:     0x704bf89d53ac - _PyLong_Size_t_Converter
                               at /usr/src/debug/python312/Python-3.12.2/Objects/longobject.c:1501:12
  40:     0x704bf89d4e7e - format_float_short
                               at /usr/src/debug/python312/Python-3.12.2/Python/pystrtod.c:1118:21
  41:     0x704bf89b7dd0 - <unknown>
                               at /usr/src/debug/python312/Python-3.12.2/Python/ceval.c:2287:1
  42:     0x704bf888cb8e - <unknown>
  43:     0x704bf89d53ac - _PyLong_Size_t_Converter
                               at /usr/src/debug/python312/Python-3.12.2/Objects/longobject.c:1501:12
  44:     0x704bf89d4e7e - format_float_short
                               at /usr/src/debug/python312/Python-3.12.2/Python/pystrtod.c:1118:21
  45:     0x704bf89b7dd0 - <unknown>
                               at /usr/src/debug/python312/Python-3.12.2/Python/ceval.c:2287:1
  46:     0x704bf888cb8e - <unknown>
  47:     0x704bf8a3d767 - PyByteArray_FromStringAndSize
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytearrayobject.c:149:12
  48:     0x704bf8a3d767 - bytearray_subscript
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytearrayobject.c:397
  49:     0x704bf8a608b7 - _PyThreadState_GET
                               at /usr/src/debug/python312/Python-3.12.2/./Include/internal/pycore_pystate.h:97:12
  50:     0x704bf8a608b7 - PyErr_SetString
                               at /usr/src/debug/python312/Python-3.12.2/Python/errors.c:304:29
  51:     0x704bf8a608b7 - convert_sched_param
                               at /usr/src/debug/python312/Python-3.12.2/./Modules/posixmodule.c:7857
  52:     0x704bf8a5b9dc - _PyThreadState_GET
                               at /usr/src/debug/python312/Python-3.12.2/./Include/internal/pycore_pystate.h:97
  53:     0x704bf8a5b9dc - _Py_EnterRecursiveCall
                               at /usr/src/debug/python312/Python-3.12.2/./Include/internal/pycore_ceval.h:139
  54:     0x704bf8a5b9dc - obj2ast_excepthandler
                               at /usr/src/debug/python312/Python-3.12.2/Python/Python-ast.c:10976
  55:     0x704bf8a74f33 - stringlib_replace_delete_substring
                               at /usr/src/debug/python312/Python-3.12.2/Objects/stringlib/transmogrify.h:425
  56:     0x704bf8a74f33 - stringlib_replace
                               at /usr/src/debug/python312/Python-3.12.2/Objects/stringlib/transmogrify.h:711
  57:     0x704bf8a74f33 - bytes_replace_impl
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytesobject.c:2211
  58:     0x704bf8a74f33 - bytes_replace
                               at /usr/src/debug/python312/Python-3.12.2/Objects/clinic/bytesobject.c.h:592
  59:     0x704bf8a74346 - stringlib_rjust
                               at /usr/src/debug/python312/Python-3.12.2/Objects/stringlib/clinic/transmogrify.h.h:168
  60:     0x704bf8a73f88 - bytes_split_impl
                               at /usr/src/debug/python312/Python-3.12.2/Objects/bytesobject.c:1741
  61:     0x704bf8a73f88 - bytes_split
                               at /usr/src/debug/python312/Python-3.12.2/Objects/clinic/bytesobject.c.h:109
  62:     0x704bf8a6cc67 - PyLong_AsUnsignedLongLongMask
                               at /usr/src/debug/python312/Python-3.12.2/Objects/longobject.c:1320
  63:     0x704bf8a28fab - PyFile_WriteObject
                               at /usr/src/debug/python312/Python-3.12.2/Objects/fileobject.c:120
  64:     0x704bf8639c88 - <unknown>
  65:     0x704bf8639d4c - __libc_start_main
  66:     0x58360720c045 - _start
  67:                0x0 - <unknown>
Traceback (most recent call last):
  File "/home/obj/Dev/provider-cabinet/./manage.py", line 34, in <module>
    main()
  File "/home/obj/Dev/provider-cabinet/./manage.py", line 30, in main
    execute_from_command_line(sys.argv)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django_extensions/management/email_notifications.py", line 65, in run_from_argv
    super().run_from_argv(argv)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django_extensions/management/email_notifications.py", line 77, in execute
    super().execute(*args, **options)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django_extensions/management/utils.py", line 62, in inner
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django_extensions/management/commands/runscript.py", line 281, in handle
    run_script(script_mod, *script_args)
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/django_extensions/management/commands/runscript.py", line 159, in run_script
    exit_code = mod.run(*script_args)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/obj/Dev/provider-cabinet/scripts/test.py", line 145, in run
    first()
  File "/home/obj/Dev/provider-cabinet/scripts/test.py", line 165, in first
    print(df.filter(q))
          ^^^^^^^^^^^^
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py", line 4204, in filter
    return self.lazy().filter(*predicates, **constraints).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/obj/Dev/provider-cabinet/.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 1816, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value

Issue description

I got a panic after updating to a new version.
While trying to reproduce it, I caught another related panic, but with a different trace.

The panic with a null value is reproducible as far back as 0.20.15 (I didn't check earlier versions); it only appears when the match column is null.

The panic on this expression when the dataframe is completely empty is new: it appears on 0.20.27, while 0.20.26 is OK.
Edit: 0.20.28 is also affected.

Expected behavior

No crashes

Installed versions

--------Version info---------
Polars:               0.20.27
Index type:           UInt32
Platform:             Linux-6.9.1-arch1-1-x86_64-with-glibc2.39
Python:               3.12.3 (main, Apr 23 2024, 09:16:07) [GCC 13.2.1 20240417]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            0.10.4
fsspec:               <not installed>
gevent:               24.2.1
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.30
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0
@Object905 Object905 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels May 22, 2024
@Object905 Object905 changed the title Panic when trying to use List(Categorical) set_intersection with concat_list of other column Panic when trying to use List(Categorical) set_intersection with concat_list of other column with nulls May 22, 2024
@Object905 Object905 changed the title Panic when trying to use List(Categorical) set_intersection with concat_list of other column with nulls Panic when trying to use List(Categorical) set_intersection with concat_list of other column with nulls or empty frame May 22, 2024
@cmdlineluser
Contributor

Can reproduce.

It seems any usage of the List column causes the panic:

import polars as pl

df = pl.DataFrame(schema={"match": pl.List(pl.Categorical)})

df.select(pl.col("match") == pl.col("match"))
# PanicException: called `Option::unwrap()` on a `None` value

@Object905
Contributor Author

Sadly, this blocks me from updating.
