Bug with join and selectors in LazyFrames (equivalent code works for eager DataFrames) #19822

Closed
2 tasks done
madneuron opened this issue Nov 16, 2024 · 5 comments · Fixed by #19974
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@madneuron

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
data = pl.DataFrame(data={'a': [0,1], 'b':[2,3]}).lazy()
df_full = data.join(data, how='left', left_on=pl.coalesce('b','a'),right_on='a',validate="m:1")
df=df_full.select(~pl.selectors.ends_with("_right"))
df.collect()

Log output

found multiple sources; run comm_subplan_elim
thread '<unnamed>' panicked at crates\polars-plan\src\utils.rs:360:79:
called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("a"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "c:\Users\DragonDad\PythonProjects\pyro\src\pyro\data\enrichers\bug.py", line 8, in <module>
    df.collect()
  File "C:\Users\DragonDad\AppData\Local\pypoetry\Cache\virtualenvs\pyro-c_S8C29v-py3.12\Lib\site-packages\polars\lazyframe\frame.py", line 2021, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("a"))

Issue description

A join combined with selectors fails for LazyFrames when trying to drop some of the resulting columns.

Expected behavior

This should behave similarly to how it works with eager dataframes:

    data = pl.DataFrame(data={"a": [0, 1], "b": [2, 3]})
    df_full = data.join(
        data, how="left", left_on=pl.coalesce("b", "a"), right_on="a", validate="m:1"
    )
    df = df_full.select(~pl.selectors.ends_with("_right"))

Installed versions

--------Version info---------
Polars:              1.13.1
Index type:          UInt32
Platform:            Windows-11-10.0.22631-SP0
Python:              3.12.6 (tags/v3.12.6:a4a2d2b, Sep  6 2024, 20:11:23) [MSC v.1940 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                2.5.1+cpu
xlsx2csv             <not installed>
xlsxwriter           <not installed>

@madneuron added the bug, needs triage, and python labels on Nov 16, 2024
@cmdlineluser
Contributor

It seems to happen only when an expression is passed directly to left_on=:

(data
  .join(data, 
      left_on=pl.coalesce('b','a'),
      right_on='a'
  )
  .select('a')
  .collect()
)
# PanicException: called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("a"))

(data
  .with_columns(left_on=pl.coalesce('b','a'))
  .join(data, 
      left_on='left_on',
      right_on='a'
  )
  .select('a')
  .collect()
)
# shape: (0, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# └─────┘

@ritchie46
Member

This is a bug in projection pushdown. Will get to this next week.

@madneuron
Author

> This is a bug in projection pushdown. Will get to this next week.

Thanks @ritchie46. This was my crude attempt to enable coalesce=True in this situation. Any reason not to have it enabled properly?

@ritchie46
Member

> Thanks @ritchie46. This was my crude attempt to enable coalesce=True in this situation. Any reason not to have it enabled properly?

I don't understand what you mean?

@madneuron
Author

> > Thanks @ritchie46. This was my crude attempt to enable coalesce=True in this situation. Any reason not to have it enabled properly?

> I don't understand what you mean?

Never mind... Having thought it through, it doesn't make sense.
