Panic on polars.scan_parquet().filter().columns
#16147
Comments
Have you got a repro with a dummy file? |
No. I tried this, but it works, so I'd have to look a bit deeper into what triggers the problem:
# this works
import polars
df = polars.DataFrame({"col1": range(10)})
df.write_parquet("test.parquet")
lf = polars.scan_parquet("test.parquet")
lf.filter(polars.col("col1") > 3).columns
The original parquet dataset is partitioned, and this fails:
# parquet dataset structured as /mnt/data/schema_name/partition1/partition2/year/month.parquet
import os
lf = polars.scan_parquet(os.sep.join(["/mnt/data",
"some_schema",
f"patition1={var1}",
f"patition2={var2}",
"*", "*.parquet"]))
lf.filter(polars.col("col1") > 123).columns |
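For anyone trying to build a dummy repro, here is a minimal sketch that writes a tiny dataset with the same directory shape as the failing one and runs the same query. It is not confirmed to trigger the panic. The helper names (`partition_glob`, `write_dummy_dataset`, `try_repro`), the partition values, and the `partition1=`/`partition2=` directory names are assumptions for illustration, not taken from the real dataset.

```python
import os
from pathlib import Path


def partition_glob(root, schema, var1, var2):
    """Build a glob shaped like the one in the report (hypothetical names)."""
    return os.sep.join([
        str(root),
        schema,
        f"partition1={var1}",
        f"partition2={var2}",
        "*",
        "*.parquet",
    ])


def write_dummy_dataset(root, schema="some_schema"):
    """Write small parquet files under root/schema/partition1=../partition2=../year/."""
    import polars  # imported lazily so the path helper works without polars

    for var1 in ("a", "b"):
        for var2 in ("x", "y"):
            for year in ("2023", "2024"):
                target = Path(root, schema, f"partition1={var1}",
                              f"partition2={var2}", year, "01.parquet")
                target.parent.mkdir(parents=True, exist_ok=True)
                polars.DataFrame({"col1": range(200)}).write_parquet(target)


def try_repro(root):
    """Run the query from the report against the dummy dataset."""
    import polars

    lf = polars.scan_parquet(partition_glob(root, "some_schema", "a", "x"))
    return lf.filter(polars.col("col1") > 123).columns
```

To try it, call `write_dummy_dataset(tmp_dir)` followed by `try_repro(tmp_dir)` inside a temporary directory, once on 0.20.19 and once on the failing version.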
@jmakov any chance you could try using polars 0.20.19 and see if your parquet file works with that version? I recently upgraded from that version to 0.20.25 and am now having the same issue you're seeing (using the same parquet files that worked with 0.20.19). Unfortunately, I'm having a hard time creating an MRE. |
@ATL2001 thanks for the tip. You're right, 0.20.19 works. I also had a hard time investigating and creating an MRE, and I don't have enough time for that. But at least we now know it's a regression. Thanks! |
Still present in version 0.20.29 |
There is a minimal repro here for a different issue: But it is also about partitioned datasets, and the same error. It may be the same underlying problem as described here. |
My repro for this seems to have been fixed on Not 100% sure if this is the case, but I believe this gets fixed by #16549 (notably the removal of Default::default() for the hive partition info). I have integration tests in my code which encountered this exact bug, and it seems to have been fixed when I compiled main too. |
Yes, that's the case. |
Thanks everyone! I just upgraded to 0.20.31, and the panic is gone! 😀 |
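To summarize the version reports in this thread, here is a small sketch that classifies a polars version string against what was reported above. The function names and status labels are made up for illustration. Versions nobody reported on are only marked "likely panics" when they sit between two failing reports, and are "unknown" otherwise.

```python
def parse_version(text):
    """Turn '0.20.25' into (0, 20, 25) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Statuses exactly as reported in this thread.
REPORTED = {
    "0.20.19": "works",
    "0.20.25": "panics",
    "0.20.29": "panics",
    "0.20.31": "works",
}


def triage(version):
    """Classify a version using only the reports in this thread."""
    if version in REPORTED:
        return REPORTED[version]
    v = parse_version(version)
    # Between the first and last failing reports: assume still affected.
    if parse_version("0.20.25") < v < parse_version("0.20.29"):
        return "likely panics"
    # No report covers this version.
    return "unknown"
```

For example, `triage("0.20.27")` gives "likely panics", while `triage("0.20.30")` gives "unknown" because the thread does not say which release first included the fix.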
Checks
Reproducible example
Log output
Issue description
Panic on polars.scan_parquet().filter().columns. Also, why are you calling unwrap() in production code?
Expected behavior
Printout of columns
Installed versions