Python Polars 1.13.0

@github-actions released this 12 Nov 12:19
7f0b3e0

🚀 Performance improvements

  • Improve DataFrame.sort().limit/top_k performance (#19731)
  • Improve cloud scan performance (#19728)
  • Fix quadratic 'with_columns' behavior (#19701)
  • Improve hive partition pruning with datetime predicates from SQL (#19680)
  • Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
  • Reorder conditions in is_leap_year (#19602)
  • Rechunk in DataFrame.rows if needed (#19628)
  • Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
  • Use faster iteration in 'starts_with'/'ends_with' (#19583)
  • Branchless Parquet Prefiltering (#19190)
  • Reduce size of IdxVec from 24 -> 16 bytes (#19550)

✨ Enhancements

  • Try to support native SAP HANA driver via read_database (#19733)
  • Implement max/min methods for dtypes (#19494)
  • Improve n_chunks typing (#19727)
  • Improve hive partition pruning with datetime predicates from SQL (#19680)
  • Identify inefficient use of Python string removeprefix, removesuffix, and zfill in map_elements (#19672)
  • Automatically use boto3 / google-auth if installed when scanning cloud (#19677)
  • Identify inefficient use of Python string replace in map_elements (#19668)
  • Parallel IPC sink for the new streaming engine (#19622)
  • Add SQL support for RIGHT JOIN, fix an issue with wildcard aliasing (#19626)
  • Add show_graph to display a GraphViz plot for expressions (#19365)
  • Streamline use of predicates connected by & with IEJoin (join_where) (#19552)
  • Support use of is_between range predicate with IEJoin operations (join_where) (#19547)

🐞 Bug fixes

  • Use cls for to_python (#19726)
  • Fix validation for inner and left joins when join_nulls is unflagged (#19698)
  • SQL ELSE clause should be implicitly NULL when omitted (#19714)
  • Improve n_chunks typing (#19727)
  • Ensure NoDataError raised consistently between engines for Excel reads (#19712)
  • In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
  • Only allow list.to_struct to be elementwise when width is fixed (#19688)
  • Make Array arithmetic ops fully elementwise (#19682)
  • Address inconsistency with use of Python types in frame-level cast (#19657)
  • Update line-splitting logic in batched CSV reader (#19508)
  • Fix incorrect lazy schema for explode() in agg() (#19629)
  • Fix fill null types (#19656)
  • Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
  • Fix typing for SchemaDefinition (#19647)
  • Ensure mean_horizontal raises on non-numeric input (#19648)
  • Reorder conditions in is_leap_year (#19602)
  • Copy height in .vstack() for empty dataframes (#19641) (#19642)
  • Correct wildcard and input expansion for some more functions (#19588)
  • Allow .struct.with_fields inside list.eval (#19617)
  • Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
  • Fix incorrect scan_parquet().with_row_index() with non-zero slice or with streaming collect (#19609)
  • Fix mask and validity confusion in Parquet String decoding (#19614)
  • Parquet decoding of nested dictionary values (#19605)
  • Do not attempt to load default credentials when credential_provider is given (#19589)
  • Fix gather len in group-by state (#19586)
  • Added input validation for explode operation in the array namespace (#19163)
  • Improve error message (#19546)
  • Fix predicate pushdown into inequality joins (#19582)
  • Correct categorical namespace error message (#19558)
  • Fix performance regression for sort/gather on list/array columns (#19564)
  • Ignore quoted newlines when skipping lines in CSV (#19543)
  • Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
  • Make Duration parsing fallible and not panic (#19490)

📖 Documentation

  • Revise and rework user-guide/expressions (#19360)
  • Update Excel page of user guide to refer to fastexcel as the default engine (#19691)
  • Alter examples for round_sig_figs to make behaviour clearer (#19667)
  • Assorted fixes to Rust API docs (#19664)
  • Improve replace and replace_all docstring explanation of the "$" character with reference to capture groups (vs use as a literal) (#19529)
  • Add credential provider section and examples to user guide (#19487)
  • Fix various instances of repeated words in docs and comments (#19516)

📦 Build system

  • Bump Rust toolchain to nightly-2024-10-28 (#19492)

🛠️ Other improvements

  • Remove unused Excel code (#19710)
  • Use Column for the {try,}_apply_columns{_par,} functions on DataFrame (#19683)
  • Remove more @scalar-opt (#19666)
  • Move Series bitops to std::ops::Bit... (#19673)
  • Mark test_parquet.py test_dict_slices as slow (#19675)
  • Get Column into polars-expr (#19660)
  • Streamline internal SQL join condition processing (#19658)
  • Factor out logic for re-use by new streaming CSV source (#19637)
  • Configure grouped Dependabot updates (#19604)
  • Fix PyO3 error in CI (#19545)
  • Update nightly compiler version (#19590)
  • Added input validation for explode operation in the array namespace (#19163)
  • Fix lint (#19584)
  • Add a Column::Partitioned variant (#19557)
  • Move to fast-float2 (#19578)
  • Only run remote bench on rust changes (#19581)
  • Remove unsafe *_release functions (#19554)
  • Fix test_rolling_by_integer not using parameterized dtype (#19555)
  • Add mindebug-dev rust profile (#19524)
  • Add CI step to process benchmark results (#19530)
  • Add CI benchmark on merge (#19518)
  • Skip client check with env var (#19517)
  • Improve makefile build commands (#19498)

Thank you to all our contributors for making this release possible!
@3tilley, @HansBambel, @MarcoGorelli, @alexander-beedie, @barak1412, @braaannigan, @cmdlineluser, @coastalwhite, @corwinjoy, @dependabot, @dependabot[bot], @eitsupi, @janpipek, @jqnatividad, @letkemann, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego and @wence-