Releases: pola-rs/polars
Python Polars 0.15.6
🐞 Bug fixes
🛠️ Other improvements
- remove unused cmake-rs patch (#5794)
Thank you to all our contributors for making this release possible!
@OneRaynyDay, @messense, @ritchie46 and @universalmind303
Python Polars 0.15.3
🚀 Performance improvements
- set_sorted flag when creating from literal (#5728)
- use sorted fast path in streaming groupby (#5727)
✨ Enhancements
- push down predicates to pyarrow datasets (#5780)
- Support for reading delta lake tables (#5761)
- Add DataFrame.glimpse() (#5622)
- allow expression as quantile input (#5751)
- accept expression in str.extract_all (#5742)
- tz-aware strptime (#5736)
- lazy diagonal concat. (#5647)
- to_struct add upper_bound (#5714)
🐞 Bug fixes
- fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)
- empty concat utf8 (#5768)
- projection pushdown with union and asof join (#5763)
- check null values in asof_join + groupby (#5756)
- fix generic streaming groupby on logical types (#5752)
- fix date_range on expressions (#5750)
- fix dtypes in join_asof_by (#5746)
- fix group order in binary aggregation (#5744)
- implement min/max aggregation for utf8 in groupby (#5737)
- fix all_null/sorted into_groups panic (#5733)
- address several edge-cases found when asserting NaN equality (#5732)
- asof join 'by', 'forward' combination (#5720)
🛠️ Other improvements
- add DataFrame.pearson_corr to reference (#5772)
- Parse fixed timezone offsets without pytz (#5769)
- chore(rust,python) Change allow_streaming to streaming (#5747)
- Remove pyarrow nightlies requirement. (#5719)
- fix incorrect accepted type in df.write_csv (#5715)
Thank you to all our contributors for making this release possible!
@AnatolyBuga, @MarcoGorelli, @alexander-beedie, @andrewpollack, @braaannigan, @chitralverma, @ghuls, @ritchie46, @sa- and @zundertj
Python Polars 0.15.2
🚀 Performance improvements
- ensure fast_explode propagates (#5676)
✨ Enhancements
- Series.get_chunks (#5701)
- inversely scale chunk_size with thread count in s… (#5699)
- add streaming minmax (#5693)
- Support large page sizes on aarch64 linux builds (#5694)
- improve dynamic inference of anyvalues and structs (#5690)
- support is_in for boolean dtype (#5682)
- add notebook html repr for Series (#5653)
🐞 Bug fixes
- fix pivot on floating point indexes (#5704)
- fix arange with column/literal input (#5703)
- fix double projection that leads to uneven union d… (#5700)
- Fix Series -> Expr dispatch for @property methods (#5689)
- fix asof join schema (#5686)
- fix owned arithmetic schema (#5685)
- take glob into account in scan_csv 'with_schema_mo… (#5683)
- fix boolean schema in agg_max/min (#5678)
- fix boolean arg-max if all equal (#5680)
- respect python objects read method even if filename is f… (#5677)
- Fix DataFrame.n_chunks return type (#5650)
🛠️ Other improvements
- Parametrize test_parquet_datetime (#5696)
- Function and lazy function docstrings (#5657)
- Fix formatting (#5658)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ankane, @braaannigan, @ghais, @ghuls, @jjerphan, @pickfire, @ritchie46, @stinodego and @zundertj
Python Polars 0.15.1
⚠️ Breaking changes
- Update Expr.sample signature and change random seeding (#4648)
- rollup breaking changes (#5602)
- iso weekday (#5598)
- Change null_equal default to True for Series.series_equal (#5051)
- rollup breaking changes (#5602)
🚀 Performance improvements
- fix quadratic time complexity of groupby in stream… (#5614)
- Improve performance of indexing operations on Series. (#5610)
- Aggregate projection pushdown (#5556)
✨ Enhancements
- add a cache to strptime (#5628)
- add nearest interpolation strategy (#5626)
- Update Expr.sample signature and change random seeding (#4648)
- Change null_equal default to True for Series.series_equal (#5051)
- make cast recursive (#5596)
- add arg_min/arg_max for series of dtype boolean (#5592)
🐞 Bug fixes
- early error on duplicate names in streaming groupby (#5638)
- fix streaming groupby aggregate types (#5636)
- convert panic to err in concat_list (#5637)
- fix dot diagram of single nodes (#5624)
- fix dynamic struct inference (#5619)
- tz-aware filtering (#5603)
- keep dtype when eval on empty list (#5597)
- fix ternary with list output on empty frame (#5595)
- fix tz-awareness of truncate (#5591)
- check chunks before doing chunked_id join optimiza… (#5589)
- invert cast_time_zone conversion (#5587)
- asof join ensure join column is not dropped when '… (#5585)
🛠️ Other improvements
- Remaining docstring examples for frame and lazyframe (#5630)
- use xxhash3 for string types (#5617)
- only trigger build.rs file if that file itself has cha… (#5618)
- iso weekday (#5598)
- Merge release workflows (#5564)
- Fix broken lint workflow (#5584)
Thank you to all our contributors for making this release possible!
@Kuhlwein, @braaannigan, @ghuls, @matteosantama, @ritchie46 and @stinodego
Python Polars 0.14.31
🚀 Performance improvements
- improve streaming primitive groupby (#5575)
- vectorize integer vec-hash by using very simple, … (#5572)
✨ Enhancements
- prefer streaming groupby if partitionable (#5580)
🐞 Bug fixes
- fix ub due to invalid dtype on splitting dfs (#5579)
🛠️ Other improvements
- Remove old Python changelog file (#5577)
- namespace registration docs update (#5565)
- Improve contributing guide (#5558)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46 and @stinodego
Python Polars 0.14.29
🚀 Performance improvements
- specialized utf8 groupby in streaming (#5535)
✨ Enhancements
- add dataframe.pearson_corr (#5533)
- support namespace registration (#5531)
- make map_alias fallible (#5532)
- pl.min & pl.max accept wildcard similar to pl.sum (#5511)
- additional support for using timedelta with duration-type arguments (#5487)
🐞 Bug fixes
- fix(rust, python); fix projection pushdown in asof joins (#5542)
- streaming hstack allow duplicates (#5538)
- fix streaming empty join panic (#5534)
- fix duplicate caches in cse and prevent quadratic … (#5528)
- allow appending categoricals that are all null (#5526)
- tz-aware strftime (#5525)
- make 'truncate' tz-aware (#5522)
- fix coalesce expression expansion (#5521)
- fix nested aggregation in when then and window expr… (#5520)
- fix sort_by expression if groups already aggregated (#5518)
- fix bug in batched parquet reader that dropped dfs… (#5506)
- preserve Series name when exporting to pandas (#5498)
- Refactor is_between (#5491)
- fix bugs in skew and kurtosis (#5484)
🛠️ Other improvements
Thank you to all our contributors for making this release possible!
@alexander-beedie, @braaannigan, @ghuls, @ritchie46, @sorhawell and @zundertj
Python Polars 0.14.27
✨ Enhancements
- additional autocomplete affordances for IPython users (#5477)
- make streaming work with multiple sinks in a sing… (#5474)
- add streaming slice operation (#5466)
- run partial streaming queries (#5464)
- streaming left joins (#5456)
- file statistics so we only (try to) keep smallest table in memory (#5454)
- streaming inner joins. (#5400)
🐞 Bug fixes
- compute correct offset for streaming join on multi… (#5479)
- return error on invalid sortby expression (#5478)
- use json for expr pickle (#5476)
- improved namespace/accessor behaviour (resolves VSCode autocomplete issue) (#5469)
- further improved lazy loading (#5459)
- fix for categorical inserts from row-oriented data (#5462)
- use of fill_null with temporal literals (#5440)
🛠️ Other improvements
- don't panic if part of query cannot run strea… (#5458)
- add build_info() to the API doc (#5442)
- Improved structure for DataFrame and LazyFrame API docs, misc design improvements (#5433)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @dannyvankooten, @ritchie46, @s1ck, @slonik-az, @stinodego and @universalmind303
Python Polars 0.14.26
✨ Enhancements
- build_info() provides detailed information on how polars was built (#5423)
- add missing width property to LazyFrame (#5431)
- enhanced Series.dot method and related interop (#5428)
- allow regex and wildcard in groupby (#5425)
- support DataFrame init from generators (#5424)
- support Series init from generator (#5411)
🐞 Bug fixes
- fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
- properly handle json with unclosed strings (#5427)
- fix null poisoning in rank operation (#5417)
- correct expr::diff dtype for temporal columns (#5416)
- fix cse for nested caches (#5412)
- don't set sorted flag in argsort (#5410)
🛠️ Other improvements
Thank you to all our contributors for making this release possible!
@CalOmnie, @alexander-beedie, @ghuls, @ritchie46, @slonik-az, @stinodego and @universalmind303
Python Polars 0.14.25
✨ Enhancements
- 30x speedup initialising Series from python range object (#5397)
- r-associative support for commutative DataFrame operators (#5394)
- pl.from_epoch function (#5330)
- Streaming joins architecture and Cross join implementation. (#5339)
- enable frame init from sequence of pandas series, and improve lazy typechecks (handle subclasses) (#5383)
- add support for am/pm notation in parse_dates read_csv (#5373)
- add reduce/cumreduce expression as an easier fold (#5364)
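The "reduce/cumreduce as an easier fold" entry above can be pictured with a plain-Python analogy. This is only a sketch using the standard library, not the Polars API: the column data and the `add_columns` helper are hypothetical. The point is that a fold needs an explicit initial accumulator, a reduce takes the first input as the accumulator, and a cumulative reduce keeps every intermediate result.

```python
from functools import reduce
from itertools import accumulate

# Hypothetical column data standing in for three numeric columns.
columns = [[1, 2], [10, 20], [100, 200]]


def add_columns(left, right):
    """Element-wise sum of two equally long columns."""
    return [a + b for a, b in zip(left, right)]


# Fold: the caller has to supply the initial accumulator.
folded = reduce(add_columns, columns, [0, 0])

# Reduce: the first column acts as the initial accumulator.
reduced = reduce(add_columns, columns)

# Cumulative reduce: every intermediate accumulator is kept.
cumulative = list(accumulate(columns, add_columns))

print(folded, reduced, cumulative)
```

Both `folded` and `reduced` give the same totals here; the reduce form simply removes the need to construct a neutral starting value.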
🐞 Bug fixes
- explicit nan comparison in min/max agg (#5403)
- lazy proxy module does not require global registration (#5390)
- Correct CSV row indexing (#5385)
🛠️ Other improvements
- Docstrings for frame, lazyframe and time series (#5398)
- add integrated support for copying API examples, and auto-parallelise docs build (#5393)
- improve rendering of API docs type signatures, mark PivotOps as deprecated, misc tidy-ups (#5388)
- Expression docstrings (#5377)
- minor navbar improvements; adds discord and twitter links, fixes github icon (#5379)
- improve structure of sphinx-generated API docs (#5376)
- Add with_time_zone to reference guide (#5369)
Thank you to all our contributors for making this release possible!
@YuRiTan, @alexander-beedie, @braaannigan, @owrior, @ritchie46 and @zundertj
Rust Polars 0.25.0
The most notable change in this release is the start of out-of-core support in polars, meaning we are able to process larger-than-RAM datasets. This is currently supported for the parts of queries that read from csv or parquet and is limited to select, filter, and groupby operations. Many more operations will follow in upcoming releases.
See #5139 (comment), where we were able to process an 80GB dataset on a laptop with only 16GB RAM.
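To illustrate the general idea behind out-of-core processing, here is a minimal conceptual sketch in plain Python. It is not how polars implements its streaming engine, and the function name, parameters, and sample data are all hypothetical. It reads a CSV source in small batches, applies a filter, and updates a per-group running sum, so only one batch plus the aggregation state is held in memory at a time.

```python
import csv
import io
from collections import defaultdict


def streaming_groupby_sum(lines, key, value, predicate, batch_size=2):
    """Sum `value` per `key` over rows passing `predicate`, one batch at a time.

    Hypothetical helper for illustration only; not part of polars.
    """
    reader = csv.DictReader(lines)
    totals = defaultdict(float)  # aggregation state: one entry per group
    batch = []

    def flush(rows):
        # filter + groupby-sum applied to a single in-memory batch
        for row in rows:
            if predicate(row):
                totals[row[key]] += float(row[value])

    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            flush(batch)
            batch = []  # drop the processed batch before reading more rows
    flush(batch)  # process the final, possibly partial batch
    return dict(totals)


# Small in-memory stand-in for a file that would not fit in RAM.
data = io.StringIO("city,amount\nA,1\nB,2\nA,3\nB,-4\nA,5\n")
result = streaming_groupby_sum(
    data, key="city", value="amount", predicate=lambda r: float(r["amount"]) > 0
)
print(result)
```

Memory use in this sketch grows with the number of distinct groups rather than with the number of rows, which is why aggregations such as groupby are natural candidates for this style of execution.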
Thanks to everyone who contributed to another release! 🙌
⚠️ Breaking changes
- rename expand_at_index -> new_from_index (#5259)
🚀 Performance improvements
- lower contention in out of core filter (#5311)
- improve pivot performance by using faster series… (#5172)
- improve streaming performance (~15%) (#5170)
- don't block projection pushdown on unnest (#5123)
- more conservative JIT sort settings (#5080)
- sort and unsort join key if other side is sorted (#5069)
- do not rechunk left joins (#5066)
- Prune unneeded projections (#5032)
- Improve predicate pushdown + with_columns (#5029)
- Don't execute unused with_column expressions (#5026)
✨ Enhancements
- shrink_type expression (#5351)
- tz_localize expression (#5340)
- accept expr in arr.get (#5337)
- Implement forward strategy in groupby join_asof (#5335)
- improve dynamic inference of struct types (#5297)
- Add newline to Aggregate..FROM describe_optimization_plan (#5253)
- date_range expression (#5267)
- show expression where error originated if raised … (#5263)
- improve error msg if window expressions length do… (#5262)
- Add round for date and datetime (#5153)
- new n_chars functionality for utf8 strings (#5252)
- added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)
- make date_range timezone aware (#5234)
- Rust functions for typed JsonPath implementation (#5140)
- allow polars Config options to be serialised/shared, and more easily unset (#5219)
- batched csv reader (#5212)
- accept expressions in arr.slice (#5191)
- is_sorted aggregation fast path for Utf8Chunked (#5184)
- hybrid streaming query engine (#5139)
- add binary dtype (#5122)
- improve function expansion (#5110)
- add struct arithmetics (#5107)
- add cumfold/cumsum expression (#5103)
- error on invalid asof join inputs (#5100)
- small plan and profile chart improvements (#5067)
- Initial implementation of histogram algorithm (#4752)
🐞 Bug fixes
- unnest only pushdown column if there are projections (#5360)
- block is_null predicate in asof join (#5358)
- ensure that no-projection is seen as select all in… (#5356)
- resolve duplicated column names in pivot (#5349)
- fix serde of expression (pickle) (#5333)
- don't set auto-explode in apply_multiple (#5265)
- export anonymousscan in lazy prelude (#5295)
- fix explicit list + sort aggregation in groupby co… (#5317)
- fix sort-merge dispatch of utf8 (#5315)
- properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
- don't block non matching groups in binary expression (#5273)
- fix logical type of nested take (#5271)
- tag IntoSeries trait as unsafe (#5258)
- include single null value in global cat builder (#5254)
- include slice in sort fast path (#5247)
- determine supertype of datetimes with timezones an… (#5240)
- fix groupby dynamic truncate for > days resolution (#5235)
- set timezone on groupby_dynamic boundaries (#5233)
- fix incorrect duration dtype (#5226)
- set string cache if lazy schema contains categorical (#5225)
- fix pipeline dtypes (#5224)
- fix asof_join schema (#5213)
- fix single thread loop if schema length is off by 1 (#5210)
- improve numeric stability of rolling_variance (#5207)
- fix overflow in partitioned groupby mean of int32/… (#5204)
- don't allow categorical append that is not under s… (#5195)
- include offset in arr.get (#5193)
- fix rolling_float in case closure returns None (#5180)
- Implement missing extract conversion for Time datatype (#5161)
- implement missing conversion to python time object (#5152)
- microsecond noise on date >> time cast (add 00:00:00 fast-path) (#5149)
- wrong operator mapped for LtEq (#5120)
- unique include null (#5112)
- don't recurse assign unions as it SO > 5k files (#5098)
- block projection pushdown on unnest (#5093)
- projection_node always do projection locally if no… (#5090)
- fix iso_year for Date dtype (#5074)
- fix bug in unneeded projection pruning (#5071)
- Improve printing controls of DataFrame and Series (#5047)
- Double projections should be checked on input schema (#5058)
- Apply flat overlapping row groups when possible (#5039)
- Ensure all predicates use same key function when inserting… (#5034)
- Only consider dt series equal if they have the same tz (#5025)
- Special-case ewm_mean(alpha=1) (#5019)
- Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (#5014)
- Fix timezone cast (#5016)
🛠️ Other improvements
- update rustc to nightly-2022-10-24 (#5312)
- update ahash and add nightly features of hashbrown (#5310)
- Update comfy-table and memchr. (#5276)
- rename expand_at_index -> new_from_index (#5259)
- ensure streaming groupby take slice into account (#5178)
- move polars-sql under polars folder (#5176)
- remove aggregate pushdown optimization (#5173)
- relax sync requirement on Executor trait impls (#5142)
- Get rid of unnecessary check in SplitLines iterator (#5141)
- Constant instead of literal (#5088)
- Use release-drafter to draft releases with changelogs (#5033)
- Fix docs by activating docfg feature (#5028)
- Split up polars-lazy crate. (#5020)
Thank you to all our contributors for making this release possible!
@AlecZorab, @YuRiTan, @alexander-beedie, @cjermain, @dannyvankooten, @dpatton-gr, @egorchakov, @ghuls, @hpux735, @matteosantama, @mcrumiller, @owrior, @ritchie46, @slonik-az, @sorhawell, @stinodego, @thatlittleboy, @universalmind303 and @zundertj