Releases: J535D165/recordlinkage
Release v0.16
A new release of recordlinkage after a long time (too long, I'm sorry). This release bumps the minor version to 0.16. This version supports pandas 2 and pandas 1. It doesn't contain any structural changes or improvements to the API.
What's Changed
- Fix typo by @havardox in #184
- Fix usage examples by @martinhohoff in #190
- Fix links by @andyjessen in #186
- add threshold None and label docstrings for String by @davidggphy in #189
- Add support for pandas==2 by @J535D165 in #192
- Replace setup.py by pyproject.toml by @J535D165 in #195
- Lint with Ruff and format with Black by @J535D165 in #196
- Update CI docs generation and CI pipeline by @J535D165 in #197
- Update the docs CI pipeline by @J535D165 in #198
- Add pre-commit hooks by @J535D165 in #199
New Contributors
- @havardox made their first contribution in #184
- @martinhohoff made their first contribution in #190
- @andyjessen made their first contribution in #186
- @davidggphy made their first contribution in #189
Full Changelog: v0.15...v0.16
Release v0.15 (19 Apr 2022)
- Remove deprecated recordlinkage classes (#173)
- Bump min Python version to 3.6, ideally 3.8+ (#171)
- Bump min pandas version to >=1
- Resolve deprecation warnings for numpy and pandas
- Happy lint, sort imports, format code with yapf
- Remove unnecessary np.sort in SNI algorithm (#141)
- Fix bug for cosine and qgram string comparisons with threshold (#135)
- Fix several typos in docs (#151)(#152)(#153)(#154)(#163)(#164)
- Fix random indexer (#158)
- Fix various deprecation warnings and broken docs build (#170)
- Fix broken docs build due to pandas depr warnings (#169)
- Fix broken build and removed warning messages (#168)
- Update narrative
- Replace Travis by Github Actions (#132)
- Fix broken test NotFittedError
- Fix bug in low memory random sampling and add more tests (#130)
- Add extras_require to setup.py for deps management
- Add banner to README and update title
- Add Binder and Colab buttons at tutorials (#174)
Special thanks to Tomasz Waleń @twalen and other contributors for their work on this release.
Version 0.14 (1 Dec 2019)
- Drop Python 2.7 and Python 3.4 support. (#91)
- Upgrade minimal pandas version to 0.23.
- Simplify the use of all cpus in parallel mode. (#102)
- Store large example datasets in user home folder or use environment variable. Before, example datasets were stored in the package. (see issue #42) (#92)
- Add support to write and read annotation files for recordlinkage ANNOTATOR. See the docs and https://github.com/J535D165/recordlinkage-annotator for more information.
- Replace .labels by .codes for pandas.MultiIndex objects for newer versions of pandas (>0.24). (#103)
- Fix totals for pandas.MultiIndex input on confusion matrix and accuracy metrics. (see issue #84) (#109)
- Initialize Compare with (a list of) features (Bug). (#124)
- Various updates in relation to deprecation warnings in third-party libraries such as sklearn, pandas and networkx.
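The .labels to .codes change above matters for code that has to run against both older and newer pandas versions. A minimal sketch of a compatibility helper follows; the function name multiindex_codes is hypothetical and not part of recordlinkage or pandas, and it only assumes that the object exposes either a .codes or a .labels attribute.

```python
def multiindex_codes(index):
    """Return the integer level codes of a MultiIndex-like object.

    Hypothetical helper, not part of recordlinkage or pandas.
    Newer pandas versions expose the codes as `.codes`; older versions
    only have `.labels`. Prefer `.codes` and fall back to `.labels`.
    """
    try:
        return index.codes
    except AttributeError:
        # Older pandas: the same data lives under the former name.
        return index.labels
```

With a real pandas.MultiIndex this would be called as multiindex_codes(pairs); on current pandas versions, accessing .codes directly is enough.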
Version 0.13.2 (27 Mar 2019)
Fix distribution problem.
Version 0.13 (15 Mar 2019)
Resolve conflict with threshold and missing value (#85). Closes #70.
Version 0.11.2 (4 Jan 2018)
- Minor installation improvement. Exclude unwanted files
Version 0.11.1 (4 Jan 2018)
- Fix installation issue. Submodule 'preprocessing' was not added to the source distribution.
Version 0.11.0 (22 Dec 2017)
- The submodule 'standardise' is renamed. The new name is 'preprocessing'. The submodule 'standardise' will get deprecated in a future version.
- Deprecation errors were not visible for many users. In this version, the errors are more visible.
- Improved and new logs for indexing, comparing and classification.
- Faster comparing of string variables. Thanks Joel Becker.
- Changes make it possible to pickle Compare and Index objects. This makes it easier to run code in parallel. Tests were added to ensure that pickling remains possible.
- Important change. MultiIndex objects with many record pairs were split into pieces to lower memory usage. In this version, this automatic splitting is removed. Please split the data yourself.
- Integer indexing. Blog post will follow on this.
- The metrics submodule has changed heavily. This will break with the previous version.
- repr() and str() will return informative output for index and compare objects.
- It is possible to use abbreviations for string similarity methods. For example 'jw' for the Jaro-Winkler method.
- The FEBRL dataset loaders can now return the true links as a pandas.MultiIndex for each FEBRL dataset. This option is disabled by default. See the FEBRL datasets for details.
- Fix issue with automatic recognition of license on Github.
- Various small improvements.
Note: In the next release, the Pairs class will get removed. Migrate now.
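Since automatic splitting of large MultiIndex objects was removed in 0.11.0, the splitting has to happen in user code. A minimal sketch of one way to do that is shown here. The helper name split_pairs and the chunk_size parameter are hypothetical, not part of recordlinkage; the helper only assumes the input supports len() and slicing, which holds for a pandas.MultiIndex as well as for a plain list.

```python
def split_pairs(pairs, chunk_size):
    """Yield consecutive slices of `pairs` with at most `chunk_size` items.

    Hypothetical helper, not part of recordlinkage. Works with any
    object that supports len() and slicing.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(pairs), chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield pairs[start:start + chunk_size]
```

Each chunk could then be compared separately and the partial results concatenated afterwards, which keeps peak memory bounded by the chunk size rather than by the total number of record pairs.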
Version 0.10.1 (30 Aug 2017)
- Remove print statement from the geo compare algorithm.
- String, numeric and geo compare functions now raise directly when an incorrect algorithm name is passed.
- Fix unit test that failed on Python 2.7.
Version 0.10.0 (30 Aug 2017)
- A new compare API. The new Compare class no longer takes the datasets and pairs as arguments. The actual computation is now performed when calling .compute(PAIRS, DF1, DF2). The documentation is updated as well, but still needs improvement.
- Two new string similarity measures are added: Smith Waterman (smith_waterman) and Longest Common Substring (lcs). Thanks to Joel Becker and Jillian Anderson from the Networks Lab of the University of Waterloo.
- Added and/or updated a large number of unit tests.
- Various small improvements.
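For readers unfamiliar with the idea behind the Longest Common Substring measure mentioned above, here is a rough sketch of the textbook dynamic-programming approach for finding one longest substring shared by two strings. It is an illustration only and is not the implementation used by recordlinkage, which may normalise or combine matches differently; the function name longest_common_substring is an assumption made for this example.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by `a` and `b`.

    Textbook dynamic programming, O(len(a) * len(b)) time.
    Illustrative only; not the recordlinkage implementation.
    """
    best_len = 0
    best_end = 0  # exclusive end position of the best match in `a`
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]
```

A similarity score could be derived from the length of the returned substring relative to the input lengths, but the exact normalisation is a design choice and is left out of this sketch.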