Releases: J535D165/recordlinkage

Release v0.16

20 Jul 13:00
b93d976

A new release of recordlinkage after a long time (too long, I'm sorry). This release bumps the minor version to 0.16. This version supports pandas 2 and pandas 1. It doesn't contain any structural changes or improvements to the API.

Full Changelog: v0.15...v0.16

Release v0.15 (19 Apr 2022)

19 Apr 14:00
  • Remove deprecated recordlinkage classes (#173)
  • Bump min Python version to 3.6, ideally 3.8+ (#171)
  • Bump min pandas version to >=1
  • Resolve deprecation warnings for numpy and pandas
  • Happy lint, sort imports, format code with yapf
  • Remove unnecessary np.sort in SNI algorithm (#141)
  • Fix bug for cosine and qgram string comparisons with threshold (#135)
  • Fix several typos in docs (#151)(#152)(#153)(#154)(#163)(#164)
  • Fix random indexer (#158)
  • Fix various deprecation warnings and broken docs build (#170)
  • Fix broken docs build due to pandas depr warnings (#169)
  • Fix broken build and removed warning messages (#168)
  • Update narrative
  • Replace Travis with GitHub Actions (#132)
  • Fix broken test NotFittedError
  • Fix bug in low memory random sampling and add more tests (#130)
  • Add extras_require to setup.py for deps management
  • Add banner to README and update title
  • Add Binder and Colab buttons at tutorials (#174)

Special thanks to Tomasz Waleń @twalen and other contributors for their work on this release.

Version 0.14 (1 Dec 2019)

01 Dec 15:54
  • Drop Python 2.7 and Python 3.4 support. (#91)
  • Upgrade minimal pandas version to 0.23.
  • Simplify the use of all cpus in parallel mode. (#102)
  • Store large example datasets in user home folder or use environment variable. Before, example datasets were stored in the package. (see issue #42) (#92)
  • Add support to write and read annotation files for recordlinkage ANNOTATOR. See the docs and https://github.com/J535D165/recordlinkage-annotator for more information.
  • Replace .labels by .codes for pandas.MultiIndex objects for newer versions of pandas (>0.24). (#103)
  • Fix totals for pandas.MultiIndex input on confusion matrix and accuracy metrics. (see issue #84) (#109)
  • Fix bug when initializing Compare with (a list of) features. (#124)
  • Various updates in relation to deprecation warnings in third-party libraries such as sklearn, pandas and networkx.
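
The switch from .labels to .codes mentioned above can be illustrated with plain pandas. This is a minimal sketch rather than code from recordlinkage itself; the index values and level names are made up for illustration.

```python
import pandas as pd

# A small MultiIndex of candidate record pairs (illustrative values only).
pairs = pd.MultiIndex.from_tuples(
    [("a1", "b1"), ("a1", "b2"), ("a2", "b1")],
    names=["id_a", "id_b"],
)

# Newer pandas versions expose the integer positions into each level as
# `.codes`; the older `.labels` attribute was deprecated and later removed.
left_codes = list(pairs.codes[0])
right_codes = list(pairs.codes[1])

# The codes map back to the actual identifiers through `.levels`.
left_values = list(pairs.levels[0].take(pairs.codes[0]))

print(left_codes, right_codes, left_values)
```

Code that must run on both very old and newer pandas versions would need to check which attribute is available; the sketch targets newer versions only.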

Version 0.13.2 (27 Mar 2019)

27 Mar 21:51

Fix distribution problem.

Version 0.13 (15 Mar 2019)

15 Mar 15:07
702899e
Resolve conflict between threshold and missing value (#85)

Closes #70

Version 0.11.2 (4 Jan 2018)

04 Jan 15:34
  • Minor installation improvement. Exclude unwanted files

Version 0.11.1 (4 Jan 2018)

04 Jan 15:26
  • Fix installation issue. Submodule 'preprocessing' was not added to the
    source distribution.

Version 0.11.0 (22 Dec 2017)

04 Jan 09:07
  • The submodule 'standardise' is renamed to 'preprocessing'. The old name
    'standardise' will be deprecated in a future version.
  • Deprecation errors were not visible to many users. In this version, they
    are more visible.
  • Improved and new logs for indexing, comparing and classification.
  • Faster comparison of string variables. Thanks to Joel Becker.
  • Changes make it possible to pickle Compare and Index objects. This makes it
    easier to run code in parallel. Tests were added to ensure that pickling
    remains possible.
  • Important change. MultiIndex objects with many record pairs were split into
    pieces to lower memory usage. In this version, this automatic splitting is
    removed. Please split the data yourself.
  • Integer indexing. A blog post on this will follow.
  • The metrics submodule has changed heavily. This breaks compatibility with
    the previous version.
  • repr() and str() now return informative output for index and compare
    objects.
  • It is possible to use abbreviations for string similarity methods. For example
    'jw' for the Jaro-Winkler method.
  • The FEBRL dataset loaders can now return the true links as a
    pandas.MultiIndex for each FEBRL dataset. This option is disabled by default.
    See the FEBRL datasets for details.
  • Fix issue with automatic recognition of license on GitHub.
  • Various small improvements.
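
Since automatic splitting of large MultiIndex objects was removed, one way to split the record pairs yourself is sketched below with plain pandas and NumPy. The helper name split_pairs and the example index are hypothetical and not part of recordlinkage; the library may offer its own utility for this.

```python
import numpy as np
import pandas as pd


def split_pairs(pairs, n_chunks):
    """Hypothetical helper: cut a MultiIndex into n_chunks consecutive pieces."""
    # Evenly spaced integer boundaries from 0 to len(pairs).
    bounds = np.linspace(0, len(pairs), n_chunks + 1).astype(int).tolist()
    return [pairs[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


# 4 x 5 = 20 candidate record pairs (illustrative identifiers only).
pairs = pd.MultiIndex.from_product(
    [range(4), range(5)], names=["id_a", "id_b"]
)

chunks = split_pairs(pairs, 3)
chunk_sizes = [len(chunk) for chunk in chunks]

# Each chunk could then be compared separately and the results concatenated,
# which keeps the peak memory use of a single comparison step lower.
print(chunk_sizes)
```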

Note: In the next release, the Pairs class will be removed. Migrate now.

Version 0.10.1 (30 Aug 2017)

28 Dec 12:58
  • Remove print statement from the geo compare algorithm.
  • String, numeric and geo compare functions now raise directly when an
    incorrect algorithm name is passed.
  • Fix unit test that failed on Python 2.7.

Version 0.10.0 (30 Aug 2017)

28 Dec 12:58
  • A new compare API. The new Compare class no longer takes the datasets and
    pairs as arguments. The actual computation is now performed when calling
    .compute(PAIRS, DF1, DF2). The documentation is updated as well, but
    still needs improvement.
  • Two new string similarity measures are added: Smith-Waterman
    (smith_waterman) and Longest Common Substring (lcs). Thanks to Joel Becker
    and Jillian Anderson from the Networks Lab of the University of Waterloo.
  • Added and/or updated a large number of unit tests.
  • Various small improvements.