Releases · bminixhofer/nlprule

20 Feb 12:36

bminixhofer

0.4.5

44871a1

Release 0.4.5

New features

A transform function in nlprule-build transforms binaries immediately after they are acquired. This is suited e.g. for compressing the binaries before caching them.
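
As a rough illustration of what such a transform hook does, here is a Python sketch that gzip-compresses a binary right after it is acquired and before it is written to a cache. It does not use nlprule-build, which is a Rust crate. compress_transform, decompress and cache_binary are hypothetical names chosen for this example.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Callable


def compress_transform(data: bytes) -> bytes:
    """Hypothetical transform hook: gzip-compress the raw binary."""
    return gzip.compress(data)


def decompress(data: bytes) -> bytes:
    """Inverse of compress_transform, needed when loading from the cache."""
    return gzip.decompress(data)


def cache_binary(
    data: bytes,
    cache_dir: Path,
    name: str,
    transform: Callable[[bytes], bytes] = compress_transform,
) -> Path:
    """Apply the transform right after acquiring the binary, then cache the result."""
    path = cache_dir / name
    path.write_bytes(transform(data))
    return path


if __name__ == "__main__":
    raw = b"tokenizer binary " * 1000  # stand-in for a downloaded binary
    with tempfile.TemporaryDirectory() as tmp:
        cached = cache_binary(raw, Path(tmp), "en_tokenizer.bin.gz")
        stored = cached.read_bytes()
        assert decompress(stored) == raw
        print(f"{len(raw)} bytes -> {len(stored)} bytes cached")
```

Whatever is loaded from the cache later has to go through the inverse transform (here decompress) before use.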

Fixes

Require srx ^0.1.2, which includes a patch for an out-of-bounds access.

17 Feb 10:53

bminixhofer

0.4.4

b14f7d2

Release 0.4.4

Breaking changes

This is a patch release, but there are some small breaking changes to the public API:

The from_reader and new methods of the Tokenizer and Rules now return an nlprule::Error instead of a bincode::Error.
tag_store and word_store methods of the Tagger are now private.

New features

The nlprule-build crate now has a postprocess method to allow e.g. compression of the produced binaries (#32, thanks @drahnr!).

Internal improvements

Newtypes for PosIdInt and WordIdInt to clarify use of ids in the tagger (#31).
Newtype for indices into the match graph (GraphId). All graph ids are now validated at build time; this also fixed an error where invalid graph ids in the XML files were silently ignored (#31).
Reduced the size of the English tokenizer through better serialization of the chunker, from 15MB (7.7MB gzipped) to 11MB (6.9MB gzipped).
Reduced allocations by making more use of iterators internally (#30). This improves speed, although there is no significant benchmark improvement on my machine.
Improved error handling by propagating more errors in the compile module instead of panicking, and through better build-time validation. This reduces the number of unwraps from ~80 to ~40.
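
The idea of a validated index newtype can be sketched in Python. This is only an illustration: GraphId here is a hypothetical frozen dataclass, validate_graph_id and graph_len are made-up names, and unlike a Rust newtype the distinction is only enforced at runtime. The point is that an out-of-range id is rejected when the rules are built instead of being silently ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphId:
    """Hypothetical newtype: an index into a match graph, not a plain int."""
    value: int


def validate_graph_id(raw: int, graph_len: int) -> GraphId:
    """Check an id read from the rule XML at build time.

    Raises ValueError for out-of-range ids instead of silently ignoring them.
    """
    if not 0 <= raw < graph_len:
        raise ValueError(f"invalid graph id {raw} (graph has {graph_len} entries)")
    return GraphId(raw)
```

Code that only accepts a GraphId can then assume the index is in range, so the check happens once at build time rather than at every lookup.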

12 Feb 14:32

bminixhofer

0.4.3

0ae4f93

Release 0.4.3

Breaking changes

nlprule now does sentence segmentation internally using srx. The Python API has changed: the SplitOn class and the *_sentence methods have been removed:

from nlprule import Tokenizer, Rules

tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)

rules.correct("He wants that you send him an email.") # this takes an arbitrary text

new_from is now called from_reader in the Rust API (thanks @drahnr!)
Token.text and IncompleteToken.text are now called Token.sentence / IncompleteToken.sentence to avoid confusion with Token.word.text.
Tokenizer.tokenize is now private. Use Tokenizer.pipe instead (also does sentence segmentation).
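
nlprule uses the srx crate for segmentation. The simplified Python sketch below only illustrates the SRX rule semantics (an ordered list of break / no-break rules, each a pair of regexes around a candidate position, where the first matching rule decides); it is not the srx crate's implementation, and the two rules in RULES are hypothetical examples.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SrxRule:
    brk: bool    # True: break at this position, False: exception (no break)
    before: str  # regex that must match the text ending at the position
    after: str   # regex that must match the text starting at the position


# Hypothetical example rules: an abbreviation exception listed before a generic break rule.
RULES = [
    SrxRule(False, r"\b(?:Mr|Mrs|Dr)\.", r"\s"),
    SrxRule(True, r"[.!?]", r"\s+[A-Z]"),
]


def segment(text: str, rules: List[SrxRule]) -> List[str]:
    compiled = [
        (r.brk, re.compile(rf"(?:{r.before})\Z"), re.compile(r.after)) for r in rules
    ]
    sentences, start = [], 0
    for pos in range(1, len(text)):
        head, tail = text[:pos], text[pos:]
        for brk, before, after in compiled:
            # The first rule whose both sides match decides for this position.
            if before.search(head) and after.match(tail):
                if brk:
                    sentences.append(text[start:pos])
                    start = pos
                break
    sentences.append(text[start:])
    return [s.strip() for s in sentences if s.strip()]
```

With these rules, segment("Dr. Smith arrived. He sat down.", RULES) yields two sentences, because the exception for "Dr." is checked before the generic break rule.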

New features

Support for Spanish (experimental).
A new multiword tagger improves tagging of e.g. named entities for English and Spanish.
Added the nlprule-build crate, which makes it easier to use the correct binaries in Rust (thanks @drahnr for the suggestion and discussion!)
Scripts and docs in build/README.md to make creating the nlprule build directories easier and more reproducible.
Full support for LanguageTool unifications.
The binary size of the Tokenizer has improved a lot: it is now roughly 6x smaller for German and 2x smaller for English.
New iterator helpers for Rules (thanks @drahnr!)
A new Tokenizer method .sentencize, which only does sentence segmentation and nothing else.
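
One common way to tag multiword expressions is a greedy longest-match lookup over the token sequence. The Python sketch below shows that technique; it is not nlprule's actual multiword tagger, and the MULTIWORDS dictionary and its tags are hypothetical examples.

```python
from typing import Dict, List, Tuple

# Hypothetical multiword lexicon: token sequences mapped to a POS tag.
MULTIWORDS: Dict[Tuple[str, ...], str] = {
    ("New", "York"): "NNP",
    ("New", "York", "City"): "NNP",
    ("United", "States"): "NNP",
}


def tag_multiwords(
    tokens: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, tag) spans using greedy longest-match lookup."""
    max_len = max((len(k) for k in lexicon), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tag = lexicon.get(tuple(tokens[i:i + n]))
            if tag is not None:
                spans.append((i, i + n, tag))
                i += n
                break
        else:
            # No multiword starts at this token; move on.
            i += 1
    return spans
```

Preferring the longest match keeps a shorter entry such as "New York" from splitting a longer named entity.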

12 Feb 13:20

bminixhofer

0.4.0

330355e

Release 0.4.0

Fix build.rs recommendation.

17 Jan 10:09

bminixhofer

0.3.0

577482b

Release 0.3.0

BREAKING: suggestion.text has been renamed to the more accurate suggestion.replacements.

Lots of speed improvements: NLPRule is now roughly 2.5x to 5x faster for German and English, respectively.

Rules now expose more information in the public API (see #5).

10 Jan 10:12

bminixhofer

0.2.2

2964bef

0.2.2

Python 3.9 support (fixes #7)

08 Jan 21:14

bminixhofer

0.2.1

fe56971

0.2.1

Fix precedence of Rule IDs over Rule Group IDs.

07 Jan 09:58

bminixhofer

0.2.0

51dd867

0.2.0

Updated to LT version 5.2.
Suggestions now have message and source attributes (#5):

suggestions = rules.suggest_sentence("She was not been here since Monday.")
for s in suggestions:
  print(s.start, s.end, s.text, s.source, s.message)

# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?

NLPRule is now parallelized by default. Parallelism can be turned off by setting the NLPRULE_PARALLELISM environment variable to false.

04 Jan 16:35

bminixhofer

0.1.9

2406d89

Release 0.1.9

Testing new distribution of binaries.

04 Jan 16:19

bminixhofer

0.1.8

1b08b8c

Release 0.1.8

Testing new distribution of binaries.