Make rules more inspectable #5

neoscopio · 2021-01-05T16:04:53Z

It would be nice to return JSON (like the LT HTTP server). Often you don't want one specific correction applied; some matches are just suggestions.

bminixhofer · 2021-01-05T16:59:01Z

Hi, thanks for the feature request!

Returning JSON doesn't make sense to me - I assume you mean returning a list of suggestions with start, end, replacements and message?

This is already implemented to some degree (copied from the Readme) in v0.1.9:

suggestions = rules.suggest_sentence("She was not been here since Monday.")
for s in suggestions:
  print(s.start, s.end, s.text)
  
# prints:
# 4 16 ['was not', 'has not been']

On master the message and source are parsed too:

suggestions = rules.suggest_sentence("She was not been here since Monday.")
for s in suggestions:
  print(s.start, s.end, s.text, s.source, s.message)

# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?

This will be part of v0.2.0 which I'll release soon. Let me know if that's what you meant.
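For anyone wondering how to act on these fields: going by the printed output above, start and end look like character offsets into the sentence (4 to 16 covers "was not been"), and text is the list of candidate replacements. That reading is an assumption based on the sample output only. A minimal sketch in plain Python, where apply_replacement is a hypothetical helper and not part of nlprule:

```python
def apply_replacement(sentence, start, end, replacement):
    """Replace sentence[start:end] with one chosen replacement string."""
    return sentence[:start] + replacement + sentence[end:]


# Values copied from the sample output above; no nlprule import is needed here.
sentence = "She was not been here since Monday."
start, end, candidates = 4, 16, ["was not", "has not been"]

# Build one corrected sentence per candidate so the caller can pick one.
corrected = [apply_replacement(sentence, start, end, c) for c in candidates]

for option in corrected:
    print(option)
# prints:
# She was not here since Monday.
# She has not been here since Monday.
```

This keeps the choice of replacement with the caller, which fits the "some are just suggestions" use case.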

bminixhofer · 2021-01-07T10:19:21Z

Release 0.2.0 is now out, so the code in the sample above works.

neoscopio · 2021-01-07T12:17:00Z

Great, thanks!
I really did mean returning a JSON object. My use case is to use nlprule as a library in my own code, so I can inform the user about the correction, including information about its type (grammar, style, wordiness, etc.), examples of correct and incorrect use, and eventually links to the rule's justification, just like the LT HTTP server does. It could be used to implement a simple web app, but also to integrate into other apps, like LibreOffice or browser extensions.

bminixhofer · 2021-01-07T19:17:56Z

Hi, I looked into setting __dict__ so it is easier to get all attributes as a dictionary but I'm not willing to add the extra complexity this would require (unless there's an easier way I'm missing).

Is there any reason something like this doesn't work for you:

import json

# ...

suggestions = rules.suggest_sentence("She was not been here since Monday.")

for s in suggestions:
    print(
        json.dumps(
            {
                "start": s.start,
                "end": s.end,
                "text": s.text,
                "source": s.source,
                "message": s.message,
            }
        )
    )

I believe that's good enough, unless you really care about never repeating yourself.

neoscopio · 2021-01-09T16:43:26Z

but that wouldn't give me:

1. Type of rule
2. Examples
3. URL of additional information on the rule
4. Help message coded on the rule

I know nothing about Rust, other than it's a cool name, but if it's OOP, I would expect an object with all the properties of a rule. If that exists, it should be fairly easy to convert the rule to JSON and return that with the word, something like ruleO.serialize()? Anyway, it was just a suggestion. If you don't see the point, maybe someone else can contribute a pull request. Thanks.

bminixhofer · 2021-01-09T16:48:40Z

Hi, sorry, I was thrown off a bit by the "JSON" then. ~~So you want suggestions to contain more information about the match.~~ see below. That's a very valid issue and something I will work on. Point 4 you listed, "help message coded on the rule", actually already works in v0.2.0 via suggestion.message. I will add the rest.

Edit: Actually these things make more sense as attributes on a rule (as you suggested) so:

- I'll add a way to retrieve rules by ID
- Rules will have more information retrievable via the public API, e.g. examples, category, etc.

neoscopio · 2021-01-09T17:00:13Z

Cool, thank you, I'll be looking forward to that.

bminixhofer · 2021-01-09T17:34:08Z

Let's keep this issue open to remind me and thanks for bringing it up :)

Theelx · 2021-01-10T00:12:00Z

> Hi, I looked into setting __dict__ so it is easier to get all attributes as a dictionary but I'm not willing to add the extra complexity this would require (unless there's an easier way I'm missing).

I ran into this problem a while back. If you feel comfortable enabling slots, this could work (slots provide a speed and memory boost, but you can't assign an attribute to an instance unless the class declares it in __slots__):

    def __iter__(self):
        for attr in itertools.chain.from_iterable(getattr(cls, '__slots__', []) for cls in self.__class__.__mro__):
            yield attr, getattr(self, attr)

It might also work for __dict__ if you change a few things. If you call dict(<instance of this class>), it'll return every attr/value pair in slots. If an instance doesn't have every slot attribute defined, you'll need to add a try/except AttributeError around the yield.
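For reference, here is the same idea as a small self-contained sketch in pure Python. The Suggestion class below is a hypothetical stand-in with made-up slots, not the real PyO3-backed class, so whether the approach carries over to the bindings is an open question:

```python
import itertools


class Suggestion:
    # Hypothetical pure-Python stand-in; the slot names are assumptions.
    __slots__ = ("start", "end", "text")

    def __init__(self, start, end, text):
        self.start = start
        self.end = end
        self.text = text

    def __iter__(self):
        # Collect slot names from every class in the MRO that declares them.
        slot_names = itertools.chain.from_iterable(
            getattr(cls, "__slots__", ()) for cls in type(self).__mro__
        )
        for attr in slot_names:
            try:
                value = getattr(self, attr)
            except AttributeError:
                # The slot is declared but was never assigned on this instance.
                continue
            yield attr, value


s = Suggestion(4, 16, ["was not", "has not been"])
print(dict(s))
# prints:
# {'start': 4, 'end': 16, 'text': ['was not', 'has not been']}
```

dict() accepts any iterable of key/value pairs, which is why yielding (attr, value) tuples from __iter__ is enough. One caveat: __slots__ must be a tuple or list here; a bare string would be iterated character by character.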

bminixhofer · 2021-01-10T09:51:56Z

Thanks! That looks interesting. The catch is that I have to look at this in the context of PyO3: the Python bindings are written in Rust as well, so how to set __slots__ and __dict__ depends on how PyO3 handles them. I'll look a bit closer into __slots__.

I'd definitely like dict(suggestion) to work. It's "nice to have" but not necessary though so currently optimizing for speed has higher priority.

bminixhofer · 2021-01-16T16:10:09Z

Just finished implementing this. This is the API:

suggestion = rules.suggest_sentence("She was not been here since Monday.")[0]

# .rule(..) finds a rule by id
rule = rules.rule(suggestion.source)

print(rule.url, rule.short, rule.name, rule.category_id, rule.category_name, rule.category_type)

for example in rule.examples:
    print(example.text, example.suggestion)
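To tie this back to the original JSON request, here is a rough sketch of merging a suggestion with its rule's metadata into one JSON object. It only reads attributes named in this thread. The describe helper and the layout of the payload are my own assumptions, and the SimpleNamespace objects are stand-ins so the snippet runs without nlprule installed. The rule values are borrowed from the unit test quoted in this thread; the suggestion's start, end, text and message are placeholders, not real library output.

```python
import json
from types import SimpleNamespace


def describe(suggestion, rule):
    """Merge a suggestion and its rule's metadata into a JSON-ready dict."""
    return {
        "start": suggestion.start,
        "end": suggestion.end,
        "replacements": suggestion.text,
        "message": suggestion.message,
        "rule": {
            "id": suggestion.source,
            "name": rule.name,
            "short": rule.short,
            "url": rule.url,
            "category": {
                "id": rule.category_id,
                "name": rule.category_name,
                "type": rule.category_type,
            },
            "examples": [example.text for example in rule.examples],
        },
    }


# Stand-in rule; values taken from the unit test quoted in this thread.
rule = SimpleNamespace(
    name="taken back (aback) by",
    short="Commonly confused word",
    url="https://www.merriam-webster.com/dictionary/take%20aback",
    category_id="CONFUSED_WORDS",
    category_name="Commonly Confused Words",
    category_type="misspelling",
    examples=[SimpleNamespace(text="He was totally taken back by my response.")],
)

# Stand-in suggestion; start, end, text and message are placeholders.
suggestion = SimpleNamespace(
    start=0, end=0, text=["placeholder"], source="BACK_ABACK", message="placeholder"
)

payload = describe(suggestion, rule)
print(json.dumps(payload, indent=2))
```

With the real objects this would presumably be called as describe(s, rules.rule(s.source)) for each s in the returned suggestions.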

A more detailed example is in the unit tests:

nlprule/bindings/python/test.py

Lines 54 to 89 in a9b7f40

    
def test_rules_inspectable(tokenizer_and_rules):
    (tokenizer, rules) = tokenizer_and_rules

    suggestion = rules.suggest("He was taken back by my response.")[0]

    rule = rules.rule(suggestion.source)
    assert rule.id == suggestion.source

    # metadata of the rule itself
    assert rule.short == "Commonly confused word"
    assert rule.url == "https://www.merriam-webster.com/dictionary/take%20aback"
    assert rule.id == "BACK_ABACK"
    assert rule.name == "taken back (aback) by"

    # category related metadata
    assert rule.category_id == "CONFUSED_WORDS"
    assert rule.category_name == "Commonly Confused Words"
    assert rule.category_type == "misspelling"

    # data related to rule examples
    assert len(rule.examples) == 2

    assert rule.examples[0].text == "He was totally taken back by my response."
    assert rule.examples[0].suggestion is not None
    assert (
        rules.apply_suggestions(rule.examples[0].text, [rule.examples[0].suggestion])
        == "He was totally taken aback by my response."
    )

    assert rule.examples[1].text == "He was totally taken a bag by my response."
    assert rule.examples[1].suggestion is not None
    assert (
        rules.apply_suggestions(rule.examples[0].text, [rule.examples[0].suggestion])
        == "He was totally taken aback by my response."
    )

This will be part of v0.3.0, which I'll release in a couple of days. NLPRule will also be roughly 4x faster for English and 2.5x faster for German with that release :)

Theelx · 2021-01-16T16:14:01Z

Thanks! The speed boosts and additional functionality look super fun :)

bminixhofer · 2021-01-17T10:54:42Z

I just released v0.3.0 so the code above works now. I'll close this issue for now, let me know if I forgot anything in the API.

bminixhofer closed this as completed Jan 7, 2021

bminixhofer reopened this Jan 7, 2021

neoscopio closed this as completed Jan 9, 2021

bminixhofer reopened this Jan 9, 2021

bminixhofer self-assigned this Jan 9, 2021

bminixhofer changed the title ~~[Feature request] Option to return json~~ Make rules more inspectable Jan 9, 2021

bminixhofer added the enhancement New feature or request label Jan 9, 2021

bminixhofer added a commit that referenced this issue Jan 16, 2021

make rules more inspectable (#5), add Python tests

7057e90

bminixhofer added a commit that referenced this issue Jan 16, 2021

more cleanly propagate group / category information, add additional rule metadata (#5)

a9b7f40

bminixhofer closed this as completed Jan 17, 2021
