Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rules more inspectable #5

Closed
neoscopio opened this issue Jan 5, 2021 · 13 comments
Closed

Make rules more inspectable #5

neoscopio opened this issue Jan 5, 2021 · 13 comments
Assignees
Labels
enhancement New feature or request

Comments

@neoscopio
Copy link

It would be nice to return JSON (like LT HTTP server). It's frequent to not want a specific correction, some are even just suggestions.

@bminixhofer
Copy link
Owner

bminixhofer commented Jan 5, 2021

Hi, thanks for the feature request!

Returning JSON doesn't make sense to me - I assume you mean returning a list of suggestions with start, end, replacements and message?

This is already implemented to some degree (copied from the Readme) in v0.1.9:

suggestions = rules.suggest_sentence("She was not been here since Monday.")
for s in suggestions:
  print(s.start, s.end, s.text)
  
# prints:
# 4 16 ['was not', 'has not been']

On master the message and source is parsed too:

suggestions = rules.suggest_sentence("She was not been here since Monday.")
for s in suggestions:
  print(s.start, s.end, s.text, s.source, s.message)

# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?

This will be part of v0.2.0 which I'll release soon. Let me know if that's what you meant.

@bminixhofer
Copy link
Owner

bminixhofer commented Jan 7, 2021

Release 0.2.0 is now out so the code in the sample above works now.

@neoscopio
Copy link
Author

neoscopio commented Jan 7, 2021

Great, thanks!
I did really mean a json object return. My use case is to use nprule as a library to my own code, so I can inform the user about the correction, including information about type (like grammar, style, wordiness, etc), examples of correct and incorrect use, and eventually links to rule justification. Just like LT Httpserver does. It could be used to implement a simple webapp, but also to integrate in other apps, like libreoffice or browser extensions.

@bminixhofer bminixhofer reopened this Jan 7, 2021
@bminixhofer
Copy link
Owner

Hi, I looked into setting __dict__ so it is easier to get all attributes as a dictionary but I'm not willing to add the extra complexity this would require (unless there's an easier way I'm missing).

Is there any reason something like this doesn't work for you:

import json

# ...

suggestions = rules.suggest_sentence("She was not been here since Monday.")

for s in suggestions:
    print(
        json.dumps(
            {
                "start": s.start,
                "end": s.end,
                "text": s.text,
                "source": s.source,
                "message": s.message,
            }
        )
    )

I believe that's good enough, unless you really care about never repeating yourself.

@neoscopio
Copy link
Author

neoscopio commented Jan 9, 2021

but that wouldn't give me:

  • Type of rule
  • Examples
  • Url of additional information on the rule
  • help message coded on the rule

I know nothing about rust, other than it's a cool name, but if it's OOP, then I would expect an object with all properties of a rule, and if that exists, then it would be rather easy to convert the rule to json and return that with the word, something like ruleO.serialize() ? Anyway, it was just a suggestion if you don't see the point, maybe someone else can contribute a pull request. Thanks.

@bminixhofer
Copy link
Owner

bminixhofer commented Jan 9, 2021

Hi, sorry, I was thrown of a bit by the "JSON" then. So you want suggestions to contain more information about the match. see below. That's a very valid issue and something I will work on. Point 4 you listed, "help message coded on the rule" actually already works in v0.2.0 via suggestion.message. I will add the rest.

Edit: Actually these things make more sense as attributes on a rule (as you suggested) so:

  1. I'll add a way to retrieve rules by ID
  2. Rules will have more information retrievable via the public API e. g. examples, category, etc.

@neoscopio
Copy link
Author

Cool, thank you, I'll be looking forward to that.

@bminixhofer
Copy link
Owner

Let's keep this issue open to remind me and thanks for bringing it up :)

@bminixhofer bminixhofer reopened this Jan 9, 2021
@bminixhofer bminixhofer self-assigned this Jan 9, 2021
@bminixhofer bminixhofer changed the title [Feature request] Option to return json Make rules more inspectable Jan 9, 2021
@bminixhofer bminixhofer added the enhancement New feature or request label Jan 9, 2021
@Theelx
Copy link

Theelx commented Jan 10, 2021

Hi, I looked into setting __dict__ so it is easier to get all attributes as a dictionary but I'm not willing to add the extra complexity this would require (unless there's an easier way I'm missing).

I ran into this problem a while back, if you feel comfortable enabling slots, this could work (slots provide a speed and memory boost, but you can't assign attributes to instances unless the underlying class has that attribute in slots):

    def __iter__(self):
        for attr in itertools.chain.from_iterable(getattr(cls, '__slots__', []) for cls in self.__class__.__mro__):
            yield attr, getattr(self, attr)

It might work for __dict__ also if you change some stuff though. If you call dict(<instance of this class>), it'll return every attr/value pair in slots. If an instance doesn't have every slot attribute defined though, you'll need to add a try/except AttributeError around the yield though.

@bminixhofer
Copy link
Owner

Thanks! That looks interesting. The thing is that I always have to look at that in the context of PyO3 as the Python bindings are written in Rust as well so how to set __slots__ and __dict__ depends on how PyO3 handles them. I'll look a bit closer into __slots__.

I'd definitely like dict(suggestion) to work. It's "nice to have" but not necessary though so currently optimizing for speed has higher priority.

@bminixhofer
Copy link
Owner

Just finished implementing this. This is the API:

suggestion = rules.suggest_sentence("She was not been here since Monday.")[0]

# .rule(..) finds a rule by id
rule = rules.rule(suggestion.source)

print(rule.url, rule.short, rule.name, rule.category_id, rule.category_name, rule.category_type)

for example in rule.examples:
    print(example.text, example.suggestion)

A more detailed example is in the unit tests:

def test_rules_inspectable(tokenizer_and_rules):
(tokenizer, rules) = tokenizer_and_rules
suggestion = rules.suggest("He was taken back by my response.")[0]
rule = rules.rule(suggestion.source)
assert rule.id == suggestion.source
# metadata of the rule itself
assert rule.short == "Commonly confused word"
assert rule.url == "https://www.merriam-webster.com/dictionary/take%20aback"
assert rule.id == "BACK_ABACK"
assert rule.name == "taken back (aback) by"
# category related metadata
assert rule.category_id == "CONFUSED_WORDS"
assert rule.category_name == "Commonly Confused Words"
assert rule.category_type == "misspelling"
# data related to rule examples
assert len(rule.examples) == 2
assert rule.examples[0].text == "He was totally taken back by my response."
assert rule.examples[0].suggestion is not None
assert (
rules.apply_suggestions(rule.examples[0].text, [rule.examples[0].suggestion])
== "He was totally taken aback by my response."
)
assert rule.examples[1].text == "He was totally taken a bag by my response."
assert rule.examples[1].suggestion is not None
assert (
rules.apply_suggestions(rule.examples[0].text, [rule.examples[0].suggestion])
== "He was totally taken aback by my response."
)

This will be part of release v0.3.0 which I'll release in a couple of days. NLPRule will also be roughly x4 faster for English and x2.5 faster for German with that Release :)

@Theelx
Copy link

Theelx commented Jan 16, 2021

Thanks! The speed boosts and additional functionality look super fun :)

@bminixhofer
Copy link
Owner

I just released v0.3.0 so the code above works now. I'll close this issue for now, let me know if I forgot anything in the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants