Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lower_ and upper_ properties and map fn to Span class #883

Merged
merged 3 commits into from
Mar 16, 2017
Merged

Add lower_ and upper_ properties and map fn to Span class #883

merged 3 commits into from
Mar 16, 2017

Conversation

ericzhao28
Copy link
Contributor

Added lower_ and upper_ properties as described in #669.
I also added a generic map function to execute on the string representations of each token in a span.

Types of changes

I added some simple features via very minor additions to spacy/tokens/span.pyx, and added corresponding tests to spacy/tests/spans/test_span.py.

Checklist:

  • My change requires a change to spaCy's documentation. (Not sure about this, sorry)
  • I have updated the documentation accordingly. (Should I do so in this case?)
  • I have added tests to cover my changes.
  • All new and existing tests passed.

This is my first commit to a major open-source project - please let me know if I should fix/add anything, and apologies ahead of time in case I somehow break something.

  • Eric

Copy link
Member

@honnibal honnibal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I'm not really convinced by the mapStr function. There are two cosmetic things, but I'm also leaning towards saying it's not necessary.

I'd like to keep the bar for putting things in the interface relatively high. I think the current:

''.join([my_func(token.text) for token in span])

Isn't difficult to read or write. If we needed to do something about this, I'd favour having a span.texts attribute. Then we can do:

map(my_func, span.texts)

Which is to me much better than:

span.map_str(my_func)

I think that it's even better to not have the span.texts attribute, though. If we don't have this, then the user writes:

''.join([my_func(token.text) for token in span])

And this works whether Span is a Doc, list, tuple etc -- any container.

If we do decide to have the mapStr method, we need:

  • Rename: Probably map_str or map_text
  • The .string attribute is deprecated. The .text attribute is the preferred one.
  • The Doc object will need the method as well.

@ericzhao28
Copy link
Contributor Author

Hi,
I removed mapStr and pushed that commit with only upper_ and lower_. I guess mapStr is a bit extraneous.

  • Eric

@honnibal honnibal merged commit 28bb546 into explosion:master Mar 16, 2017
@honnibal
Copy link
Member

Thanks!

@ines ines added the enhancement Feature requests and improvements label Sep 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants