-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove __getitem__
code duplication in gensim.models.phrases
#2206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, thanks @jenishah. Please fix mentioned issues & make sure that all CI passed successfully (now all ok except flake8 check in Travis).
Thanks for the review @menshikh-iv . This was my first PR, and I need to learn a lot about proper documentation. Your reviews were a great help. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jenishah only small things missing until I can merge this, please finalize PR, good work 👍
gensim/models/phrases.py
Outdated
@@ -232,6 +232,51 @@ def load(cls, *args, **kwargs): | |||
return model | |||
|
|||
|
|||
def _sentence2token(phrase_class, sentence): | |||
""" returns tokens after from a sentence by joining the phrases with delimiter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in form "Do X", not "does X"
gensim/models/phrases.py
Outdated
|
||
Returns | ||
------- | ||
{list of str, :class:`gensim.interfaces.TransformedCorpus`} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed ~
for TransformedCorpus
(as in other examples)
{list of str, :class:`gensim.interfaces.TransformedCorpus`} | ||
`sentence` with detected phrase bigrams merged together, or a streamed corpus of such sentences | ||
if the input was a corpus. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty line at the end of docstring
Nice work, thanks @jenishah, congratz with first contribution and happy hacktoberfest 🌟 |
__getitem__
code duplication in gensim.models.phrases
__getitem__
code duplication in gensim.models.phrases
__getitem__
code duplication in gensim.models.phrases
Fix #2149
wrote a single function
sentence2token
that can be used by_getitem_
of Phrase and Phrases class.