Fix documentation for gensim.corpora. Partial fix #1671 #1729

Merged: 54 commits, merged Jan 22, 2018. The diff below shows changes from 1 commit.

Commits
b260d4b Fix typo (anotherbugmaster, Sep 30, 2017)
36d98d1 Make `save_corpus` private (anotherbugmaster, Oct 2, 2017)
981ebbb Annotate `bleicorpus.py` (anotherbugmaster, Oct 2, 2017)
3428113 Make __save_corpus weakly private (anotherbugmaster, Oct 2, 2017)
69fc7e0 Fix _save_corpus in tests (anotherbugmaster, Oct 2, 2017)
b65a69a Fix _save_corpus[2] (anotherbugmaster, Oct 3, 2017)
6fa92f3 Merge remote-tracking branch 'upstream/develop' into develop (anotherbugmaster, Oct 15, 2017)
78e207d Document bleicorpus in Numpy style (anotherbugmaster, Oct 24, 2017)
7519382 Document indexedcorpus (anotherbugmaster, Oct 24, 2017)
ae69867 Annotate csvcorpus (anotherbugmaster, Nov 3, 2017)
c2765ed Add "Yields" section (anotherbugmaster, Nov 3, 2017)
40add21 Make `_save_corpus` public (anotherbugmaster, Nov 3, 2017)
e044c3a Annotate bleicorpus (anotherbugmaster, Nov 3, 2017)
123327d Fix indentation in bleicorpus (anotherbugmaster, Nov 3, 2017)
2382d01 `_save_corpus` -> `save_corpus` (anotherbugmaster, Nov 21, 2017)
42409bf Annotate bleicorpus (anotherbugmaster, Nov 21, 2017)
7cb5bbf Convert dictionary docs to numpy style (anotherbugmaster, Nov 21, 2017)
56f19e6 Convert hashdictionary docs to numpy style (anotherbugmaster, Nov 21, 2017)
9162a7e Convert indexedcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
5eaaac4 Convert lowcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
3b6b076 Convert malletcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
d7f3fc8 Convert mmcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
c46bff4 Convert sharded_corpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
7823546 Convert svmlightcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
9878133 Convert textcorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
dba4429 Convert ucicorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
6a95c94 Convert wikicorpus docs to numpy style (anotherbugmaster, Nov 21, 2017)
6dcfb07 Add sphinx tweaks (anotherbugmaster, Nov 21, 2017)
2f61fc3 Merge remote-tracking branch 'upstream/develop' into develop (anotherbugmaster, Nov 21, 2017)
ac01abb Merge branch 'develop' into fix_1605 (anotherbugmaster, Nov 21, 2017)
833ec64 Remove trailing whitespaces (anotherbugmaster, Nov 21, 2017)
e656609 Merge branch 'develop' into fix_1605 (anotherbugmaster, Nov 23, 2017)
3e597fe Annotate wikicorpus (anotherbugmaster, Nov 28, 2017)
da1d5c2 SVMLight Corpus annotated (anotherbugmaster, Dec 5, 2017)
89f6098 Fix TODO (anotherbugmaster, Dec 5, 2017)
9eeea21 Fix grammar mistake (anotherbugmaster, Dec 6, 2017)
2b6aeaf Undo changes to dictionary (anotherbugmaster, Dec 7, 2017)
9b17057 Undo changes to hashdictionary (anotherbugmaster, Dec 7, 2017)
de3ea0f Document indexedcorpus (anotherbugmaster, Dec 9, 2017)
dafc373 Document indexedcorpus[2] (anotherbugmaster, Dec 10, 2017)
ff980bc Merge upstream (anotherbugmaster, Jan 9, 2018)
0189d8d Remove redundant files (anotherbugmaster, Jan 11, 2018)
943406c Merge upstream (anotherbugmaster, Jan 16, 2018)
57cb5a3 Add more dots. :) (anotherbugmaster, Jan 16, 2018)
08ca492 Fix monospace (anotherbugmaster, Jan 16, 2018)
381fb97 remove useless method (menshikh-iv, Jan 18, 2018)
5b5701a fix bleicorpus (menshikh-iv, Jan 18, 2018)
0e5c0cf fix csvcorpus (menshikh-iv, Jan 18, 2018)
627c0e5 fix indexedcorpus (menshikh-iv, Jan 18, 2018)
b771bb5 fix svmlightcorpus (menshikh-iv, Jan 18, 2018)
d76af8d fix wikicorpus[1] (menshikh-iv, Jan 18, 2018)
7fe753f fix wikicorpus[2] (menshikh-iv, Jan 18, 2018)
a9eb1a3 fix wikicorpus[3] (menshikh-iv, Jan 18, 2018)
e3a8ebf fix review comments (menshikh-iv, Jan 22, 2018)

fix review comments
menshikh-iv committed Jan 22, 2018
commit e3a8ebf71fb9f6f8b248bad16d0d94baabf9a28d
17 changes: 11 additions & 6 deletions gensim/corpora/bleicorpus.py
@@ -41,9 +41,14 @@ def __init__(self, fname, fname_vocab=None):
Parameters
----------
fname : str
File path to Serialized corpus.
Path to corpus.
fname_vocab : str, optional
Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname_vocab`.vocab file.
Vocabulary file. If `fname_vocab` is None, the following candidates are searched:

* `fname`.vocab
* `fname`/vocab.txt
* `fname_without_ext`.vocab
* `fname_folder`/vocab.txt

Raises
------
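The vocabulary lookup documented for `fname_vocab` above can be sketched as a standalone function. This is a minimal sketch, not gensim's own code: `find_vocab_file` is a hypothetical helper, and it assumes `fname_without_ext` means `fname` with its extension removed and `fname_folder` means the directory that contains `fname`.

```python
import os


def find_vocab_file(fname):
    """Hypothetical helper: return the first existing vocabulary file for `fname`.

    Candidates are tried in the documented order:
    `fname`.vocab, `fname`/vocab.txt, `fname_without_ext`.vocab, `fname_folder`/vocab.txt.
    """
    fname_base, _ = os.path.splitext(fname)  # assumed meaning of `fname_without_ext`
    fname_dir = os.path.dirname(fname)       # assumed meaning of `fname_folder`
    candidates = [
        fname + '.vocab',
        os.path.join(fname, 'vocab.txt'),
        fname_base + '.vocab',
        os.path.join(fname_dir, 'vocab.txt'),
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    # The exception type here is an assumption made for this sketch.
    raise IOError('could not find vocabulary file for %r' % fname)
```

Trying candidates in a fixed order means a file-specific `corpus.vocab` wins over a shared `vocab.txt` in the same folder.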
@@ -120,9 +125,9 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
Parameters
----------
fname : str
Path to output filename.
Path to output file.
corpus : iterable of iterable of (int, float)
Input corpus
Input corpus in BoW format.
id2word : dict of (str, str), optional
Mapping id -> word for `corpus`.
metadata : bool, optional
@@ -160,8 +165,8 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
return offsets

def docbyoffset(self, offset):
"""Get document corresponding to `offset`,
offset can be given from :meth:`~gensim.corpora.bleicorpus.BleiCorpus.save_corpus`.
"""Get document corresponding to `offset`.
Offset can be given from :meth:`~gensim.corpora.bleicorpus.BleiCorpus.save_corpus`.

Parameters
----------
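`save_corpus` returns byte offsets, and `docbyoffset` reads a single document back from one of them. A minimal sketch of that round trip, assuming the Blei LDA-C line format (`<number of unique terms> <term_id>:<count> ...`); `write_blei` and `read_blei_at` are hypothetical helpers, not gensim functions.

```python
def write_blei(fname, corpus):
    """Write BoW documents one per line; return the byte offset of each line."""
    offsets = []
    with open(fname, 'wb') as fout:
        for doc in corpus:
            offsets.append(fout.tell())  # position where this document starts
            pairs = ' '.join('%i:%s' % (term_id, weight) for term_id, weight in doc)
            fout.write(('%i %s\n' % (len(doc), pairs)).encode('utf8'))
    return offsets


def read_blei_at(fname, offset):
    """Seek to `offset` and parse the single document stored there."""
    with open(fname, 'rb') as fin:
        fin.seek(offset)
        parts = fin.readline().decode('utf8').split()
    # The leading count must match the number of term:weight pairs.
    if int(parts[0]) != len(parts) - 1:
        raise ValueError('invalid format at offset %i' % offset)
    doc = [part.rsplit(':', 1) for part in parts[1:]]
    return [(int(term_id), float(weight)) for term_id, weight in doc]
```

Because each offset points at the start of a line, one `seek` plus one `readline` is enough to fetch any document without scanning the file.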
2 changes: 1 addition & 1 deletion gensim/corpora/csvcorpus.py
@@ -34,7 +34,7 @@ def __init__(self, fname, labels):
Parameters
----------
fname : str
Path to corpus in CSV format.
Path to corpus.
labels : bool
If True - ignore first column (class labels).
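With `labels=True`, the first column of each row is treated as a class label and skipped. A minimal sketch, assuming each remaining column holds one numeric value and the column position becomes the feature id; `csv_to_bow` is a hypothetical helper, not gensim's implementation.

```python
import csv
import io


def csv_to_bow(lines, labels):
    """Yield one BoW document per CSV row.

    `lines` is any iterable of text lines (e.g. an open file).
    If `labels` is True, the first column (class label) is dropped.
    """
    for row in csv.reader(lines):
        if labels:
            row = row[1:]  # ignore the class-label column
        # Assumption: column index -> feature id, cell value -> weight.
        yield [(feature_id, float(value)) for feature_id, value in enumerate(row)]


# Usage: two rows whose first column is a class label.
data = io.StringIO('spam,0.5,0,2\nham,1,1,0\n')
docs = list(csv_to_bow(data, labels=True))
```

Zero-valued cells are kept as explicit `(id, 0.0)` pairs in this sketch; a sparse variant would filter them out.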

6 changes: 3 additions & 3 deletions gensim/corpora/indexedcorpus.py
@@ -49,7 +49,7 @@ def __init__(self, fname, index_fname=None):
Parameters
----------
fname : str
Path to indexed corpus.
Path to corpus.
index_fname : str, optional
Path to index. If not provided, `fname.index` is used.

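When `index_fname` is not given, the index lives at `fname.index`. A minimal sketch of that naming convention together with saving and reloading document offsets; `index_path`, `save_index` and `load_index` are hypothetical, and JSON is used purely for illustration since this diff does not show the on-disk index format.

```python
import json


def index_path(fname, index_fname=None):
    """Return the index path, defaulting to `fname` + '.index'."""
    return index_fname if index_fname is not None else fname + '.index'


def save_index(fname, offsets, index_fname=None):
    """Persist document offsets next to the corpus file (JSON is an illustrative choice)."""
    path = index_path(fname, index_fname)
    with open(path, 'w') as fout:
        json.dump(list(offsets), fout)
    return path


def load_index(fname, index_fname=None):
    """Load offsets saved by `save_index`; offsets[i] is where document i starts."""
    with open(index_path(fname, index_fname)) as fin:
        return json.load(fin)
```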
@@ -73,9 +73,9 @@ def serialize(serializer, fname, corpus, id2word=None, index_fname=None,
Parameters
----------
fname : str
Path to output filename
Path to output file.
corpus : iterable of iterable of (int, float)
Corpus in BoW format
Corpus in BoW format.
id2word : dict of (str, str), optional
Mapping id -> word.
index_fname : str, optional
10 changes: 5 additions & 5 deletions gensim/corpora/svmlightcorpus.py
@@ -49,7 +49,7 @@ def __init__(self, fname, store_labels=True):
Parameters
----------
fname: str
Path to corpus in SVMlight format.
Path to corpus.
store_labels : bool, optional
Whether to store labels (~SVM target class). They currently have no application but are stored
in `self.labels` for convenience by default.
@@ -138,8 +138,8 @@ def docbyoffset(self, offset):
# TODO: it breaks if it gets None from line2doc

def line2doc(self, line):
"""Get a document from a single line in SVMlight format,
inverse of :meth:`~gensim.corpora.svmlightcorpus.SvmLightCorpus.doc2line`.
"""Get a document from a single line in SVMlight format.
This method is the inverse of :meth:`~gensim.corpora.svmlightcorpus.SvmLightCorpus.doc2line`.

Parameters
----------
@@ -166,8 +166,8 @@ def line2doc(self, line):

@staticmethod
def doc2line(doc, label=0):
"""Convert BoW representation of document in SVMlight format,
inverse of :meth:`~gensim.corpora.svmlightcorpus.SvmLightCorpus.line2doc`.
"""Convert BoW representation of a document to SVMlight format.
This method is the inverse of :meth:`~gensim.corpora.svmlightcorpus.SvmLightCorpus.line2doc`.

Parameters
----------
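`line2doc` and `doc2line` are documented as inverses of each other. A minimal sketch of such a pair, assuming the SVMlight convention of 1-based feature ids in the file (`<label> <id>:<value> ... # comment`) mapped to 0-based term ids in the BoW document; these standalone functions are hypothetical, not gensim's methods, and the label is returned as the raw string from the file.

```python
def doc2line(doc, label=0):
    """Convert a BoW document to one SVMlight line (feature ids shifted to 1-based)."""
    pairs = ' '.join('%i:%s' % (term_id + 1, weight) for term_id, weight in doc)
    return '%s %s\n' % (label, pairs)


def line2doc(line):
    """Parse one SVMlight line back into (BoW document, label)."""
    line = line[:line.find('#')] if '#' in line else line  # drop trailing comment
    parts = line.split()
    if not parts:
        raise ValueError('invalid line format: %r' % line)
    label, fields = parts[0], [part.split(':', 1) for part in parts[1:]]
    # Skip the optional "qid:<n>" field; shift feature ids back to 0-based.
    doc = [(int(fid) - 1, float(value)) for fid, value in fields if fid != 'qid']
    return doc, label
```

Usage: `line2doc(doc2line([(0, 1.0), (4, 0.5)], label=1))` gives back the original document together with the label `'1'`.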
8 changes: 4 additions & 4 deletions gensim/corpora/wikicorpus.py
@@ -168,7 +168,7 @@ def remove_template(s):
Parameters
----------
s : str
String containing markup template
String containing markup template.

Returns
-------
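`remove_template` strips MediaWiki templates such as `{{cite web|...}}`, which can be nested. A minimal sketch of one way to do this, scanning the string and tracking brace depth; `strip_templates` is hypothetical and, unlike a full wikitext parser, drops everything after an unclosed `{{`.

```python
def strip_templates(s):
    """Remove every {{...}} template (including nested ones) from `s`."""
    out = []
    depth = 0  # how many templates are currently open
    i = 0
    while i < len(s):
        if s.startswith('{{', i):
            depth += 1
            i += 2
        elif s.startswith('}}', i) and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(s[i])  # keep characters outside any template
            i += 1
    return ''.join(out)
```

A depth counter is needed because a regex like `{{.*?}}` would stop at the first inner `}}` of a nested template.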
@@ -250,7 +250,7 @@ def tokenize(content, token_min_len=TOKEN_MIN_LEN, token_max_len=TOKEN_MAX_LEN,
token_min_len : int
Minimal token length.
token_max_len : int
Maximal token length
Maximal token length.
lower : bool
If True - convert `content` to lower case.
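`tokenize` keeps only tokens whose length lies between `token_min_len` and `token_max_len`, optionally lower-casing first. A minimal sketch of that filtering, assuming a token is a maximal run of letters (a simplification, since the real tokenizer is not shown in this diff) and that both bounds are inclusive; `simple_tokenize` is hypothetical and its default bounds are illustrative.

```python
import re

# Assumption: a token is a run of Unicode letters (no digits, no underscores).
LETTERS = re.compile(r'[^\W\d_]+', re.UNICODE)


def simple_tokenize(content, token_min_len=2, token_max_len=15, lower=True):
    """Split `content` into letter-only tokens and filter them by length."""
    if lower:
        content = content.lower()
    return [
        token for token in LETTERS.findall(content)
        if token_min_len <= len(token) <= token_max_len
    ]
```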

@@ -299,7 +299,7 @@ def extract_pages(f, filter_namespaces=False):
f : file
File-like object.
filter_namespaces : list of str or bool
Namespaces that will be extracted
Namespaces that will be extracted.

Yields
------
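`extract_pages` streams pages out of a MediaWiki XML dump, optionally keeping only some namespaces. A minimal sketch using `xml.etree.ElementTree.iterparse`, assuming a dump whose tags carry no XML namespace (real dumps declare one, which would have to be prefixed to each tag name) and assuming `filter_namespaces` holds values of each page's `<ns>` element such as `'0'`; `iter_pages` is hypothetical and yields `(title, text, pageid)`.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(f, filter_namespaces=False):
    """Yield (title, text, pageid) for each <page> element in file-like `f`."""
    for _, elem in ET.iterparse(f, events=('end',)):
        if elem.tag != 'page':
            continue  # wait until a whole <page> subtree has been parsed
        ns = elem.findtext('ns')
        if filter_namespaces and ns not in filter_namespaces:
            elem.clear()  # free memory for skipped pages
            continue
        title = elem.findtext('title')
        pageid = elem.findtext('id')  # direct child only, not revision/id
        text = elem.findtext('revision/text')
        yield title, text, pageid
        elem.clear()  # drop the processed subtree to keep memory flat


DUMP = b"""<mediawiki>
<page><title>Apple</title><ns>0</ns><id>1</id>
<revision><id>10</id><text>Apples are fruit.</text></revision></page>
<page><title>Talk:Apple</title><ns>1</ns><id>2</id>
<revision><id>11</id><text>Discussion.</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(DUMP), filter_namespaces=['0']))
```

Clearing each `<page>` after use is what lets this kind of loop handle multi-gigabyte dumps without loading the whole tree.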
@@ -517,7 +517,7 @@ def get_texts(self):
Notes
-----
This iterates over the **texts**. If you want vectors, just use the standard corpus interface
[Review comment by the contributor author: Dot]

instead of this method
instead of this method:

>>> for vec in wiki_corpus:
...     print(vec)
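The `Notes` section contrasts `get_texts`, which yields token lists, with iterating over the corpus itself, which yields vectors. A minimal sketch of that split, using a hypothetical in-memory `ToyTextCorpus` and a plain dict as the word-to-id mapping instead of gensim's `Dictionary`; it only illustrates the pattern and is not `WikiCorpus`.

```python
class ToyTextCorpus(object):
    """Hypothetical corpus: get_texts() yields tokens, __iter__ yields BoW vectors."""

    def __init__(self, documents):
        self.documents = documents
        self.token2id = {}
        for tokens in self.get_texts():  # build a simple word -> id mapping
            for token in tokens:
                self.token2id.setdefault(token, len(self.token2id))

    def get_texts(self):
        """Yield each document as a list of lower-cased tokens."""
        for doc in self.documents:
            yield doc.lower().split()

    def __iter__(self):
        """Yield each document as a sorted list of (token_id, count) pairs."""
        for tokens in self.get_texts():
            counts = {}
            for token in tokens:
                token_id = self.token2id[token]
                counts[token_id] = counts.get(token_id, 0) + 1
            yield sorted(counts.items())


corpus = ToyTextCorpus(['Red apple', 'red red car'])
for vec in corpus:
    print(vec)
```

Keeping tokenization in `get_texts` and vectorization in `__iter__` lets the same text pipeline feed both dictionary building and the standard corpus interface.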