Commit

Add paragraph describing dictionary.dfs and dictionary.compactify()
Code snippet 13 introduces two new concepts that have not been
explained yet. In addition, the workflow it uses to create the
dictionary is completely different from the workflow described in code
snippets 4 and 5. I've added a paragraph that tries to explain the new
workflow and concepts.
oonska authored and tmcmurphy-cradlepoint committed May 22, 2017

1 parent 0635638 commit 2d9f777
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/notebooks/Corpora_and_Vector_Spaces.ipynb
@@ -340,7 +340,7 @@
"source": [
"Although the output is the same as for the plain Python list, the corpus is now much more memory friendly, because at most one vector resides in RAM at a time. Your corpus can now be as large as you want.\n",
"\n",
- "Similarly, to construct the dictionary without loading all texts into memory:"
+ "We are going to create the dictionary from the mycorpus.txt file without loading the entire file into memory. Then, we will generate the list of token ids to remove from this dictionary by querying the dictionary for the token ids of the stop words, and by querying the document frequencies dictionary (dictionary.dfs) for token ids that only appear once. Finally, we will filter these token ids out of our dictionary and call dictionary.compactify() to remove the gaps in the token id series."
]
},
{
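
For readers of this commit, here is a rough sketch of the workflow the new paragraph describes. It is a plain-Python stand-in, not gensim's implementation: the `documents` list (in place of mycorpus.txt), the `stoplist`, and the `build_dictionary` helper are all hypothetical, and the two dicts only mimic the roles of the dictionary's token-to-id mapping and `dictionary.dfs`. In this sketch a token id "appears once" when its document frequency is 1.

```python
from collections import Counter

# Hypothetical stand-ins for mycorpus.txt and the tutorial's stop word list.
documents = [
    "human machine interface",
    "a survey of user opinion",
    "the user interface of the machine",
]
stoplist = {"a", "of", "the"}


def build_dictionary(lines):
    """Stream documents one at a time, assigning ids and counting document frequencies."""
    token2id = {}
    dfs = Counter()  # token id -> number of documents that contain the token
    for line in lines:
        # set(): a token counts once per document; sorted(): deterministic ids
        for token in sorted(set(line.lower().split())):
            token_id = token2id.setdefault(token, len(token2id))
            dfs[token_id] += 1
    return token2id, dfs


# Only one document is held in memory at a time while the dictionary is built.
token2id, dfs = build_dictionary(iter(documents))

# Ids of stop words, and ids whose document frequency is 1.
stop_ids = {token2id[word] for word in stoplist if word in token2id}
once_ids = {token_id for token_id, docfreq in dfs.items() if docfreq == 1}
bad_ids = stop_ids | once_ids

# Filter those ids out; the surviving ids now have gaps (1, 2 and 7 here).
token2id = {tok: i for tok, i in token2id.items() if i not in bad_ids}
dfs = {i: n for i, n in dfs.items() if i not in bad_ids}
gappy_ids = sorted(token2id.values())

# "Compactify": renumber the surviving ids to 0..n-1, keeping their order.
idmap = {old: new for new, old in enumerate(gappy_ids)}
token2id = {tok: idmap[i] for tok, i in token2id.items()}
dfs = {idmap[i]: n for i, n in dfs.items()}

print(token2id)
```

The renumbering step is why compaction matters: after filtering, ids are no longer contiguous, so anything that treats the largest id as the vocabulary size would over-allocate until the gaps are closed.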
