Implement NgramMap service #3

Closed
s-bose7 opened this issue Apr 16, 2024 · 0 comments · Fixed by #7
Labels
enhancement New feature or request

Comments

s-bose7 commented Apr 16, 2024

NgramMap API

The NGramMap class will provide convenience methods for interacting with Google’s NGrams dataset.

Input File Formats:

The NGram dataset comes in two file types. The first is a “words file”. Each line of a words file provides tab-separated information about the usage of a particular word in English during a given year: the word, the year, the number of occurrences, and the number of sources. For example:

Word        Year    Occurrence     Sources
airport     2007    175702         32788
airport     2008    173294         31271
request     2005    646179         81592
request     2006    677820         86967
request     2007    697645         92342
request     2008    795265         125775
wandered    2005    83769          32682
wandered    2006    87688          34647
wandered    2007    108634         40101
wandered    2008    171015         64395
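
A minimal sketch of how a words file in this format could be read, assuming real tab characters between fields (the sample above is shown aligned with spaces) and assuming that a header row, if the file has one, should be skipped. The function name `parse_words_file` and the nested-dictionary layout are illustrative assumptions, not the structures used in this repository.

```python
import io
from collections import defaultdict


def parse_words_file(lines):
    """Map each word to a {year: occurrence count} dictionary.

    Hypothetical helper: expects tab-separated lines of
    word, year, occurrences, sources.
    """
    history = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and any header row (its year field is not a number).
        if len(fields) < 3 or not fields[1].strip().isdigit():
            continue
        word = fields[0].strip()
        history[word][int(fields[1])] = int(fields[2])
    return dict(history)


# A few rows taken from the sample above, joined with tabs.
sample_words = io.StringIO(
    "airport\t2007\t175702\t32788\n"
    "airport\t2008\t173294\t31271\n"
    "request\t2005\t646179\t81592\n"
)
words = parse_words_file(sample_words)
print(words["airport"][2008])  # 173294
```

Keying by word first keeps every year for one word together, which suits per-word history lookups; the sources column is ignored in this sketch.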

The other type of file is a “counts file”. Each line of a counts file provides comma-separated information about the total corpus of data available for a calendar year: the year, the total number of words, the total number of pages, and the number of sources. For example:

Year, Total words, Total pages, Sources 
1470,    984,         10,         1
1472,    117652,      902,        2
1475,    328918,      1162,       1
1476,    20502,       186,        2
1477,    376341,      2479,       2
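
A counts file in this format could be read along similar lines. The sketch below uses Python's standard `csv` module and keeps only the year and the total word count; it assumes that padding spaces after the commas and a header row may be present, as in the sample above. The name `parse_counts_file` is hypothetical and not taken from this repository.

```python
import csv
import io


def parse_counts_file(lines):
    """Map each year to the total number of words recorded for it.

    Hypothetical helper: expects comma-separated lines of
    year, total words, total pages, sources.
    """
    totals = {}
    for row in csv.reader(lines):
        # Skip blank lines and any header row (its year field is not a number).
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        # int() tolerates the padding spaces that follow each comma.
        totals[int(row[0])] = int(row[1])
    return totals


# The header and two rows taken from the sample above.
sample_counts = io.StringIO(
    "Year, Total words, Total pages, Sources\n"
    "1470,    984,         10,         1\n"
    "1472,    117652,      902,        2\n"
)
totals = parse_counts_file(sample_counts)
print(totals[1472])  # 117652
```

With both mappings available, a word's relative frequency in a year could be computed by dividing its occurrence count by that year's total.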

Related to #1

s-bose7 added the enhancement label May 19, 2024
s-bose7 added a commit that referenced this issue Jun 30, 2024
Responsible for parsing through the CSV files and storing the data to appropriate structures.
Provides various convenient methods for interacting with Google’s NGrams dataset.
s-bose7 changed the title from “Implementing NgramMap service” to “Implement NgramMap service” Jun 30, 2024