Mac-Morpho (corpus of Brazilian Portuguese texts annotated with part-of-speech tags) #6

Open
fititnt opened this issue May 31, 2018 · 0 comments
Labels
nltk-data http://www.nltk.org/nltk_data/

fititnt commented May 31, 2018

Mac-Morpho is a corpus of Brazilian Portuguese texts annotated with part-of-speech tags. Its first version was released in 2003 [1], and since then, two revisions have been made in order to improve the quality of the resource [2, 3].

The corpus is available for download split into train, development and test sections. These make up 76%, 4% and 20% of the whole corpus, respectively (the numbers are unusual because the corpus was first split 80%/20% into train/test, and then 5% of the train section, i.e. 4% of the whole corpus, was set aside for development). This split was used in [3], and new POS tagging research on Mac-Morpho is encouraged to follow it so that results can be compared consistently.
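The arithmetic behind the 76%/4%/20% figures can be checked with a small sketch. This only illustrates the proportions and is not how the official split was produced: the function name `split_corpus`, the shuffling, the seed, and splitting by sentence count are all assumptions made for the example. To compare against published results, use the train/development/test files distributed with the corpus rather than re-splitting.

```python
import random


def split_corpus(sentences, seed=0):
    """Hypothetical helper: 80/20 train/test split, then 5% of train as dev.

    Shuffling and the seed are assumptions for illustration only; the
    source does not say how the official split was drawn.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_test = n * 20 // 100            # 20% of the whole corpus
    n_dev = (n - n_test) * 5 // 100   # 5% of the remaining 80% = 4% overall

    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]  # what is left: 76% overall
    return train, dev, test


# Usage: with 1000 items the sections hold 760, 40 and 200 items.
train, dev, test = split_corpus(range(1000))
print(len(train), len(dev), len(test))
```

Integer arithmetic is used for the section sizes so that the three parts always add up to the full corpus without rounding surprises.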

  • Download Mac-Morpho
  • Download annotation manual (in Portuguese)
    NOTE: the manual was written for the original annotation, i.e., before the changes to the tagset were
    introduced. Therefore, it does not reflect the current state of the corpus.

Disclaimer: Mac-Morpho versions 1, 2 and 3 are licensed under a Creative Commons Attribution 4.0 International License. This means you can distribute, remix, tweak, and build upon any Mac-Morpho version, even commercially, as long as you give us credit for the original creation. See the Mac-Morpho License.

@fititnt fititnt added the nltk-data http://www.nltk.org/nltk_data/ label May 31, 2018