dnmilne edited this page Aug 22, 2013 · 3 revisions

Below are tutorial exercises that demonstrate some of the cool things you can do with the Wikipedia Miner API.

You can get an exhaustive list of the available classes and methods from the [Javadoc](../../doc).

###Building a command line thesaurus

Tutorial 1: Build an application that lets users type in terms, and receive synonyms, definitions, related topics, and other things you would expect to get from an interactive thesaurus.

Tutorial 2: Extend the thesaurus to resolve conflation issues like CasE vAriaTions, âćçëňŧș, and plural(s).
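To give a feel for the kind of conflation Tutorial 2 deals with, here is a minimal standalone sketch that folds case, strips accents, and trims a trailing "s". It uses only the standard Java library, not the Wikipedia Miner API. The class name `TermNormalizer` and the naive plural rule are assumptions made for illustration, and the tutorial itself may well take a different approach.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Wikipedia Miner.
final class TermNormalizer {

    // Folds case, removes combining accents, and trims a simple plural "s".
    static String normalize(String term) {
        String folded = term.trim().toLowerCase(Locale.ROOT);

        // NFD splits an accented letter into its base letter plus combining marks,
        // so the marks can then be removed with the \p{M} character class.
        String decomposed = Normalizer.normalize(folded, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");

        // Naive plural handling (an assumption): drop one trailing "s",
        // but leave short words and words ending in "ss" alone.
        if (stripped.length() > 3 && stripped.endsWith("s") && !stripped.endsWith("ss")) {
            stripped = stripped.substring(0, stripped.length() - 1);
        }
        return stripped;
    }

    public static void main(String[] args) {
        System.out.println(normalize("CasE vAriaTions")); // case variation
        System.out.println(normalize("Caf\u00e9s"));      // cafe ("\u00e9" is e with an acute accent)
    }
}
```

Characters that have no canonical decomposition, such as ŧ, pass through this sketch unchanged, and irregular plurals are not handled at all, so treat it as a starting point, not a complete solution.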

Tutorial 3: Extend the thesaurus to get better lists of related topics.

Tutorial 4: Extend the thesaurus to get cleaner definitions.

###Building a command line document annotator

Tutorial 5: Build an application that allows users to type in snippets of text, and returns that text annotated with links to the relevant Wikipedia topics.

Tutorial 6: Build a workbench for annotation experiments, including generating training and testing data, and evaluating different settings and classifiers.

###Doing stuff at scale

Tutorial 7: Learn how to parallelize a task that takes Wikipedia XML dumps as input.

Tutorial 8: Learn how to parallelize a task in which each node must talk to the toolkit, without creating a bottleneck.