This chapter is a detailed tutorial on developing NLP components with the ELIT APIs. We will build components for two NLP tasks, part-of-speech tagging and named entity recognition, both of which can be approached as sequence tagging problems. Given a sequence of tokens, the goal of sequence tagging is to assign a tag (e.g., a part-of-speech tag) to each token, producing a sequence of tags aligned one-to-one with the input tokens.
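To make the input/output contract of sequence tagging concrete, here is a minimal sketch in plain Python. It does not use ELIT. The most-frequent-tag baseline, the function names `train_most_frequent_tag` and `tag`, the default tag `NN` for unseen words, and the toy corpus are all hypothetical illustrations. The point is only that the tagger returns exactly one tag per input token.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train_most_frequent_tag(corpus: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Hypothetical baseline: remember the most frequent tag seen for each token."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for token, gold_tag in sentence:
            counts[token][gold_tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}

def tag(tokens: List[str], model: Dict[str, str], default: str = "NN") -> List[str]:
    """Assign one tag per token; unseen tokens fall back to a default tag."""
    return [model.get(token, default) for token in tokens]

# Toy training data: each sentence is a list of (token, part-of-speech tag) pairs.
corpus = [
    [("John", "NNP"), ("bought", "VBD"), ("a", "DT"), ("car", "NN")],
    [("Mary", "NNP"), ("bought", "VBD"), ("books", "NNS")],
]
model = train_most_frequent_tag(corpus)

tokens = ["Mary", "bought", "a", "car"]
tags = tag(tokens, model)
print(list(zip(tokens, tags)))
# [('Mary', 'NNP'), ('bought', 'VBD'), ('a', 'DT'), ('car', 'NN')]
```

A baseline like this ignores context entirely; the components developed in this chapter replace the lookup table with a learned model, but the token-to-tag alignment stays the same.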
An NLP component is an object that takes input text, makes predictions on it for a target task, and generates output derived from those predictions. Every NLP component defines two things: a decoding strategy, which determines how the component traverses the input text, and an inference model, which makes a prediction for each state reached during that traversal. The following sections first explain the component APIs in ELIT, then describe how to implement a part-of-speech tagger and a named entity recognizer that are simple yet achieve state-of-the-art performance.
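The split between a decoding strategy and an inference model can be sketched as follows. This is a hypothetical illustration, not ELIT's actual component API: the names `NLPComponent`, `TaggingState`, `GreedyTagger`, and `toy_model` are invented for this example. The decoding strategy here is greedy left-to-right, and the inference model is any callable that maps the current state (including tags already predicted) to a tag.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TaggingState:
    """Hypothetical decoding state: input tokens, current position, tags predicted so far."""
    tokens: List[str]
    index: int = 0
    tags: List[str] = field(default_factory=list)

    def done(self) -> bool:
        return self.index >= len(self.tokens)

# An inference model maps a state to a tag for the token at state.index.
Model = Callable[[TaggingState], str]

class NLPComponent(ABC):
    """Hypothetical interface for illustration only; not ELIT's API."""

    @abstractmethod
    def decode(self, tokens: List[str]) -> List[str]:
        ...

class GreedyTagger(NLPComponent):
    """Greedy left-to-right decoding: one prediction per state."""

    def __init__(self, model: Model):
        self.model = model

    def decode(self, tokens: List[str]) -> List[str]:
        state = TaggingState(tokens)
        while not state.done():
            state.tags.append(self.model(state))  # inference model predicts for this state
            state.index += 1                      # decoding strategy advances to the next state
        return state.tags

def toy_model(state: TaggingState) -> str:
    """Hand-written rules standing in for a trained model; uses the previous prediction."""
    token = state.tokens[state.index]
    previous = state.tags[-1] if state.tags else None
    if token[0].isupper():
        return "NNP"
    if previous == "DT":
        return "NN"
    if token in {"a", "the"}:
        return "DT"
    return "VB"

tagger = GreedyTagger(toy_model)
print(tagger.decode(["John", "saw", "a", "dog"]))
# ['NNP', 'VB', 'DT', 'NN']
```

Keeping the two pieces separate means the same inference model could be paired with a different decoding strategy (for example, beam search) without changing the model itself; whether ELIT structures its components this way is covered by its own APIs in the following sections.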
{% hint style="info" %} See the actual implementations of the component APIs, the part-of-speech tagger, and the named entity recognizer described in this chapter. {% endhint %}