=================
The task is to develop low-level NLP tools for Magahi and Bhojpuri. Both are Eastern Indo-Aryan languages spoken largely in the eastern states of Bihar, Jharkhand and Uttar Pradesh in India. They belong to what is considered a dialect continuum that runs from the eastern part of India to its western part and comprises approximately 50 languages or varieties. Hindi, an official language of India, is part of the same continuum, so these languages are closely related to it. Despite this similarity, they diverge substantially from Hindi in lexicon and in morphological make-up, and as a result most tools developed for Hindi do not perform well on them. For this task, we are providing small annotated datasets for Magahi and Bhojpuri in order to develop a part-of-speech tagger and a morphological analyser for each language. The datasets are annotated with part-of-speech categories and morphological features from the Universal Dependencies tagset.
===========
The task has two sub-tasks: (a) a POS tagger for each language, and (b) an analyser of Number, Gender, Person, Tense, Aspect, Honorificity and Case relations for each language.
========
We will provide 5,000 annotated sentences (in CoNLL-U format) for each of the two languages. In addition, participants are encouraged to use the Hindi dataset available from the Universal Dependencies project. They are also free to use any other dataset, as long as it is freely available for research.
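
Since the data is distributed in CoNLL-U format, a minimal reader may be a useful starting point. The sketch below uses only the Python standard library and keeps the three columns relevant to the two sub-tasks (FORM, UPOS and FEATS). The function name `read_conllu` and the placeholder tokens `w1` and `w2` are illustrative assumptions and are not taken from the released dataset.

```python
def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences.

    Each sentence is a list of (form, upos, feats) tuples, where feats
    is a dict such as {"Gender": "Masc", "Number": "Sing"}.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; skipped in this sketch
        token_id, form, upos, feats = cols[0], cols[1], cols[3], cols[5]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        feat_dict = {}
        if feats != "_":
            for pair in feats.split("|"):
                name, _, value = pair.partition("=")
                feat_dict[name] = value
        current.append((form, upos, feat_dict))
    if current:
        sentences.append(current)
    return sentences


# Made-up two-token sentence with placeholder word forms (not real data).
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\t_\tNOUN\t_\tGender=Masc|Number=Sing\t_\t_\t_\t_\n"
    "2\tw2\t_\tVERB\t_\tAspect=Perf|Person=3\t_\t_\t_\t_\n"
    "\n"
)

for form, upos, feats in read_conllu(SAMPLE)[0]:
    print(form, upos, feats)
```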
=========================
Teams will be evaluated and ranked by macro-averaged F1 score.
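
For reference, macro-averaged F1 computes an F1 score separately for each tag, as 2·TP / (2·TP + FP + FN), and then takes the unweighted mean over tags, so rare tags count as much as frequent ones. The sketch below is one plain-Python way to compute it. It is not the organisers' scoring script, and averaging over every tag that appears in either the gold or the predicted sequence is an assumption of this example.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over two aligned tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Assumption: average over every tag seen in gold or pred.
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: one NOUN is mis-tagged as VERB.
gold = ["NOUN", "NOUN", "VERB", "ADJ"]
pred = ["NOUN", "VERB", "VERB", "ADJ"]
# Per-tag F1: NOUN = 2/3, VERB = 2/3, ADJ = 1, so the macro average is 7/9.
print(round(macro_f1(gold, pred), 4))
```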
=============
A simple probabilistic baseline (each token is assigned its most frequent tag) will be provided by the organisers.
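
To make the idea of a most-frequent-tag baseline concrete, here is a minimal sketch. It is not the organisers' implementation. It assumes that "most frequent" is counted per word form on the training data, and that unseen words fall back to the tag that is most frequent overall. The names `train_baseline` and `tag_baseline`, and the placeholder tokens, are hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per word form, plus a global fallback."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for form, tag in sentence:
            per_word[form][tag] += 1
            overall[tag] += 1
    lexicon = {form: counts.most_common(1)[0][0]
               for form, counts in per_word.items()}
    # Assumption: unseen words receive the overall most frequent tag.
    default_tag = overall.most_common(1)[0][0] if overall else None
    return lexicon, default_tag


def tag_baseline(tokens, lexicon, default_tag):
    """Tag each token with its lexicon entry, or the fallback tag."""
    return [lexicon.get(token, default_tag) for token in tokens]


# Toy training data with placeholder word forms (not real data).
train = [
    [("w1", "NOUN"), ("w2", "VERB")],
    [("w1", "NOUN"), ("w3", "NOUN")],
    [("w1", "VERB")],
]
lexicon, default_tag = train_baseline(train)
print(tag_baseline(["w1", "w2", "unseen"], lexicon, default_tag))
```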
====================
The training dataset will be made available by 15th April, 2019. Other deadlines are as per the workshop schedule.
============
Results will be made available as per the workshop schedule.
=====================
Paper submission instructions will be the same as those for the workshop.

Query
=====
If you have any queries regarding this task, please raise an [Issue](https://github.com/shashwatup9k/nsurl-2019/issues).
=== Machine-readable metadata (DO NOT REMOVE!) =====================================================
Data available since: Low-level NLP Tools for Magahi and Bhojpuri Shared Task-2019
License: CC BY-NC-SA 4.0
Includes text: yes
Shared Task Organisers: Kumar; Ritesh and Ojha, Atul Kr.
Contributor/©holder: Panlingua Language Processing LLP, N. Delhi, India and KMI-Linguistics, Dr. Bhimrao Ambedkar University, Agra
=======================================================================================================