Submit Answers to Annotation Assist

DharmendraVaghela edited this page Jul 5, 2016 · 3 revisions

A human annotator needs to judge whether the answers returned by the various systems are correct. The Annotation Assist tool provides a visual interface for this task.

Annotation Assist takes a corpus file and a question-answer pairs file as input. Both files must first be converted into Annotation Assist format.

Create the JSON corpus file used by the Annotation Assist tool:

themis judge corpus <corpus.csv> > <annotation-assist.corpus.json>

Here corpus.csv is the corpus file downloaded from XMGR.

This command generates the annotation-assist.corpus.json file as its output.

Generate the question-answer pairs file:

themis judge pairs <answers.wea.csv> <answers.solr.csv> <answers.nlc.csv> > <annotation-assist.pairs.csv>

Here answers.wea.csv, answers.solr.csv, and answers.nlc.csv are the answer files generated by querying the respective systems, and annotation-assist.pairs.csv is the output file.

Annotating every question in the data set is time-consuming. If only a small, representative portion of the data needs to be annotated, the following commands are useful.

Generate a sample question file:

themis question sample <qa-pairs.csv> <size> > <sample.csv>

Here qa-pairs.csv contains the question-answer pairs extracted from the usage log by the 'question extract' command, size is the number of unique questions to sample, and sample.csv is the resulting sample file.

This command will sample questions without replacement according to a distribution determined by their frequency, so more frequently asked questions are more likely to be in the sample.
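The idea of frequency-weighted sampling without replacement can be illustrated with a minimal Python sketch. This is not the themis implementation; the function name sample_questions and its parameters are hypothetical, and the input is assumed to be a plain list with one entry per time a question was asked.

```python
import random
from collections import Counter


def sample_questions(questions, size, seed=None):
    """Pick `size` unique questions, favoring frequently asked ones.

    Hypothetical helper for illustration only; themis may work differently.
    """
    rng = random.Random(seed)
    counts = Counter(questions)          # question -> number of times asked
    pool = list(counts)                  # unique questions
    weights = [counts[q] for q in pool]  # frequency acts as sampling weight
    size = min(size, len(pool))          # cannot sample more than exist
    sample = []
    for _ in range(size):
        # Draw one index with probability proportional to its weight,
        # then remove it so the same question is never drawn twice.
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        sample.append(pool.pop(i))
        weights.pop(i)
    return sample
```

Because each drawn question is removed from the pool, the sample never contains duplicates, while questions with higher counts remain more likely to be chosen at every draw.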

Generate the question-answer pairs file for Annotation Assist:

themis judge pairs --questions <sample.csv> <answers.wea.csv> <answers.solr.csv> <answers.nlc.csv> > <annotation-assist.pairs.csv>

Here sample.csv is the sample file generated by the previous command. answers.wea.csv, answers.solr.csv, and answers.nlc.csv are the answer files generated by querying the respective systems, and annotation-assist.pairs.csv is the output file.

Truncate answers:

themis util truncate-answers <annotation-assist.pairs.csv> <length>

Here annotation-assist.pairs.csv is the Annotation Assist pairs file and length is the maximum length to which each answer is shortened.

This command truncates answers to the given length so that they comply with the Annotation Assist system.
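As a rough illustration of what truncation does to the pairs file, the sketch below cuts one column of a CSV to a fixed number of characters. It makes several assumptions that the text above does not confirm: that length is counted in characters, that the answer column is named "Answer", and that nothing (such as an ellipsis) is appended. The function name truncate_answers is hypothetical.

```python
import csv
import io


def truncate_answers(csv_text, length, column="Answer"):
    """Return CSV text with `column` cut to at most `length` characters.

    Illustrative only: the column name and the character-based cut
    are assumptions, not the documented themis behavior.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = row[column][:length]  # slicing never raises on short text
        writer.writerow(row)
    return out.getvalue()
```

Answers already shorter than the limit pass through unchanged, so running the function twice with the same length gives the same result.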

After annotation, the results can be combined into a single CSV file. This file needs to be converted into themis format with the following command:

themis judge interpret <annotation-assist.judgments.csv> > <judgments.csv>

Here annotation-assist.judgments.csv is the CSV file from the Annotation Assist tool and judgments.csv is the themis-formatted file.