hw1

Austin Matthews

Jan 22, 2015


README.md

There are three Python programs here: ./align, ./check, and ./grade (run each with -h for usage).

The commands are designed to work in a pipeline. For instance, this is a valid invocation:

    ./align -t 0.9 -n 1000 | ./check | ./grade -n 5
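As a rough illustration of what working in a pipeline means here, the sketch below shows a stage that reads lines from one stream and writes them unchanged to another, so it could sit between two of the commands. The function name passthrough and the line counting are hypothetical; they are not part of the provided scripts.

```python
def passthrough(infile, outfile, log):
    """Copy every line from infile to outfile and report the count on log."""
    count = 0
    for line in infile:
        outfile.write(line)
        count += 1
    # Diagnostics go to a separate stream so the next stage's input stays clean.
    log.write(f"passed {count} lines\n")
    return count
```

A script that calls passthrough(sys.stdin, sys.stdout, sys.stderr) could then be placed between ./align and ./check without changing what the later stages see.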

The data/ directory contains a fragment of the German/English Europarl corpus.

- data/dev-test-train.de-en is the German/English parallel data to be aligned. The first 150 sentences are for development, the next 150 are a blind set you will be evaluated on, and the remainder of the file is unannotated parallel data.
- data/dev.align contains 150 manual alignments corresponding to the first 150 sentences of the parallel corpus. When you run ./check, these are used to compute the alignment error rate. You may use them in any way you choose. The notation i-j means the word at position i (0-indexed) in the German sentence is aligned to the word at position j in the English sentence; the notation i?j means they are "probably" aligned.
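The i-j and i?j notation can be read into two sets of index pairs, one for sure links and one for "probable" links. The sketch below assumes that alignment points on a line are separated by whitespace, which the text above does not state. It also computes the alignment error rate using the standard definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the hypothesized alignment, S the sure links, and P the probable links including S. Whether the provided scripts compute exactly this quantity is an assumption, and the function names are hypothetical.

```python
def parse_alignment(line):
    """Split one line of alignment points into (sure, possible) sets of (i, j)."""
    sure, possible = set(), set()
    for token in line.split():
        if "-" in token:
            i, j = token.split("-")
            sure.add((int(i), int(j)))
        elif "?" in token:
            i, j = token.split("?")
            possible.add((int(i), int(j)))
        else:
            raise ValueError(f"unrecognized alignment point: {token!r}")
    return sure, possible


def alignment_error_rate(hypothesis, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with P including S."""
    possible = possible | sure  # every sure link also counts as possible
    denom = len(hypothesis) + len(sure)
    if denom == 0:
        return 0.0
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / denom


# Illustrative data only: two sure links, one probable link, and a guess.
sure, possible = parse_alignment("0-0 1-2 2?1")
hypothesis = {(0, 0), (2, 1), (3, 3)}
print(alignment_error_rate(hypothesis, sure, possible))
```

This scores a single sentence. A corpus-level score would normally sum the counts over all sentences before dividing, rather than averaging per-sentence rates.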