Code for 'Dependency Grammar Induction with a Neural Variational Transition-based Parser' (AAAI 2019)
Brown Clustering
After clustering, add two extra fields (the cluster index and the token index inside the cluster) to the UD/WSJ dataset.
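The two extra fields can be pictured with a minimal sketch. It assumes a word-to-cluster mapping is already available as a Python dict; the function names, the order-of-appearance numbering inside a cluster, and the (-1, -1) placeholder for unclustered words are all illustrative assumptions, not necessarily what this repository does.

```python
from collections import defaultdict


def build_cluster_fields(word_to_cluster):
    """Map each word to (cluster index, token index inside the cluster).

    Assumption: `word_to_cluster` maps a word to a raw cluster label
    (for example a Brown bit-string). Indices are assigned in dict order.
    """
    cluster_ids = {}                     # raw label -> dense cluster index
    next_in_cluster = defaultdict(int)   # cluster index -> next free token index
    fields = {}
    for word, label in word_to_cluster.items():
        c = cluster_ids.setdefault(label, len(cluster_ids))
        fields[word] = (c, next_in_cluster[c])
        next_in_cluster[c] += 1
    return fields


def annotate(tokens, fields, unk=(-1, -1)):
    """Attach the two extra fields to every token of a sentence.

    Assumption: words missing from the clustering get the `unk` pair.
    """
    return [(tok,) + fields.get(tok, unk) for tok in tokens]
```

Calling `annotate(sentence_tokens, build_cluster_fields(mapping))` yields one `(token, cluster_index, token_index)` triple per token, which could then be written out as two additional columns of the dataset.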
Customized TorchText 0.2.3
Since the WSJ corpus is not publicly available, the training and evaluation scripts below are for UD.
Train the encoder: ./ud_scripts/ud_train_encoder.sh
Train the decoder: ./ud_scripts/ud_train_decoder.sh
Note:
Do not set a length limit during preprocessing, so that the full vocabulary is kept;
Set the random seed to -1.
Rule settings:
Universal Rules: --pr_fname "./data/pr_rules/ud_c/"$LANGUAGE"_0.5.txt"
Weakly Supervised: --pr_fname "./data/pr_rules/ud_c/"$LANGUAGE"_10_gt.txt"
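The two --pr_fname values above are built by concatenating a fixed directory, the value of $LANGUAGE, and a setting-specific suffix. A minimal Python sketch of the resulting path follows; the helper name `pr_rule_path`, the setting keys, and the example language code "en" are hypothetical and only serve to show how the pieces join.

```python
# Suffixes taken from the two rule settings; the dict keys are illustrative names.
RULE_SUFFIX = {
    "universal": "0.5",
    "weakly_supervised": "10_gt",
}


def pr_rule_path(language, setting):
    """Return the rule file path that the shell concatenation expands to."""
    return "./data/pr_rules/ud_c/{}_{}.txt".format(language, RULE_SUFFIX[setting])


# Hypothetical language code, for illustration only.
print(pr_rule_path("en", "universal"))
print(pr_rule_path("en", "weakly_supervised"))
```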
Pretrain: cd ud_scripts && ./ud_pre.sh
Finetune: cd ud_scripts && ./ud_ft.sh
Test: cd ud_scripts && ./ud_test.sh