This is the repo of Team DNAKG for NeurIPS'22 OGB-LSC challenge.
Team member: Xu Chen, Xiaojun Ma, Lun Du, Yue Fan, Jiayi Liao, Chongjian Yue, Qiang Fu, Shi Han.
We use Python=3.7, Pytorch=1.12 in our experiments.
pip install dgl==0.4.3 ogb nni==2.8
Download data from ogb and save to dataset/
To get structure-based candidates, please run
sh get_candidate/scripts/generate_structure_candidate.sh
The result is saved as a npy
file.
Please go to get_candidate/code/rule_mining
for more details.
To combine above candidates, please run
get_candidate/code/combine_candidate.py
Remember to revise your path saving candidate: your_path_to_structure_candidate
and your_path_to_rulemining_candidate
.
For all the candidate set, it remove the candidate tail
arguments:
--path_train_hrt path of the train_hrt.npy
--path_val_hr path of the val_hr.npy
--path_test_hr path of the test_hr.npy
--candidate_type candidate type, choice in ['test', 'valid']
--candidate_path_list used to append the path of the candidate
--combine_can_path path of the processed and concatenated candidate
--folder_prev_check folder used to store some temporary files
easy-use scripts:
sh get_candidate/scripts/combine_candidate.sh
To train 13 KGE methods, please
cd scoring/DGL-KE/training_scripts
then run each scripts. For example, to train the Transe_a, run
sh transe_a.sh
The trained model is saved in dglke_path/
.
Note that valdation candidate should be generated to evaluate model performance, and pay attention to the path of validation candidate.
To generate scores for candidates, plsase
cd DGL-KE/scoring_scripts
sh infer_all.sh
The Python script DGL-KE/dglke/eval.py
is used to evaluate the model on the validation set and test set. You can run it to infer a new models instead of inferring all trained models.
arguments:
--load_json path of the config.json which is config file for training
--valid_can_path path of the validation candidate
--test_can_path path of the test candidate
--rst_prename prefix of the infer result
Note that we fixed some parameters in the eval.py
as:
args.eval = True
args.test = True
args.num_test_proc = 30
args.num_proc = 30
args.num_thread = 1
args.gpu = [-1]
The code references the solution of 《NOTE: Solution for KDD-Cup 2021 WikiKG90M-LSC》
To get the features, you should first generate the candidate and save it in your_candidate_path
. Then, please run
cd scoring/feature_generation
python ./feature4lsc2/walk_probability/h2r.py
python ./feature4lsc2/walk_probability/h2t.py
python ./feature4lsc2/walk_probability/r2t.py
python ./feature4lsc2/walk_probability/r2h.py
python ./feature4lsc2/walk_probability/t2r.py
python ./feature4lsc2/walk_probability/t2h.py
python {py_filename} --candidate_path {your_candidate_path} --save_path {your_save_path}
# run py files in ./feature4lsc2/dump_feat/
This will give you the manual feature score.
- To speed up hyperparameter searching, we preprocess the data with
cd ensemble/manual_feature_nni
python3 ../code/manualfeature_getmrr.py --preprocess --save_path '../testmanual_smore_rule/'
and search the best weights with
python3 ../code/manualfeature_getmrr.py --nni --save_path '../testmanual_smore_rule/'
- use the best weights to generate the test results for submission
python3 ../code/test_mrr.py --preprocess --save_path '../final_manual_smore_rule/' --data_path '../final_manual_smore_rule/'
Our implementation is based on the winning solutions LSC@KDD Cup 2021. Please refer to https://ogb.stanford.edu/kddcup2021/results/#awardees_wikikg90m for more details.