As part of the final step, I am facing an issue when running create_self_training_dataset.sh. As the following output shows, all the .sa.tok files in the selected_functions folder are empty.
Repository root: .
python codegen_sources/test_generation/select_java_inputs.py --local True --input_path /home/xyz/CodeGen-data/java-FULL/ --output_path /home/xyz/CodeGen-data/dataset//selected_functions/ --rerun True
adding /project/6001889/xyz/CodeGen to path
adding to path /project/6001884/xyz/CodeGen
########## Selecting input functions ##########
100%|██████████| 500/500 [10:08:19<00:00, 73.00s/it]
Writing 0 lines to /home/xyz/CodeGen-data/dataset/selected_functions/java.000000000000.sa.tok
Writing 0 lines to /home/xyz/CodeGen-data/dataset/selected_functions/java.000000000001.sa.tok
...
Writing 0 lines to /home/xyz/CodeGen-data/dataset/selected_functions/java.000000000497.sa.tok
Writing 0 lines to /home/xyz/CodeGen-data/dataset/selected_functions/java.000000000498.sa.tok
Writing 0 lines to /home/xyz/CodeGen-data/dataset/selected_functions/java.000000000499.sa.tok
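To confirm the symptom independently of the log, the output folder can be scanned directly. This is only a minimal sketch: the helper names `count_lines_per_file` and `summarize` are made up for illustration and are not part of CodeGen, and the folder path is whatever was passed as `--output_path`.

```python
from pathlib import Path


def count_lines_per_file(folder, pattern="*.sa.tok"):
    """Return {file name: line count} for every file in folder matching pattern."""
    counts = {}
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps the scan going if a file has bad bytes
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            counts[path.name] = sum(1 for _ in handle)
    return counts


def summarize(counts):
    """Return (total number of files, number of files with zero lines)."""
    empty = [name for name, n in counts.items() if n == 0]
    return len(counts), len(empty)
```

If `summarize` reports that every file is empty, the problem is in the selection step itself rather than in how the files are written.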
On debugging, I found that is_simple_standalone_func(func), called at line 67 of the linked file, returns False for all the Java functions. As a result, the mask at line 114 in select_functions(funcpath) is an all-False list. Please suggest what to do in this case.
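One way to narrow this down is to split the filter into its individual conditions and count which one rejects each function first. The sketch below is not the CodeGen implementation: `rejection_report` and the two checks in `EXAMPLE_CHECKS` are hypothetical stand-ins, and the real conditions inside is_simple_standalone_func would have to be substituted for them.

```python
from collections import Counter


def rejection_report(functions, checks):
    """Count, per named check, how many functions it is the first to reject."""
    report = Counter()
    for func in functions:
        for name, check in checks:
            if not check(func):
                report[name] += 1
                break
        else:
            # No check rejected this function
            report["accepted"] += 1
    return report


# Hypothetical checks on tokenized Java source, for illustration only.
EXAMPLE_CHECKS = [
    # "static" must appear among the tokens before the parameter list
    ("is_static", lambda f: "static" in f.split("(")[0].split()),
    # the parameter list must not be empty
    ("has_arguments", lambda f: "( )" not in f and "()" not in f),
]
```

If a single check accounts for every rejection, that condition (or the format of the input it expects) is the first place to look.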
Also, it would be great if the authors could release the training dataset of 135,000 parallel functions between Java, Python, and C++ (as mentioned in the paper) as a shareable link.
For context, I am trying to create the self-training dataset by following the instructions at https://github.com/facebookresearch/CodeGen/blob/main/docs/TransCoder-ST.md. From Google BigQuery, I got 500 .json.gz files. I then preprocessed them, and the symlinks were created successfully.