This page explains how to use brat_to_bert.Dockerfile to convert manually annotated sentences in the BRAT format into training data for BERT models.
Run the container with the following command:
docker run --rm -v /path/to/brat/directory/on/local/host/:/brat_files -v /path/to/output/directory/where/bert/file/will/be/created/:/bert_files ucdenverccp/brat-to-bert:0.1 [BIOLINK_ASSOCIATION_NAME] [RECURSE] [OUTPUT_FILE_NAME]
where:
/path/to/brat/directory/on/local/host/ is the path on the local machine where the BRAT files to process are located.
/path/to/output/directory/where/bert/file/will/be/created/ is the path on the local machine where the BERT file will be created.
[BIOLINK_ASSOCIATION_NAME] is the name of the association that has been annotated in the BRAT files. It must match the names of the BiolinkAssociation enum, e.g. bl_chemical_to_disease_or_phenotypic_feature.
[RECURSE] is YES or NO, indicating if the process should recurse through the BRAT file directory structure.
[OUTPUT_FILE_NAME] is the name of the output file that will contain training data for BERT models after calling the docker run command.
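As an illustration of how these arguments fit together, the following is a minimal sketch of a shell wrapper around the docker run command. The function name brat_to_bert and the DRY_RUN and IMAGE variables are hypothetical conveniences, not part of this project. The wrapper only checks the argument count and the YES/NO value before handing everything to Docker.

```shell
#!/bin/sh
# Hypothetical wrapper around the docker run command documented above.
# brat_to_bert, DRY_RUN and IMAGE are illustrative names, not project features.

brat_to_bert() {
    if [ "$#" -ne 5 ]; then
        echo "usage: brat_to_bert BRAT_DIR OUT_DIR ASSOCIATION YES|NO OUTPUT_FILE" >&2
        return 2
    fi

    brat_dir=$1      # host directory containing the BRAT files
    out_dir=$2       # host directory where the BERT file will be created
    association=$3   # expected to match a BiolinkAssociation enum name
    recurse=$4       # YES or NO
    out_file=$5      # name of the output file

    # Reject anything other than the two documented values for RECURSE.
    case "$recurse" in
        YES|NO) ;;
        *) echo "RECURSE must be YES or NO, got: $recurse" >&2; return 2 ;;
    esac

    # When DRY_RUN is set and non-empty, print the command instead of running it.
    # IMAGE defaults to the published image name used in this page.
    ${DRY_RUN:+echo} docker run --rm \
        -v "$brat_dir":/brat_files \
        -v "$out_dir":/bert_files \
        "${IMAGE:-ucdenverccp/brat-to-bert:0.1}" \
        "$association" "$recurse" "$out_file"
}
```

For example, calling brat_to_bert /data/brat /data/bert bl_chemical_to_disease_or_phenotypic_feature YES train.txt with DRY_RUN=1 prints the full docker run command without starting a container. The paths and the file name train.txt are placeholders. Docker expects absolute host paths for bind mounts, so relative directories should be expanded first.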
Note that the image has been published on Docker Hub. To build it locally instead, run the following command from the base directory of this project, then use brat-to-bert in place of ucdenverccp/brat-to-bert:0.1 in the docker run command:
docker build -t brat-to-bert -f brat_to_bert.Dockerfile .