Skip to content

Repository for the EMNLP 2023 Demo Paper "Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data"

Notifications You must be signed in to change notification settings

moleculemaker/ReactionMiner

 
 

Repository files navigation

🧪 ReactionMiner

Official Repository for the EMNLP 2023 Demo Paper
Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data

🛠️ Environment

To get started, install the necessary packages:

pip install -r requirements.txt
python -m spacy download en_core_web_sm

For using the PDF-to-Text module in Reaction Miner, ensure Maven and Java 1.8 are installed. Then, execute:

cd pdf2text/SymbolScraper
git submodule update --init
make

📖 How to Use Reaction Miner

Given a PDF file, please refer to example.py to run our entire system. It can be broken down into the following three steps:

Step 1: PDF-to-Text Conversion

This step transforms a PDF file into text, saving a json file:

from pdf2text.generalParser import parseFile
pdf_path = "copper_acetate.pdf" # PDF file given by the user
result = parseFile(pdf_path)
full_text = result['fullText'] # Text without paragraph information
paragraphs = result['contents']  # Text with paragraph boundaries

The converted text is saved in pdf2text/results.

Step 2: Text Segmentation

Identifies paragraphs about chemical reactions and segments them:

from segmentation.segmentor import TopicSegmentor
segmentor = TopicSegmentor()
seg_texts = segmentor.segment(paragraphs)

Step 3: Reaction Extraction

Extracts structured chemical reactions from each segment:

from extraction.extractor import ReactionExtractor
extractor = ReactionExtractor('7b')
reactions = extractor.extract(seg_texts)

🤖 Model Training

We fine-tune Llama-2-7B with LoRA, a technique for efficient fine-tuning, on our collected training set for our reaction extractor. Explore the training details in extraction/training.

📚 Citation

If you find Reaction Miner helpful, please kindly cite our paper:

@inproceedings{zhong2023reaction,
  title={Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data},
  author={Zhong, Ming and Ouyang, Siru and Jiao, Yizhu and Kargupta, Priyanka and Luo, Leo and Shen, Yanzhen and Zhou, Bobby and Zhong, Xianrui and Liu, Xuan and Li, Hongxiang and others},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={389--402},
  year={2023}
}

About

Repository for the EMNLP 2023 Demo Paper "Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.9%
  • Shell 1.1%