This model classifies whether or not an image is a chemical structure. It was built with EfficientNetB0 as the base model, using transfer learning followed by fine-tuning.
The model outputs a score between 0 and 1, where 0 means the image is a chemical structure and 1 means it is not. The data used to build the model is available on Zenodo.
The model was trained on 10,905,114 images, validated on 2,179,798 images and tested on 544,946 images. Training took 52 hours and 15 minutes and required 17 GB on a Tesla V100-PCI-E-32GB GPU.
From the results on the test set, we computed the AUC and the Youden index (shown in the image below). Using the prediction threshold given by the Youden index, 0.000089, we achieved the following performance metrics:
MCC = 0.989944
Accuracy = 0.994968
Sensitivity = 0.993049
Specificity = 0.996888
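As a rough illustration of how a threshold and metrics of this kind can be derived, the sketch below picks the cut-off that maximises the Youden index, J = sensitivity + specificity - 1, and then computes the four metrics from the resulting confusion matrix. It is not the project's evaluation script: the function names and the toy scores and labels are hypothetical. It also assumes that chemical structures are the positive class and that a score at or below the threshold counts as a positive prediction (since 0 means chemical structure).

```python
import math

def confusion_counts(scores, is_structure, threshold):
    """Count TP, FN, TN, FP; a score <= threshold predicts 'chemical structure'."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_structure):
        predicted = score <= threshold  # assumption: low score = structure
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp

def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def youden_threshold(scores, is_structure):
    """Return the candidate threshold with the highest Youden index J."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        m = metrics(*confusion_counts(scores, is_structure, threshold))
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy data (hypothetical, not taken from the test set)
scores = [0.00001, 0.00005, 0.2, 0.0003, 0.9, 0.99, 0.7, 0.00002]
is_structure = [True, True, True, False, False, False, False, True]

threshold, j = youden_threshold(scores, is_structure)
result = metrics(*confusion_counts(scores, is_structure, threshold))
print(threshold, j, result)
```

Scanning every distinct score as a candidate threshold is quadratic in the number of samples, which is fine for a toy example; on a test set of this size a sorted single pass or a library ROC routine would be the more practical choice.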
The model construction script is available in the model_construction folder, with comments on every step taken.
- Simply run
pip install git+https://github.com/Iagea/DECIMER-Image-Classifier
from decimer_image_classifier import DecimerImageClassifier
# Instantiate DecimerImageClassifier (this loads the model and all needed settings)
decimer_classifier = DecimerImageClassifier()
# If you just need a statement about the image at image_path being a chemical structure or not:
result = decimer_classifier.is_chemical_structure(image_path)
# If you want the classification score (value between 0 (-> chemical structure) and 1 (-> no chemical structure))
score = decimer_classifier.get_classifier_score(image_path)
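For more than one image, the two calls above can be wrapped in a small loop. The sketch below is an illustration rather than part of the package: `classify_images` and the stand-in `_StubClassifier` are hypothetical names, and the only classifier methods it relies on are the two shown above, `is_chemical_structure` and `get_classifier_score`. The cut-off inside the stub is an assumption made so the example runs without the package installed; it says nothing about how the real class decides.

```python
def classify_images(classifier, image_paths):
    """Return one record per image with the yes/no decision and the raw score."""
    records = []
    for path in image_paths:
        records.append({
            "path": path,
            "is_structure": classifier.is_chemical_structure(path),
            "score": classifier.get_classifier_score(path),
        })
    return records


class _StubClassifier:
    """Hypothetical stand-in exposing the same two methods as the real class."""

    _scores = {"mol.png": 0.00001, "photo.png": 0.97}  # made-up scores

    def get_classifier_score(self, image_path):
        return self._scores[image_path]

    def is_chemical_structure(self, image_path):
        # Assumption for the stub only: apply the reported Youden threshold
        return self.get_classifier_score(image_path) <= 0.000089


records = classify_images(_StubClassifier(), ["mol.png", "photo.png"])
for record in records:
    print(record)
```

In real use, pass a `DecimerImageClassifier()` instance instead of the stub. Note that this sketch calls the classifier twice per image; if only the score is needed, a single call to `get_classifier_score` is enough.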
An example using 10 images from the test set (which will be available on Zenodo) is also provided as a Jupyter notebook in the examples folder.
Author: M. Isabel Agea
- A web application implementation is available at decimer.ai, implemented by Otto Brinkhaus