WSVM is a fully automated method using Vision Transformer (ViT) and Multiple Instance Learning (MIL) for tongue extraction and tooth-marked tongue recognition. It accurately detects the tongue region in clinical images and uses weakly supervised learning to identify tooth-marked areas with only image-level annotations. WSVM enhances the objectivity and accuracy of tongue diagnosis in Traditional Chinese Medicine (TCM).
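The weakly supervised part works by treating ViT patch tokens as instances in a MIL bag, so an image-level label ("tooth-marked" or not) can still localize tooth-marked patches. The sketch below is a minimal illustration of attention-based MIL pooling over ViT-Base patch embeddings; the class name and dimensions are illustrative assumptions, not the repo's exact code.

```python
# Illustrative attention-based MIL head over ViT patch tokens (not the repo's exact code).
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):
    """Aggregates patch embeddings into one image-level score.

    Each ViT patch token is treated as an instance in a bag; a learned
    attention weight decides how much each patch contributes, which lets
    the model localize tooth-marked regions from image-level labels only.
    """
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_patches, embed_dim)
        weights = torch.softmax(self.attention(tokens), dim=1)  # (B, N, 1)
        bag = (weights * tokens).sum(dim=1)                     # (B, D)
        return self.classifier(bag), weights  # image logit + per-patch attention

# Example: 196 patch tokens from a ViT-Base backbone
logits, attn = AttentionMILHead()(torch.randn(2, 196, 768))
```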
## Requirements

- Python 3.9.19
- PyTorch 2.2.2, CUDA 12.3
- Required Python packages (specified in `environment.yaml`)
## Installation

Clone the repository:

```bash
git clone https://github.com/yc-zh/WSVM.git
cd WSVM
```
Create a conda environment and install the dependencies:

```bash
conda env create -f environment.yaml
conda activate WSVM
```
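To confirm the environment resolved the expected PyTorch build and can see the GPU, a quick check (not part of the repo) is:

```python
# Sanity check (not part of the repo): verify the PyTorch build and GPU visibility.
import torch

print(torch.__version__)          # expect 2.2.2
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```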
## Dataset

Download the dataset from the following link: Tongue Image Dataset, and put it in the `data/tongue` directory.
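After downloading, a short script like the one below can confirm the images landed where the code expects them; the `data/tongue` path comes from this README, while the file extensions are an assumption.

```python
# Hedged check that the dataset sits under data/tongue; extensions are assumed.
from pathlib import Path

data_dir = Path("data/tongue")
images = sorted(data_dir.glob("**/*.jpg")) + sorted(data_dir.glob("**/*.png"))
print(f"Found {len(images)} images under {data_dir}")
```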
## Pre-trained Weights

The pre-trained model weights are based on deep-learning-for-image-processing. The weights can be downloaded from the following link: Pre-trained model, and put in the `vit_weights` directory. The extraction code is `eu9f`.
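Loading an ImageNet-style ViT checkpoint for fine-tuning typically means dropping its original classification head, whose output size differs from the two-class tongue task. The sketch below assumes the import path, factory name, and checkpoint file name (all based on the upstream deep-learning-for-image-processing code), which may differ in this repo:

```python
# Illustrative weight loading; the import path, factory name, and checkpoint
# file name are assumptions, not the repo's exact interface.
import torch
from vision_transformer.vit_model import vit_base_patch16_224  # assumed path

model = vit_base_patch16_224(num_classes=2)  # tooth-marked vs. non-tooth-marked
weights = torch.load("vit_weights/vit_base_patch16_224.pth", map_location="cpu")
weights = {k: v for k, v in weights.items() if not k.startswith("head")}  # drop old head
model.load_state_dict(weights, strict=False)
```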
## Usage

To train the model, run:

```bash
python train.py
```
To test the model, run:

```bash
python test.py
```
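For a single clinical image, inference follows the usual ViT pipeline: preprocess, forward pass, softmax over the two classes. Everything below (checkpoint name, image path, normalization constants, module path) is a hypothetical sketch rather than the repo's exact interface:

```python
# Hedged single-image inference sketch; checkpoint name, image path, and the
# import path are assumptions, not the repo's exact interface.
import torch
from PIL import Image
from torchvision import transforms
from vision_transformer.vit_model import vit_base_patch16_224  # assumed path

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # assumed constants
])

model = vit_base_patch16_224(num_classes=2)
model.load_state_dict(torch.load("models/wsvm_finetuned.pth", map_location="cpu"))
model.eval()

image = preprocess(Image.open("data/tongue/example.jpg").convert("RGB"))
with torch.no_grad():
    prob = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, 1].item()
print(f"Tooth-marked probability: {prob:.3f}")
```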
## Project Structure

```
WSVM/
├── data/                 # Directory for storing datasets
├── models/               # Fine-tuned model files
├── tongue_extraction/    # Scripts for tongue foreground extraction
├── vision_transformer/   # Vision Transformer related code
├── vit_weights/          # Pre-trained ViT weights
├── environment.yaml      # Conda environment configuration file
├── README.md             # Project documentation
├── test.py               # Testing script
├── train.py              # Training script
└── utils.py              # Utility functions
```
## Acknowledgments

Thanks to deep-learning-for-image-processing, SAM, and YOLOv8 for their public code and released models.