[NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

We release PediatricsGPT, the first Chinese medical large language model tailored for pediatric applications while retaining medical generalist capabilities:


PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications
Dingkang Yang*1, Jinjie Wei*1, Dongling Xiao*2, ..., Ke Li3, Peng Zhai1, Lihua Zhang1
1Academy for Engineering and Technology, Fudan University
2ByteDance
3Tencent Youtu Lab

Abstract

Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address these issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions derived from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Building on the well-designed PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs during medical domain adaptation. Subsequently, full-parameter Supervised Fine-Tuning (SFT) is used to incorporate the general medical knowledge schema into the models. After that, we devise a Direct Following Preference Optimization (DFPO) to enhance the generation of pediatrician-like humanistic responses. In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery. Extensive results based on automatic metrics, GPT-4, and doctor evaluations across distinct downstream tasks show that PediatricsGPT consistently outperforms previous Chinese medical LLMs.

PedCorpus

To endow the model with versatile diagnostic proficiency, PedCorpus is constructed as a multi-dimensional corpus covering three application-oriented medical tasks: Knowledge Question-Answer (MedKQ&A), Evidence-based Diagnosis (EviDiag), and Treatment Recommendation (TreRecom). Following an internal ethical review by the partnering healthcare institutions, we release the licensed and controllable portions of the PedCorpus data resources.
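
For reference, a single PedCorpus-style instruction sample could look like the minimal sketch below. The field names (task, instruction, input, output) and the example content are illustrative assumptions, not the released schema:

# Minimal sketch of one multi-task instruction record (assumed schema, Python).
sample = {
    "task": "TreRecom",  # assumed task tag: "MedKQ&A", "EviDiag", or "TreRecom"
    "instruction": "Please provide a treatment recommendation for the following pediatric case.",
    "input": "A 3-year-old child presents with a 2-day history of fever and cough ...",
    "output": "Based on the described symptoms, the following management is suggested ...",
}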

PediatricsGPT


Following the designed training procedure, we provide the 13B model checkpoint obtained after the Direct Following Preference Optimization (DFPO) phase, as well as the MoE-Adapter from the parameter-efficient secondary SFT phase.

Model                  | Link   | Description
PediatricsGPT-13B-Base | Access | Preference-aligned version via the direct following preference optimization (DFPO)
MoE-Adapter            | Access | LoRA-based adapter with the mixture of universal-specific experts
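
As a rough sketch of how the two released artifacts could be combined, the following assumes the base model loads with Hugging Face Transformers and the LoRA-based MoE-Adapter loads with PEFT; the local paths and the prompt are placeholders, and this is not an official loading script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_dir = "path/to/PediatricsGPT-13B-Base"  # placeholder path to the downloaded base model
adapter_dir = "path/to/MoE-Adapter"          # placeholder path to the downloaded adapter

# Load the preference-aligned 13B base model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_dir, torch_dtype=torch.float16, device_map="auto")

# Attach the LoRA-based MoE-Adapter on top of the base model (assumes PEFT compatibility).
model = PeftModel.from_pretrained(model, adapter_dir)
model.eval()

# Illustrative pediatric query; no chat template is applied in this sketch.
prompt = "幼儿反复发热三天，应该如何处理？"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))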

Deploy

git clone https://github.com/ydk122024/PediatricsGPT.git
cd PediatricsGPT
pip install -r requirements.txt

Client-side Inference:

CUDA_VISIBLE_DEVICES=$num python src/cli_demo.py \
    --model_name_or_path $model_dir \
    --adapter_name_or_path $adapter_dir \
    --template default \
    --finetuning_type lora

Web-side Inference:

CUDA_VISIBLE_DEVICES=$num python src/web_demo.py \
    --model_name_or_path $model_dir \
    --adapter_name_or_path $adapter_dir \
    --template default \
    --finetuning_type lora
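
If a standalone deployment without runtime adapter loading is preferred, one option is to merge the LoRA weights into the base model beforehand. The snippet below is a minimal sketch using PEFT's merge_and_unload; the paths are placeholders, and compatibility with the released MoE-Adapter should be verified before relying on it:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_dir = "path/to/PediatricsGPT-13B-Base"       # placeholder
adapter_dir = "path/to/MoE-Adapter"               # placeholder
merged_dir = "path/to/PediatricsGPT-13B-merged"   # placeholder output directory

# Load the base model, attach the adapter, then fold the LoRA weights into the base weights.
model = AutoModelForCausalLM.from_pretrained(base_dir, torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter_dir)
merged = model.merge_and_unload()

# Save the merged model so it can be served without the PEFT dependency.
merged.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_dir).save_pretrained(merged_dir)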

Result Analysis

We present comparison results of different models on three medical benchmarks using multifaceted metrics, including ROUGE-1/2/L, BLEU-1/2/3/4, GLEU, and Distinct-1/2.

Measuring model performance from multiple aspects is essential in the medical domain. To this end, we consider four dimensions to holistically assess response quality: usefulness, correctness, consistency, and smoothness. GPT-4 is prompted to select the winning response between pairwise models based on these dimensions.

Doctor approval of LLM assistants is a vital step toward real-world applications. We invite several doctors to determine the winner between pairwise models by majority voting. The evaluation requires simultaneous consideration of the responses' professionalism, factuality, and safety.
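
For readers who want to reproduce part of the automatic evaluation, Distinct-1/2 is simple to compute directly. The sketch below is a generic implementation of distinct n-gram ratios over character-tokenized Chinese responses; it is an illustration, not the exact evaluation script used in the paper:

def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams across all generated responses."""
    total, unique = 0, set()
    for text in responses:
        tokens = list(text)  # simple character-level tokenization for Chinese text
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total > 0 else 0.0

responses = ["患儿需要多饮水并注意休息。", "建议先进行血常规检查以明确病因。"]
print("Distinct-1:", round(distinct_n(responses, 1), 3))
print("Distinct-2:", round(distinct_n(responses, 2), 3))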

Limitations Statement

Users assume all medical risks and responsibilities arising from use of the model. Model responses should be treated with caution, as hallucinations are possible.

Acknowledgement

Our work draws inspiration from the following projects, including but not limited to:

Without these works, this repository would not have been possible.

Citation

If you use PediatricsGPT in your research, please cite the following paper:

@inproceedings{yang2024pediatricsgpt,
  title={PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications},
  author={Yang, Dingkang and Wei, Jinjie and Xiao, Dongling and Wang, Shunli and Wu, Tong and Li, Gang and Li, Mingcheng and Wang, Shuaibing and Chen, Jiawei and Jiang, Yue and others},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
