[NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

We release PediatricsGPT, the first Chinese medical large language model designed for both pediatric applications and general medical use:

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications
Dingkang Yang*¹, Jinjie Wei*¹, Dongling Xiao*²,..., Ke Li³, Peng Zhai¹, Lihua Zhang¹
¹Academy for Engineering and Technology, Fudan University
²ByteDance
³Tencent Youtu Lab

Abstract

Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address these issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions drawn from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Building on PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the inconsistency between the internal knowledge of LLMs and the knowledge injected for medical domain adaptation. Next, full-parameter Supervised Fine-Tuning (SFT) is used to incorporate the general medical knowledge schema into the models. After that, we devise a direct following preference optimization (DFPO) to enhance the generation of pediatrician-like humanistic responses. In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery. Extensive results based on automatic metrics, GPT-4, and doctor evaluations on distinct downstream tasks show that PediatricsGPT consistently outperforms previous Chinese medical LLMs.

PedCorpus

To give the model versatile diagnostic proficiency, PedCorpus combines multi-dimensional corpora spanning three application-oriented medical tasks: Knowledge Question-Answer (MedKQ&A), Evidence-based Diagnosis (EviDiag), and Treatment Recommendation (TreRecom). Following an internal ethical review by the partnering healthcare institutions, we release the licensed and controllable portions of PedCorpus.

PediatricsGPT

We provide two artifacts from the training pipeline: the 13B model obtained after the direct following preference optimization (DFPO) phase, and the MoE-Adapter obtained from the parameter-efficient secondary SFT phase.

Model	Link	Description
PediatricsGPT-13B-Base	Access	Preference-aligned version via the direct following preference optimization
MoE-Adapter	Access	LoRA-based Adapter with mixture of universal-specific experts

Deploy

git clone https://github.com/ydk122024/PediatricsGPT.git
cd PediatricsGPT
pip install -r requirements.txt

Client-side Inference:

# $num: GPU index(es); $model_dir: base model directory; $adapter_dir: adapter directory
CUDA_VISIBLE_DEVICES=$num python src/cli_demo.py \
  --model_name_or_path $model_dir \
  --adapter_name_or_path $adapter_dir \
  --template default \
  --finetuning_type lora

Web-side Inference:

# $num: GPU index(es); $model_dir: base model directory; $adapter_dir: adapter directory
CUDA_VISIBLE_DEVICES=$num python src/web_demo.py \
  --model_name_or_path $model_dir \
  --adapter_name_or_path $adapter_dir \
  --template default \
  --finetuning_type lora
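
The two demo commands differ only in the script they run, so they can also be started from a small Python wrapper. The following is a minimal sketch and is not part of the repository: the function names `build_demo_command` and `launch_demo` are hypothetical, while the script paths and flags are the ones shown in the commands above. It assumes it is run from the repository root.

```python
import os
import subprocess
import sys


def build_demo_command(mode, model_dir, adapter_dir):
    """Return the argument list for the CLI or web demo.

    `mode` is "cli" or "web"; the flags mirror the commands in this README.
    """
    scripts = {"cli": "src/cli_demo.py", "web": "src/web_demo.py"}
    if mode not in scripts:
        raise ValueError(f"mode must be 'cli' or 'web', got {mode!r}")
    return [
        sys.executable, scripts[mode],
        "--model_name_or_path", model_dir,
        "--adapter_name_or_path", adapter_dir,
        "--template", "default",
        "--finetuning_type", "lora",
    ]


def launch_demo(mode, model_dir, adapter_dir, gpu_ids="0"):
    """Run the chosen demo with CUDA_VISIBLE_DEVICES set to `gpu_ids`."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    command = build_demo_command(mode, model_dir, adapter_dir)
    return subprocess.run(command, env=env, check=True)
```

Calling `launch_demo("web", model_dir, adapter_dir)` with your own local paths would start the web demo on GPU 0. Passing the arguments as a list avoids shell quoting problems when a path contains spaces.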

Result Analysis

We compare different models on three medical benchmarks using multifaceted automatic metrics: ROUGE-1/2/L, BLEU-1/2/3/4, GLEU, and Distinct-1/2.

Measuring model performance from multiple aspects is essential in the medical domain. We therefore assess response quality holistically along four dimensions: usefulness, correctness, consistency, and smoothness. GPT-4 is prompted to select the winning response between each pair of models based on these dimensions.

Doctor approval of LLM assistants is a vital step toward real-world applications. We invite several doctors to determine the winner between each pair of models by majority vote, considering the professionalism, factuality, and safety of the responses together.
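
Two of the evaluation steps described above are simple enough to sketch in a few lines. The code below is an illustration under stated assumptions, not the evaluation script used for the paper: `distinct_n` follows the commonly used definition of Distinct-n (unique n-grams divided by total n-grams), and `majority_winner` applies a strict-majority rule to pairwise votes. The tokenization, the tie handling, and the label names are assumptions.

```python
from collections import Counter


def distinct_n(tokens, n):
    """Ratio of unique n-grams to total n-grams in a token sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def majority_winner(votes):
    """Return the label chosen by a strict majority of judges, else "tie".

    Treating the no-majority case as a tie is an assumption of this sketch.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    label, top = Counter(votes).most_common(1)[0]
    return label if top > len(votes) / 2 else "tie"


def win_rates(per_sample_votes):
    """Aggregate per-sample majority winners into a fraction per label."""
    winners = Counter(majority_winner(v) for v in per_sample_votes)
    total = sum(winners.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in winners.items()}
```

For Chinese responses, `tokens` could be a list of characters (`list(response)`) or the output of a word segmenter; the choice affects the score, so it should be the same for every model being compared.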

Limitations Statement

Users assume all medical risks and responsibilities when using the model. Treat model responses with caution, since they may contain hallucinations.

Acknowledgement

Our work is inspired by the following projects, including but not limited to:

LLaMA-Factory: https://github.com/hiyouga/LLaMA-Factory
Baichuan: https://huggingface.co/baichuan-inc

This repository would not have been possible without them.

Citation

If you use PediatricsGPT in your research, please cite the following paper:

@inproceedings{yang2024pediatricsgpt,
  title={PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications},
  author={Yang, Dingkang and Wei, Jinjie and Xiao, Dongling and Wang, Shunli and Wu, Tong and Li, Gang and Li, Mingcheng and Wang, Shuaibing and Chen, Jiawei and Jiang, Yue and others},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
