Model Name | AVG Rank | MedQA-USMLE | MedQA-Mainland | PromptCBLUE | WebMedQA | CheckupQA | MedicineQA | DialogSumm | MedTriage (F1) |
---|---|---|---|---|---|---|---|---|---|
GPT-4 | 1.25 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 |
PULSE-Pro | 1.75 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 |
ChatGPT | 4.00 | 3 | 4 | 4 | 4 | 6 | 5 | 3 | 3 |
PULSE-OS | 4.12 | 4 | 6 | 5 | 3 | 4 | 3 | 4 | 4 |
Baichuan2 | 4.50 | 6 | 5 | 3 | 5 | 3 | 4 | 5 | 5 |
ChatGLM3 | 5.62 | 5 | 3 | 6 | 6 | 7 | 6 | 6 | 6 |
HuatuoGPT2 | 7.62 | 8 | 7 | 7 | 9 | 8 | 7 | 7 | 8 |
QiZhenGPT | 8.38 | 9 | 8 | 8 | 7 | 5 | 10 | 8 | 12 |
BenTsao | 8.75 | 7 | 10 | 9 | 10 | 10 | 8 | 9 | 7 |
BianQue2 | 10.12 | 10 | 9 | 12 | 8 | 9 | 11 | 11 | 11 |
MING | 10.75 | 12 | 11 | 11 | 12 | 12 | 9 | 10 | 9 |
DoctorGLM | 11.12 | 11 | 12 | 10 | 11 | 11 | 12 | 12 | 10 |
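
The AVG Rank column is consistent with the unweighted arithmetic mean of the eight per-task ranks in each row. The sketch below recomputes it from the values in the table as a check. The names `ranks`, `average_rank`, and `leaderboard` are illustrative and do not come from the source, and equal weighting of the tasks is an assumption inferred from the numbers rather than a stated method.

```python
# Per-task ranks copied from the table above, in column order:
# MedQA-USMLE, MedQA-Mainland, PromptCBLUE, WebMedQA,
# CheckupQA, MedicineQA, DialogSumm, MedTriage (F1).
ranks = {
    "GPT-4":      [1, 1, 1, 2, 2, 1, 1, 1],
    "PULSE-Pro":  [2, 2, 2, 1, 1, 2, 2, 2],
    "ChatGPT":    [3, 4, 4, 4, 6, 5, 3, 3],
    "PULSE-OS":   [4, 6, 5, 3, 4, 3, 4, 4],
    "Baichuan2":  [6, 5, 3, 5, 3, 4, 5, 5],
    "ChatGLM3":   [5, 3, 6, 6, 7, 6, 6, 6],
    "HuatuoGPT2": [8, 7, 7, 9, 8, 7, 7, 8],
    "QiZhenGPT":  [9, 8, 8, 7, 5, 10, 8, 12],
    "BenTsao":    [7, 10, 9, 10, 10, 8, 9, 7],
    "BianQue2":   [10, 9, 12, 8, 9, 11, 11, 11],
    "MING":       [12, 11, 11, 12, 12, 9, 10, 9],
    "DoctorGLM":  [11, 12, 10, 11, 11, 12, 12, 10],
}


def average_rank(task_ranks):
    """Unweighted mean of per-task ranks (assumed aggregation rule)."""
    return sum(task_ranks) / len(task_ranks)


# Average rank per model, then models ordered from best (lowest) to worst.
avg = {model: average_rank(r) for model, r in ranks.items()}
leaderboard = sorted(avg, key=avg.get)

for model in leaderboard:
    print(f"{model:<12}{avg[model]:.2f}")
```

Sorting by the recomputed mean reproduces the row order of the table. Several means fall exactly on a half at the third decimal place (for example 4.125 for PULSE-OS and 8.375 for QiZhenGPT), so the two-decimal values depend on the rounding rule. The table's 4.12 and 8.38 match round-half-to-even, which is what Python's `:.2f` formatting applies to these exactly representable values.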