Zhongzhen Huang, Gui Geng, Shengyi Hua, Zhen Huang, Haoyang Zou, Shaoting Zhang, Pengfei Liu, Xiaofan Zhang
- [2025/01] 🚨 We have officially released our technical report and the training dataset on 🤗 Hugging Face.
This paper investigates inference-time scaling in the medical domain, focusing on complex reasoning tasks that range from diagnostic decision-making to treatment planning.
Key findings include:
- Long thought processes require sufficient domain knowledge and instruction-following ability to function effectively during testing.
- Majority voting offers a simple way to increase inference-time computation, though its efficacy is limited.
- Harder tasks necessitate longer reasoning processes, supporting the idea that task complexity drives the need for more extensive thought chains.
- The multiple-choice options can be removed to encourage free-form responses, which lets us explore the potential of journey learning in medicine. The model's reasonable responses underscore the promise of inference-time scaling and journey learning for advancing LLM performance in real-world clinical reasoning tasks.
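
To make the majority-voting finding above concrete, here is a minimal sketch of how several sampled answers to one multiple-choice question could be aggregated. It is an illustration only, not the implementation used in the report: the function name `majority_vote`, the normalization step, and the example answers are all assumptions.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among several sampled responses.

    Hypothetical helper for illustration; not taken from the report.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize option letters so that "b " and "B" count as the same answer.
    normalized = [a.strip().upper() for a in answers]
    counts = Counter(normalized)
    # most_common keeps first-seen order among equal counts,
    # so a tie goes to the answer that was sampled earliest.
    winner, _ = counts.most_common(1)[0]
    return winner


# Made-up answers from five sampled generations for one question.
samples = ["B", "b ", "C", "B", "A"]
print(majority_vote(samples))
```

Each additional sample costs one more full generation, so compute grows linearly with the number of votes.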
If you are interested in our project and would like to join us, feel free to send an email to [email protected].
@article{huang2025o1replicationjourney,
title={O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning},
author={Zhongzhen Huang and Gui Geng and Shengyi Hua and Zhen Huang and Haoyang Zou and Shaoting Zhang and Pengfei Liu and Xiaofan Zhang},
journal={arXiv preprint arXiv:2501.06458},
year={2025}
}