Update resources&tools.md
lingluodlut committed Dec 26, 2023
1 parent cf2fa4c commit b8b1082
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _pages/resources&tools.md
@@ -7,5 +7,5 @@ permalink: /resources&tools/
<table frame=below>
<tr>
<td align="left"><img src="/assets/images/tools/taiyi.png?raw=true" /></td>
<td align="left">**Taiyi (太一): A Bilingual (Chinese and English) Fine-Tuned Large Language Model for Diverse Biomedical Tasks** [GitHub](https://github.com/DUTIR-BioNLP/Taiyi-LLM)<br> To equip a general LLM with bilingual biomedical multi-task capabilities, this project first curated a comprehensive collection of 140 existing biomedical text mining datasets, comprising 102 English and 38 Chinese datasets and covering over 10 biomedical task types. High-quality instruction-tuning datasets are then constructed through a series of data-processing steps, such as manual selection, data cleaning, and deduplication, for subsequent supervised fine-tuning. In contrast to single-stage fine-tuning, a two-stage fine-tuning strategy is proposed to optimize model performance across diverse tasks. Experimental results on 13 test sets demonstrate that Taiyi outperforms general LLMs, and a case study on additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking.</td>
<td align="left"><font size="4"> <b>Taiyi (太一): A Bilingual (Chinese and English) Fine-Tuned Large Language Model for Diverse Biomedical Tasks </b> </font> &nbsp; <a href="https://github.com/DUTIR-BioNLP/Taiyi-LLM">[GitHub]</a><br> To equip a general LLM with bilingual biomedical multi-task capabilities, this project first curated a comprehensive collection of 140 existing biomedical text mining datasets, comprising 102 English and 38 Chinese datasets and covering over 10 biomedical task types. High-quality instruction-tuning datasets are then constructed through a series of data-processing steps, such as manual selection, data cleaning, and deduplication, for subsequent supervised fine-tuning. In contrast to single-stage fine-tuning, a two-stage fine-tuning strategy is proposed to optimize model performance across diverse tasks. Experimental results on 13 test sets demonstrate that Taiyi outperforms general LLMs, and a case study on additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking.</td>
</tr>
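The deduplication step mentioned in the description above can be sketched as follows. This is a minimal, hypothetical illustration of exact-duplicate removal over instruction-tuning examples, not code from the Taiyi-LLM repository; the field names (`instruction`, `input`, `output`) are assumptions.

```python
def deduplicate(examples):
    """Drop exact duplicates, keyed on (instruction, input, output).

    Hypothetical sketch of one data-processing step; field names are
    assumptions, not taken from the Taiyi-LLM pipeline.
    """
    seen = set()
    unique = []
    for ex in examples:
        key = (ex.get("instruction", ""), ex.get("input", ""), ex.get("output", ""))
        if key not in seen:
            seen.add(key)
            unique.append(ex)  # keep first occurrence, preserve order
    return unique


data = [
    {"instruction": "Extract gene mentions", "input": "BRCA1 mutation found", "output": "BRCA1"},
    {"instruction": "Extract gene mentions", "input": "BRCA1 mutation found", "output": "BRCA1"},
    {"instruction": "Classify relation", "input": "aspirin relieves pain", "output": "treats"},
]
deduped = deduplicate(data)  # the repeated example is dropped
```

In practice, near-duplicate detection (e.g. normalized or fuzzy matching) would also be applied, but exact-match keying is the simplest first pass.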
