Update code #8

Merged on Apr 14, 2024 (86 commits)
6b76eed
feat: add agents/actions/write_markdown
jujimeizuo Mar 27, 2024
5bc9739
feat: add agents/actions/write_markdown (#152)
jujimeizuo Mar 27, 2024
3bead46
Update RAG pipeline (#153)
aJupyter Mar 27, 2024
a24eb52
Dev (#154)
aJupyter Mar 27, 2024
21e8129
[ADD] add evaluation result of base model on 5/10 epochs
ZeyuBa Mar 28, 2024
441a139
[ADD] add evaluation result of base model on 5/10 epochs (#155)
aJupyter Mar 28, 2024
3996029
[Merge] dev (#157)
aJupyter Mar 28, 2024
8633278
Rename mother.json to mother_v1_2439.json
brycewang2018 Mar 29, 2024
a6241bc
Add files via upload
brycewang2018 Mar 29, 2024
8d950c3
Mother dataset
aJupyter Mar 30, 2024
c9e27ca
[Dataset] Merge Dev (#159)
aJupyter Mar 30, 2024
7d2742a
[DOC] update README
zealot52099 Apr 2, 2024
6958611
Update requirements.txt
chg0901 Apr 2, 2024
dbaf87b
update mpi4py installation (#162)
chg0901 Apr 2, 2024
a3ead4f
Update README_EN.md
chg0901 Apr 2, 2024
5c64549
[DOC] update README (#161)
chg0901 Apr 2, 2024
cbcd268
[Merge] Dev (#163)
aJupyter Apr 2, 2024
ce26b77
Update README.md
brycewang2018 Apr 2, 2024
3db73ba
Update README.md (#164)
aJupyter Apr 3, 2024
c682725
Fine-tuning script for the mother role in multi-turn dialogue
brycewang2018 Apr 3, 2024
e7085c5
internlm2_7b_chat_qlora_e3_mother (#165)
aJupyter Apr 4, 2024
6008c23
[Merge] Dev (#166)
aJupyter Apr 4, 2024
11aca78
Update README.md
brycewang2018 Apr 4, 2024
ee320cd
Update README_EN.md
brycewang2018 Apr 4, 2024
3ec5895
Update README.md
brycewang2018 Apr 4, 2024
cfc952d
Update README_EN.md
brycewang2018 Apr 4, 2024
adc0aac
Update README_EN.md
brycewang2018 Apr 4, 2024
824bf51
[DOC] Update readme(#167)
aJupyter Apr 5, 2024
db1ed78
[DOC] Update readme (#168)
aJupyter Apr 5, 2024
2d75f27
Changes to be committed:
Apr 8, 2024
7886171
Update README.md
Yicooong Apr 8, 2024
cad9ed1
Update README.md
Yicooong Apr 8, 2024
6bf233c
Changes to be committed:
Apr 8, 2024
1a03180
Changes to be committed:
Apr 8, 2024
cc0b3d6
Merge branch 'main' of https://github.com/Yicooong/EmoLLM
Apr 8, 2024
4a7d952
Update readme (#169)
aJupyter Apr 8, 2024
efa4c92
[Doc] update readme (#170)
aJupyter Apr 8, 2024
0e9bd73
[Doc] update readme
aJupyter Apr 9, 2024
a25df2d
Merge branch 'main' into dev
aJupyter Apr 9, 2024
e97b1a8
[Doc] update readme
aJupyter Apr 9, 2024
5abda38
[Doc] Update readme (#171)
aJupyter Apr 9, 2024
64a8025
Update README.md
MING-ZCH Apr 9, 2024
b6e81c8
Update README_EN.md
MING-ZCH Apr 9, 2024
360dc21
Update README.md
MING-ZCH Apr 9, 2024
700edfb
Update README_EN.md
MING-ZCH Apr 9, 2024
6e41cba
[DOC] Update README.md in datasets and evaluate (#172)
MING-ZCH Apr 9, 2024
7a19c51
[merge] merge new docs from dev bench (#173)
MING-ZCH Apr 9, 2024
d1bf15c
Delete datasets/mother_v1_2439.json
MING-ZCH Apr 9, 2024
84e0396
Merge branch 'SmartFlowAI:main' into main
MING-ZCH Apr 9, 2024
907e714
Rename mother_v2_3838.json to mother_v2.json
MING-ZCH Apr 9, 2024
310ecfb
Delete datasets/mother_v2.json
MING-ZCH Apr 9, 2024
4068146
Add files via upload
MING-ZCH Apr 9, 2024
4827b39
Update README.md
MING-ZCH Apr 9, 2024
4c60db8
Update README_EN.md
MING-ZCH Apr 9, 2024
2dfa295
Upload mother dataset (#174)
MING-ZCH Apr 9, 2024
43646e7
[Merge] Merge datasets from Dev bench (#175)
MING-ZCH Apr 9, 2024
5a7980f
[Doc] Update README_EN.md
eltociear Apr 9, 2024
14b8b9c
[Doc] Update README_EN.md (#176)
jujimeizuo Apr 10, 2024
2a9ef19
InternLM2-Base-7B QLoRA fine-tuned model: update links and evaluation results
chg0901 Apr 10, 2024
0e2ae67
Merge branch 'dev' into main
chg0901 Apr 10, 2024
e3b8f2f
[Doc] update evaluate result (#178)
aJupyter Apr 10, 2024
3837558
[Doc] update evalutaion result (#179)
aJupyter Apr 10, 2024
1c5d447
add download_model.py script, automatic download of model libraries
HatBoy Apr 10, 2024
a36a8f5
Merge branch 'main' of https://github.com/SmartFlowAI/EmoLLM
HatBoy Apr 10, 2024
78eeb2f
Added model download scripts, modified local deployment documentation…
aJupyter Apr 10, 2024
43ff7d3
Merge Dev (#181)
aJupyter Apr 10, 2024
67687fc
Remove black borders from images, update author information
Apr 11, 2024
edaac7e
Merge branch 'main' of https://github.com/Yicooong/EmoLLM
Apr 11, 2024
8963632
rectify aiwei_demo transparent
Apr 11, 2024
4b5b7dc
transparent
Apr 11, 2024
1423ccf
modify: aiwei_demo table--->div
Apr 11, 2024
5df57c0
modified: aiwei_demo
Apr 11, 2024
fa20c8f
modify: div ---> table
Apr 11, 2024
7711050
modified: README.md
Apr 11, 2024
0d601db
modified: README_EN.md
Apr 11, 2024
f7c7a27
Remove black borders from images, add author information (#182)
MING-ZCH Apr 11, 2024
b11e29a
[Doc] update readme (#183)
aJupyter Apr 11, 2024
2654270
update model config file links
chg0901 Apr 12, 2024
a41e470
update model config file links (#184)
chg0901 Apr 12, 2024
3366b8a
Create internlm2_20b_chat_lora_alpaca_e3.py
zxazys Apr 12, 2024
3b8c2ec
Create internlm2_20b_chat_lora_alpaca_e3.py (#185)
aJupyter Apr 12, 2024
7035dc1
Merge Dev (#186)
aJupyter Apr 12, 2024
67616a8
update model config file links
chg0901 Apr 13, 2024
5dd0767
update model config file links (#188)
chg0901 Apr 13, 2024
305d4db
Revert "update model config file links"
chg0901 Apr 13, 2024
d401e8d
Revert "update model config file links" (#189)
chg0901 Apr 13, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ logs/
*.jsonl
*.json
*.txt
localFile/
# ./generate_data/*.josnl
# ./generate_data/*/*/*.josnl

663 changes: 347 additions & 316 deletions README.md

Large diffs are not rendered by default.

656 changes: 336 additions & 320 deletions README_EN.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions app.py
@@ -1,3 +1,3 @@
import os
-# os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860')
-os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860')
+os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860')
+#os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860')
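`app.py` launches the demo by handing one shell string to `os.system`, and switching demos means toggling comments. As a hedged alternative (not what this PR does), the same launch can be expressed with `subprocess.run` and an argument list, which avoids shell quoting and raises on a non-zero exit. The helper names `streamlit_command` and `launch` are hypothetical; the script names, address, and port come from the diff, and the sketch assumes Streamlit is installed so it can be started as `python -m streamlit`.

```python
import subprocess
import sys


def streamlit_command(script: str, address: str = "0.0.0.0", port: int = 7860) -> list[str]:
    """Build the argv for `streamlit run` with the same server flags as app.py (hypothetical helper)."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        f"--server.address={address}",
        f"--server.port={port}",
    ]


def launch(script: str = "web_internlm2.py") -> None:
    """Start the chosen demo; raises CalledProcessError if Streamlit exits with an error."""
    subprocess.run(streamlit_command(script), check=True)
```

With this shape, running the role-play demo instead would be `launch("web_demo-aiwei.py")` rather than editing which line is commented out.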
Binary file added assets/EmoLLM_transparent.png
Binary file removed assets/Shusheng.jpg
Binary file added assets/Shusheng.png
Binary file added assets/aiwei_demo.gif
Binary file added assets/aiwei_demo2.gif
Binary file added assets/aiwei_demo3.gif
Binary file added assets/aiwei_demo4.gif
Binary file added assets/model.png
32 changes: 18 additions & 14 deletions datasets/README.md
@@ -2,7 +2,7 @@

* Datasets are divided into two types by purpose: **General** and **Role-play**
* Data are divided into two types by format: **QA** and **Conversation**
-* Summary: General (**6 datasets**); Role-play (**3 datasets**)
+* Summary: General (**6 datasets**); Role-play (**5 datasets**)

## Dataset Types

@@ -19,32 +19,36 @@
| Category | Dataset | Type | Total |
| :---------: | :-------------------: | :----------: | :-----: |
| *General* | data | Conversation | 5600+ |
-| *General* | data_pro | Conversation | 36500+ |
+| *General* | data_pro | Conversation | 36,500+ |
| *General* | multi_turn_dataset_1 | Conversation | 36,000+ |
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
-| *General* | single_turn_dataset_1 | QA | 14000+ |
-| *General* | single_turn_dataset_2 | QA | 18300+ |
+| *General* | single_turn_dataset_1 | QA | 14,000+ |
+| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *Role-play* | aiwei | Conversation | 4000+ |
-| *Role-play* | SoulStar | QA | 11200+ |
+| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3900+ |
+| *Role-play* | mother | Conversation | 40,300+ |
+| *Role-play* | scientist | Conversation | 28,400+ |
| …… | …… | …… | …… |

## Dataset Sources

### **General**

-* Dataset data comes from this project
-* Dataset data_pro comes from this project
-* Dataset multi_turn_dataset_1 comes from [Smile](https://github.com/qiuhuachuan/smile)
-* Dataset multi_turn_dataset_2 comes from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
-* Dataset single_turn_dataset_1 comes from this project
-* Dataset single_turn_dataset_2 comes from this project
+* Dataset `data` comes from this project
+* Dataset `data_pro` comes from this project
+* Dataset `multi_turn_dataset_1` comes from [Smile](https://github.com/qiuhuachuan/smile)
+* Dataset `multi_turn_dataset_2` comes from [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
+* Dataset `single_turn_dataset_1` comes from this project
+* Dataset `single_turn_dataset_2` comes from this project

### **Role-play**

-* Dataset aiwei comes from this project
-* Dataset tiangou comes from this project
-* Dataset SoulStar comes from [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* Dataset `aiwei` comes from this project
+* Dataset `tiangou` comes from this project
+* Dataset `SoulStar` comes from [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* Dataset `mother` comes from this project
+* Dataset `scientist` comes from this project

## Dataset Deduplication

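The updated dataset table reports lower bounds, and this PR also switches the Total column to thousands separators (`36500+` becomes `36,500+`). A small sketch, assuming the rows are copied verbatim from the updated table (the `……` row is skipped because its datasets are unnamed), that parses both number formats and sums the floors per category:

```python
# Rows copied from the updated dataset table; "Total" is kept as the raw cell string.
ROWS = [
    ("General", "data", "5600+"),
    ("General", "data_pro", "36,500+"),
    ("General", "multi_turn_dataset_1", "36,000+"),
    ("General", "multi_turn_dataset_2", "27,000+"),
    ("General", "single_turn_dataset_1", "14,000+"),
    ("General", "single_turn_dataset_2", "18,300+"),
    ("Role-play", "aiwei", "4000+"),
    ("Role-play", "SoulStar", "11,200+"),
    ("Role-play", "tiangou", "3900+"),
    ("Role-play", "mother", "40,300+"),
    ("Role-play", "scientist", "28,400+"),
]


def parse_total(cell: str) -> int:
    """Turn a lower-bound cell such as '36,500+' or '5600+' into an int."""
    return int(cell.strip().rstrip("+").replace(",", ""))


def totals_by_category(rows: list[tuple[str, str, str]]) -> dict[str, int]:
    """Sum the lower bounds per category (General / Role-play)."""
    totals: dict[str, int] = {}
    for category, _name, cell in rows:
        totals[category] = totals.get(category, 0) + parse_total(cell)
    return totals


if __name__ == "__main__":
    totals = totals_by_category(ROWS)
    for category, count in totals.items():
        print(f"{category}: at least {count:,} samples")
    print(f"All listed: at least {sum(totals.values()):,} samples")
```

Run as a script, it prints `General: at least 137,400 samples`, `Role-play: at least 87,800 samples`, and `All listed: at least 225,200 samples`. These are floors for the named datasets only, since every cell ends in `+`.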
16 changes: 10 additions & 6 deletions datasets/README_EN.md
@@ -2,7 +2,7 @@

* Category of dataset: **General** and **Role-play**
* Type of data: **QA** and **Conversation**
-* Summary: General(**6 datasets**), Role-play(**3 datasets**)
+* Summary: General(**6 datasets**), Role-play(**5 datasets**)

## Category
* **General**: generic dataset, including psychological knowledge, counseling techniques, etc.
@@ -17,14 +17,16 @@
| Category | Dataset | Type | Total |
| :---------: | :-------------------: | :----------: | :-----: |
| *General* | data | Conversation | 5600+ |
-| *General* | data_pro | Conversation | 36500+ |
+| *General* | data_pro | Conversation | 36,500+ |
| *General* | multi_turn_dataset_1 | Conversation | 36,000+ |
| *General* | multi_turn_dataset_2 | Conversation | 27,000+ |
-| *General* | single_turn_dataset_1 | QA | 14000+ |
-| *General* | single_turn_dataset_2 | QA | 18300+ |
+| *General* | single_turn_dataset_1 | QA | 14,000+ |
+| *General* | single_turn_dataset_2 | QA | 18,300+ |
| *Role-play* | aiwei | Conversation | 4000+ |
-| *Role-play* | SoulStar | QA | 11200+ |
+| *Role-play* | SoulStar | QA | 11,200+ |
| *Role-play* | tiangou | Conversation | 3900+ |
+| *Role-play* | mother | Conversation | 40,300+ |
+| *Role-play* | scientist | Conversation | 28,400+ |
| …… | …… | …… | …… |


@@ -41,8 +43,10 @@
* dataset `aiwei` from this repo
* dataset `tiangou` from this repo
* dataset `SoulStar` from [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* dataset `mother` from this repo
+* dataset `scientist` from this repo

**Dataset Deduplication**:
Exact matching is combined with fuzzy matching (Simhash) to deduplicate the dataset, which improves the effectiveness of the fine-tuned model. Adjusting the similarity threshold keeps the dataset high-quality while reducing the risk of discarding important data through false matches.

-https://algonotes.readthedocs.io/en/latest/Simhash.html
+https://algonotes.readthedocs.io/en/latest/Simhash.html
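The deduplication note names the technique (exact matching plus Simhash with a tunable threshold) but not the procedure. Below is a minimal, self-contained sketch of that combination; it is not the project's own deduplication script. Its assumptions are all illustrative: texts are split into character bigrams (so Chinese text needs no word segmenter), each token is hashed with the first 8 bytes of MD5 to build a 64-bit fingerprint, and two texts count as near-duplicates when their fingerprints differ in at most `threshold` bits (3 by default, a common choice for 64-bit Simhash, not a value taken from this repo).

```python
import hashlib
import re


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams, ignoring whitespace."""
    compact = re.sub(r"\s+", "", text)
    if len(compact) <= n:
        return [compact] if compact else []
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def simhash(text: str, bits: int = 64) -> int:
    """Simhash fingerprint: every token votes +1/-1 on each bit of its hash."""
    votes = [0] * bits
    for token in char_ngrams(text):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(texts: list[str], threshold: int = 3) -> list[str]:
    """Drop exact duplicates first, then near-duplicates within `threshold` bits."""
    seen: set[str] = set()
    kept: list[str] = []
    fingerprints: list[int] = []
    for text in texts:
        if text in seen:  # absolute (exact) match
            continue
        fp = simhash(text)
        if any(hamming(fp, other) <= threshold for other in fingerprints):
            continue  # fuzzy match against an already-kept sample
        seen.add(text)
        kept.append(text)
        fingerprints.append(fp)
    return kept
```

Raising `threshold` removes more near-duplicates but also raises the chance of dropping distinct samples, which is the trade-off the README describes. The pairwise scan is quadratic and meant only for illustration; larger corpora usually bucket fingerprints by bit blocks before comparing.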
File renamed without changes.