Anooyman · Apr 14, 2024 · Mar 27, 2024
diff --git a/.gitignore b/.gitignore
@@ -10,6 +10,7 @@ logs/
 *.jsonl
 *.json
 *.txt
+localFile/
 # ./generate_data/*.josnl
 # ./generate_data/*/*/*.josnl
 

diff --git a/README.md b/README.md
diff --git a/README_EN.md b/README_EN.md
diff --git a/app.py b/app.py
@@ -1,3 +1,3 @@
 import os
-# os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860')
-os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860')
+os.system('streamlit run web_internlm2.py --server.address=0.0.0.0 --server.port 7860')
+#os.system('streamlit run web_demo-aiwei.py --server.address=0.0.0.0 --server.port 7860')
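
Note on the `app.py` hunk: the change switches demos by swapping which `os.system` line is commented out. A minimal sketch of one alternative follows, assuming the two script names and server flags shown in the hunk; `DEMOS`, `build_command`, and `launch` are hypothetical names that do not exist in the repository.

```python
import subprocess

# Hypothetical mapping; the script names are copied from the app.py hunk.
DEMOS = {
    "internlm2": "web_internlm2.py",
    "aiwei": "web_demo-aiwei.py",
}


def build_command(demo: str = "internlm2", port: int = 7860) -> list:
    """Build the streamlit command line for the chosen demo as an argument list."""
    script = DEMOS[demo]  # raises KeyError for an unknown demo name
    return [
        "streamlit", "run", script,
        "--server.address=0.0.0.0",
        "--server.port", str(port),
    ]


def launch(demo: str = "internlm2", port: int = 7860) -> None:
    """Run the chosen demo; check=True raises if streamlit exits with an error."""
    subprocess.run(build_command(demo, port), check=True)
```

Calling `launch("aiwei")` would start the other demo without editing the file. Passing an argument list to `subprocess.run` also avoids going through a shell, which `os.system` does.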
diff --git a/assets/EmoLLM_transparent.png b/assets/EmoLLM_transparent.png
diff --git a/assets/Shusheng.jpg b/assets/Shusheng.jpg
diff --git a/assets/Shusheng.png b/assets/Shusheng.png
diff --git a/assets/aiwei_demo.gif b/assets/aiwei_demo.gif
diff --git a/assets/aiwei_demo2.gif b/assets/aiwei_demo2.gif
diff --git a/assets/aiwei_demo3.gif b/assets/aiwei_demo3.gif
diff --git a/assets/aiwei_demo4.gif b/assets/aiwei_demo4.gif
diff --git a/assets/model.png b/assets/model.png
diff --git a/datasets/README.md b/datasets/README.md
@@ -2,7 +2,7 @@
 
 * 数据集按用处分为两种类型：**General** 和 **Role-play**
 * 数据按格式分为两种类型：**QA** 和 **Conversation**
-* 数据汇总：General（**6个数据集**）；Role-play（**3个数据集**）
+* 数据汇总：General（**6个数据集**）；Role-play（**5个数据集**）
 
 ## 数据集类型
 
@@ -19,32 +19,36 @@
 |   Category  |        Dataset        |     Type     |  Total  |
 | :---------: | :-------------------: | :----------: | :-----: |
 |  *General*  |         data          | Conversation |  5600+  |
-|  *General*  |       data_pro        | Conversation | 36500+  |
+|  *General*  |       data_pro        | Conversation | 36,500+ |
 |  *General*  | multi_turn_dataset_1  | Conversation | 36,000+ |
 |  *General*  | multi_turn_dataset_2  | Conversation | 27,000+ |
-|  *General*  | single_turn_dataset_1 |      QA      | 14000+  |
-|  *General*  | single_turn_dataset_2 |      QA      | 18300+  |
+|  *General*  | single_turn_dataset_1 |      QA      | 14,000+ |
+|  *General*  | single_turn_dataset_2 |      QA      | 18,300+ |
 | *Role-play* |         aiwei         | Conversation |  4000+  |
-| *Role-play* |       SoulStar        |      QA      | 11200+  |
+| *Role-play* |       SoulStar        |      QA      | 11,200+ |
 | *Role-play* |        tiangou        | Conversation |  3900+  |
+| *Role-play* |        mother         | Conversation | 40,300+ |
+| *Role-play* |       scientist       | Conversation | 28,400+ |
 |     ……      |          ……           |      ……      |   ……    |
 
 ## 数据集来源
 
 ### **General**
 
-* 数据集 data 来自本项目
-* 数据集 data_pro 来自本项目
-* 数据集 multi_turn_dataset_1 来源 [Smile](https://github.com/qiuhuachuan/smile)
-* 数据集 multi_turn_dataset_2 来源 [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
-* 数据集 single_turn_dataset_1 来自本项目
-* 数据集 single_turn_dataset_2 来自本项目
+* 数据集 `data` 来自本项目
+* 数据集 `data_pro` 来自本项目
+* 数据集 `multi_turn_dataset_1` 来源 [Smile](https://github.com/qiuhuachuan/smile)
+* 数据集 `multi_turn_dataset_2` 来源 [CPsyCounD](https://github.com/CAS-SIAT-XinHai/CPsyCoun)
+* 数据集 `single_turn_dataset_1` 来自本项目
+* 数据集 `single_turn_dataset_2` 来自本项目
 
 ### **Role-play**
 
-* 数据集 aiwei 来自本项目
-* 数据集 tiangou 来自本项目
-* 数据集 SoulStar 来源 [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* 数据集 `aiwei` 来自本项目
+* 数据集 `tiangou` 来自本项目
+* 数据集 `SoulStar` 来源 [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* 数据集 `mother` 来自本项目
+* 数据集 `scientist` 来自本项目
 
 ## 数据集去重
 

diff --git a/datasets/README_EN.md b/datasets/README_EN.md
@@ -2,7 +2,7 @@
 
 * Category of dataset: **General** and **Role-play**
 * Type of data: **QA** and **Conversation**
-* Summary: General(**6 datasets**), Role-play(**3 datasets**)
+* Summary: General(**6 datasets**), Role-play(**5 datasets**)
 
  ## Category
 * **General**: generic dataset, including psychological Knowledge, counseling technology, etc.
@@ -17,14 +17,16 @@
 |   Category  |        Dataset        |     Type     |  Total  |
 | :---------: | :-------------------: | :----------: | :-----: |
 |  *General*  |         data          | Conversation |  5600+  |
-|  *General*  |       data_pro        | Conversation | 36500+  |
+|  *General*  |       data_pro        | Conversation | 36,500+ |
 |  *General*  | multi_turn_dataset_1  | Conversation | 36,000+ |
 |  *General*  | multi_turn_dataset_2  | Conversation | 27,000+ |
-|  *General*  | single_turn_dataset_1 |      QA      | 14000+  |
-|  *General*  | single_turn_dataset_2 |      QA      | 18300+  |
+|  *General*  | single_turn_dataset_1 |      QA      | 14,000+ |
+|  *General*  | single_turn_dataset_2 |      QA      | 18,300+ |
 | *Role-play* |         aiwei         | Conversation |  4000+  |
-| *Role-play* |       SoulStar        |      QA      | 11200+  |
+| *Role-play* |       SoulStar        |      QA      | 11,200+ |
 | *Role-play* |        tiangou        | Conversation |  3900+  |
+| *Role-play* |        mother         | Conversation | 40,300+ |
+| *Role-play* |       scientist       | Conversation | 28,400+ |
 |     ……      |          ……           |      ……      |   ……    |
 
 
@@ -41,8 +43,10 @@
 * dataset `aiwei` from this repo
 * dataset `tiangou` from this repo
 * dataset `SoulStar` from [SoulStar](https://github.com/Nobody-ML/SoulStar)
+* dataset `mother` from this repo
+* dataset `scientist` from this repo
 
 **Dataset Deduplication**：
 Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced via adjusting the threshold.
 
-https://algonotes.readthedocs.io/en/latest/Simhash.html
+https://algonotes.readthedocs.io/en/latest/Simhash.html
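
Note on the deduplication paragraph in `datasets/README_EN.md`: the README describes combining exact matching with Simhash fuzzy matching and tuning a threshold. The sketch below illustrates that idea under stated assumptions and is not the project's actual script. The character 3-gram features, the MD5-based 64-bit fingerprint, the default threshold of 3, and all function names are illustrative choices.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a Simhash fingerprint from character 3-grams (assumed feature choice)."""
    normalized = re.sub(r"\s+", "", text.lower())
    # Texts shorter than three characters yield a single, shorter gram.
    grams = [normalized[i:i + 3] for i in range(max(len(normalized) - 2, 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for bit in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            weights[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(records: list, threshold: int = 3) -> list:
    """Drop exact duplicates first, then near-duplicates within the Hamming threshold."""
    seen_exact = set()
    kept_hashes = []
    kept = []
    for record in records:
        if record in seen_exact:  # exact (absolute) match
            continue
        seen_exact.add(record)
        fingerprint = simhash(record)
        if any(hamming_distance(fingerprint, h) <= threshold for h in kept_hashes):
            continue  # fuzzy match against an already kept record
        kept_hashes.append(fingerprint)
        kept.append(record)
    return kept
```

A lower `threshold` keeps more records and reduces the risk of discarding distinct data through a false match, while a higher one removes more near-duplicates. The pairwise comparison here is quadratic in the number of kept records, so a corpus of tens of thousands of conversations would likely need an index over fingerprint segments instead.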
diff --git a/datasets/mother.json → datasets/mother_v1.json b/datasets/mother.json → datasets/mother_v1.json