Translated korean_hate_speech.md (#139)

ko-nlp · Oct 18, 2020 · 6cab56f
1 parent ca1b3cb
commit 6cab56f
Showing 1 changed file with 31 additions and 33 deletions.
diff --git a/en-docs/corpuslist/korean_hate_speech.md b/en-docs/corpuslist/korean_hate_speech.md
@@ -2,10 +2,10 @@
 sort: 3
 ---
 
-# 한국어 혐오 데이터셋
+# Korean Hate Speech Dataset
 
-한국어 혐오 데이터셋은 inmoonlight@github, warnikchow@github, beomi@github 님이 만드신 데이터입니다.
-데이터 정보는 다음과 같습니다.
+The Korean Hate Speech Dataset was created by inmoonlight@github, warnikchow@github, and beomi@github.
+The dataset specification is as follows:
 
 - author: inmoonlight@github, warnikchow@github, beomi@github
 - repository: https://github.com/kocohub/korean-hate-speech
@@ -15,60 +15,59 @@ sort: 3
   - test: 974 examples
   - unlabeled: 2,033,893 examples
 
-데이터 구조는 다음과 같습니다.
+The data structure is as follows:
 
-|속성명|내용|
+|Attribute|Description|
 |---|---|
-|text|뉴스 댓글|
-|title/pair|뉴스 제목|
-|gender_bias|성적 차별 여부(True/False)|
-|bias|차별 종류(종교 인종 나이 외모 등)|
-|hate|특정 계층 혐오 여부(hate/none)|
+|text|News comment|
+|title/pair|Title of the news article|
+|gender_bias|Presence of gender-related bias (True/False)|
+|bias|Type of bias (gender-related/other/none)|
+|hate|Hate speech label (hate/offensive/none)|
 
 
-## 1. 파이썬에서 사용하기
+## 1. Usage in Python
 
-파이썬 콘솔을 실행한 뒤 말뭉치를 내려받고 읽어들일 수 있습니다.
+Launch a Python console to download and load the corpus.
 
-### 말뭉치 다운로드
+### Downloading the corpus
 
-한국어 혐오 데이터셋을 로컬에 내려 받는 파이썬 예제는 다음과 같습니다.
+The following Python example downloads the Korean Hate Speech Dataset to your local machine.
 
 ```python
 from Korpora import Korpora
 Korpora.fetch("korean_hate_speech")
 ```
 
 ```note
-기본적으로 사용자의 로컬 컴퓨터 루트 하위의 Korpora라는 디렉토리에 말뭉치를 내려 받습니다(`~/Korpora`). 다른 경로에 말뭉치를 다운로드 받고 싶다면 
-fetch 함수 실행시 `root_dir=custom_path`라는 인자를 추가하세요.
+By default, the corpus is downloaded to a directory named Korpora under the user's home directory (`~/Korpora`). To download it to a different path, add the argument `root_dir=custom_path` when calling the fetch function.
 ```
 
 ```tip
-fetch 함수 실행시 `force_download=True`라는 인자를 줄 경우 해당 말뭉치가 이미 로컬에 있더라도 이를 무시하고 다시 내려 받습니다. 기본값은 `False`입니다.
+If you pass `force_download=True` when calling the fetch function, the corpus is downloaded again even if it already exists locally. The default is `False`.
 ```
 
 
-### 말뭉치 읽어들이기
+### Reading the corpus
 
-한국어 혐오 데이터셋을 파이썬 콘솔에서 읽어들이는 예제는 다음과 같습니다.
-말뭉치가 로컬에 없다면 다운로드도 함께 수행합니다.
+The following example loads the Korean Hate Speech Dataset in a Python console.
+If the corpus is not available locally, it is downloaded first.
 
 ```python
 from Korpora import Korpora
 corpus = Korpora.load("korean_hate_speech")
 ```
 
-다음과 같이 실행해도 한국어 혐오 데이터셋을 읽어들일 수 있습니다.
-수행 결과는 위의 코드와 동일합니다.
+You can also load the Korean Hate Speech Dataset as shown below.
+The result is identical to that of the code above.
 
 ```python
 from Korpora import KoreanHateSpeechKorpus
 corpus = KoreanHateSpeechKorpus()
 ```
 
-위 코드 둘 중 하나를 택해 실행하면 `corpus`라는 변수에 말뭉치를 로드합니다.
-`train`은 한국어 혐오 데이터셋의 train 데이터로 첫번째 인스턴스는 다음과 같이 확인할 수 있습니다.
+Running either of the two snippets above loads the corpus into the variable `corpus`.
+`train` holds the training data of the Korean Hate Speech Dataset; you can inspect its first instance as follows:
 
 ```
 >>> corpus.train[0]
@@ -85,7 +84,7 @@ others
 hate
 ```
 
-`dev`, `test`, `unlabeled`는 각각 한국어 혐오 데이터셋의 dev, test, unlabeled 데이터로 첫번째 인스턴스는 다음과 같이 확인할 수 있습니다.
+`dev`, `test`, and `unlabeled` hold the dev, test, and unlabeled data of the Korean Hate Speech Dataset, respectively; you can inspect the first instance of each as follows:
 
 ```
 >>> korean_hate_speech.dev[0]
@@ -96,14 +95,14 @@ SentencePair(text='ㅋㅋㅋㅋ 그래도 조아해주는 팬들 많아서 좋
 SentencePair(text='"[단독] 지드래곤♥이주연, 제주도 데이트...', pair='"[단독] 지드래곤♥이주연, 제주도 데이트...')
 ```
 
-`get_all_texts`라는 메소드를 실행하면 한국어 혐오 데이터셋의 모든 text(뉴스 댓글)를 확인할 수 있습니다.
+The `get_all_texts` method returns all the texts (news comments) in the Korean Hate Speech Dataset.
 
 ```
 >>> corpus.get_all_texts()
 ['송중기 시대극은 믿고본다. 첫회 신선하고 좋았다.', ... ]
 ```
 
-`train`, `dev`, `test`, `unlabeled` 각각에 대해서도 `get_all_texts` 메소드 실행이 가능합니다.
+You can also call `get_all_texts` on each of `train`, `dev`, `test`, and `unlabeled`.
 
 ```
 >>> corpus.train.get_all_texts()
@@ -117,20 +116,19 @@ SentencePair(text='"[단독] 지드래곤♥이주연, 제주도 데이트...',
 ```
 
 
-## 2. 터미널에서 사용하기
+## 2. Usage in the terminal
 
-파이썬 콘솔 실행 없이 바로 말뭉치를 다운받을 수 있습니다.
-다음과 같이 실행하면 됩니다.
+You can download the corpus directly, without launching a Python console.
+Run the following command:
 
 ```bash
 korpora fetch --corpus korean_hate_speech
 ```
 
 ```note
-기본적으로 사용자의 로컬 컴퓨터 루트 하위의 Korpora라는 디렉토리에 말뭉치를 내려 받습니다(`~/Korpora`). 다른 경로에 말뭉치를 다운로드 받고 싶다면 
-터미널에서 fetch 함수 실행시 `--root_dir custom_path`라는 인자를 추가하세요.
+By default, the corpus is downloaded to a directory named Korpora under the user's home directory (`~/Korpora`). To download it to a different path, add the `--root_dir custom_path` argument when running the fetch command in the terminal.
 ```
 
 ```tip
-터미널에서 fetch 함수 실행시 `--force_download`라는 인자를 줄 경우 해당 말뭉치가 이미 로컬에 있더라도 이를 무시하고 다시 내려 받습니다.
+If you pass `--force_download` when running the fetch command in the terminal, the corpus is downloaded again even if it already exists locally.
 ```
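
As a rough illustration of the attribute table in the translated page, the sketch below tallies the `hate` label over a few records. `LabeledComment`, `count_hate_labels`, and the sample records are hypothetical stand-ins invented for this example and are not part of Korpora; with the real corpus you would presumably pass `corpus.train` instead, assuming its items expose a `hate` attribute as the table suggests.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledComment:
    """Hypothetical stand-in for one labeled example; not a Korpora class."""
    text: str          # news comment
    title: str         # title of the news article
    gender_bias: bool  # presence of gender-related bias
    bias: str          # type of bias (placeholder strings below)
    hate: str          # hate / offensive / none


def count_hate_labels(examples):
    """Count how often each value of the `hate` attribute occurs."""
    return Counter(example.hate for example in examples)


# Made-up placeholder records, not taken from the corpus.
samples = [
    LabeledComment("comment A", "title 1", False, "none", "none"),
    LabeledComment("comment B", "title 1", True, "gender", "hate"),
    LabeledComment("comment C", "title 2", False, "others", "offensive"),
    LabeledComment("comment D", "title 2", False, "none", "none"),
]

label_counts = count_hate_labels(samples)
print(label_counts.most_common())
```

Because `count_hate_labels` only reads the `hate` attribute, it works with any iterable of objects that carry that field, which keeps the sketch independent of how the corpus is actually loaded.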