Chinese-LS Logo

English|简体中文

What is Chinese-LS?

Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning. Chinese-LS is the first attempt in the field of Chinese Lexical Simplification. It includes a high-quality benchmark dataset and five baseline approaches:

  • Synonym dictionary-based approach

  • Word embedding-based approach

  • Pretrained language model-based approach

  • Sememe-based approach

  • Hybrid approach

The entire framework of Chinese-LS is shown below:

Chinese-LS Framework

Quick start

Requirements

  • Python==3.7.6
  • transformers==3.5.0
  • numpy==1.18.1
  • jieba==0.42.1
  • torch==1.4.0
  • OpenHowNet==0.0.1a11
  • gensim==3.8.2

You can find the complete requirements here.
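
If you want to confirm that your environment matches these pins, an optional check like the one below can help; the expected versions simply restate the list above.

# Optional sanity check that installed package versions match the pins above.
import importlib

pins = {
    "transformers": "3.5.0",
    "numpy": "1.18.1",
    "jieba": "0.42.1",
    "torch": "1.4.0",
    "OpenHowNet": "0.0.1a11",
    "gensim": "3.8.2",
}
for name, expected in pins.items():
    module = importlib.import_module(name)
    installed = getattr(module, "__version__", "unknown")
    status = "OK" if installed == expected else "MISMATCH"
    print(f"{name}: installed {installed}, expected {expected} [{status}]")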

Preparations

Download Pretrained Models

Chinese-LS uses the following pretrained models:

Please place the models under the ./model directory after downloading.
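
For example, a downloaded Chinese BERT checkpoint placed under ./model can then be loaded with the transformers API as sketched below; the directory name bert-base-chinese is only an assumption, so substitute whichever model you actually downloaded.

# Hypothetical example of loading a local checkpoint from ./model.
from transformers import BertTokenizer, BertForMaskedLM

model_dir = "./model/bert-base-chinese"  # assumed directory name
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForMaskedLM.from_pretrained(model_dir)
model.eval()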

Run

We have already run the code, so the intermediate results can be found in ./data.

You can find the details of the code and algorithms in our paper: Chinese Lexical Simplification

To reproduce the results, please run the scripts in the following order:

Generate

  1. Synonym dictionary-based approach

    Run dict_generate.py

  2. Word embedding-based approach

    Run vector_generate.py

  3. Pretrained language model-based approach

    Run bert_generate.sh (a generic sketch of this step appears after this list)

  4. Sememe-based approach

    Run hownet_generate.py

  5. Hybrid approach

    Run hybrid_approach.py
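
To give a rough feel for what the pretrained language model step does, the sketch below generates candidate fillers with a masked language model. It only illustrates the general technique, not the actual logic of bert_generate.sh; the checkpoint path is an assumption, and the real pipeline produces word-level substitutes rather than the character-level fillers shown here.

# Generic masked-language-model candidate generation (not the repo's exact code).
import torch
from transformers import BertTokenizer, BertForMaskedLM

model_dir = "./model/bert-base-chinese"  # assumed local checkpoint
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForMaskedLM.from_pretrained(model_dir)
model.eval()

sentence = "这道题非常艰深。"
complex_word = "艰深"

# Replace the complex word with one [MASK] per character and query the model.
masked = sentence.replace(complex_word, tokenizer.mask_token * len(complex_word), 1)
inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # transformers 3.x returns a tuple by default

# Top predictions for the first masked position.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
_, top_ids = logits[0, mask_index].topk(10)
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))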

Select

Run substitute_selection.py
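
substitute_selection.py filters the candidates produced in the generation step. As a generic illustration of substitute selection (not a description of the script's actual criteria), the sketch below keeps only candidates that are at least as frequent as the complex word, using made-up frequency counts.

# Generic substitute-selection sketch; the frequency counts are made up.
word_frequency = {"艰深": 120, "深奥": 300, "难懂": 2500, "晦涩": 90}

def select_substitutes(complex_word, candidates, freq):
    """Keep candidates that are at least as frequent as (and differ from) the complex word."""
    base = freq.get(complex_word, 0)
    return [c for c in candidates if c != complex_word and freq.get(c, 0) >= base]

print(select_substitutes("艰深", ["深奥", "难懂", "晦涩"], word_frequency))
# -> ['深奥', '难懂']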

Rank

Run substitute_ranking.py
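
substitute_ranking.py orders the selected candidates. As one common (but here purely illustrative) way to do this, the sketch below ranks candidates by a weighted sum of normalized feature scores; the features, weights, and numbers are assumptions, not the script's actual implementation.

# Generic substitute-ranking sketch: combine normalized feature scores.
def rank_substitutes(candidates, freq_score, sim_score, w_freq=0.5, w_sim=0.5):
    """Return candidates sorted best-first by a weighted sum of feature scores."""
    combined = {c: w_freq * freq_score.get(c, 0.0) + w_sim * sim_score.get(c, 0.0)
                for c in candidates}
    return sorted(candidates, key=lambda c: combined[c], reverse=True)

freq_score = {"深奥": 0.4, "难懂": 0.9}  # toy frequency scores in [0, 1]
sim_score = {"深奥": 0.8, "难懂": 0.6}   # toy similarity scores in [0, 1]
print(rank_substitutes(["深奥", "难懂"], freq_score, sim_score))
# -> ['难懂', '深奥']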

Experiments

Chinese-LS includes five experiments that evaluate the quality of our dataset and the performance of the five approaches. You can obtain the experiment results by running experiment.py.

Citation

@article{qiang2021chinese,
    title={Chinese Lexical Simplification},
    author={Qiang, Jipeng and Lu, Xinyu and Li, Yun and Yuan, Yun-Hao and Wu, Xindong},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year={2021},
    volume={29},
    pages={1819-1828},
    doi={10.1109/TASLP.2021.3078361},
    publisher={IEEE}
}

Contact

This repo may still contain bugs, and we are working on improving reproducibility. Feel free to open an issue or submit a pull request to report or fix them.

Email: luxinyu12345@foxmail.com

License

Chinese-LS is under the Apache License, Version 2.0.
