Merge branch 'master' of https://github.com/kornai/4lang
kornai committed Apr 28, 2022
2 parents 7485763 + 19911b7 commit bb40128
Showing 3 changed files with 75 additions and 5 deletions.
46 changes: 46 additions & 0 deletions V2/README.md
@@ -0,0 +1,46 @@
# The parser

```
usage: def_ply_parser.py [-h] -i INPUT_FILE -o OUTPUT_DIR [-f {4lang,def,column}] [-c CLAUSE]

def_ply_parser.py -i <inputfile> -o <outputdir> -f <format> -c <clause>

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The input file, should be a tsv
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The output directory, where the processed files will be stored
  -f {4lang,def,column}, --format {4lang,def,column}
                        Choose the process mode. 4lang expects the full column list, def expects only a single column with the definitions, column expects 2 columns: the words
                        themselves and the definitions
  -c CLAUSE, --clause CLAUSE
                        The clause you want to filter the definitions with
```
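The three `--format` modes differ in how each input line maps to a definition. As a rough illustration only (this is not the parser's own reading code), the hypothetical helper `read_definitions` below handles the `def` and `column` modes, assuming tab-separated lines as the `-i` help suggests. `4lang` is left out because its full column list is not described here.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_definitions(lines: Iterable[str], fmt: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (word, definition) pairs; word is None in the 'def' mode.

    Hypothetical helper: only the 'def' and 'column' modes are sketched,
    and tab-separated input is assumed.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if fmt == "def":
            # A single column: the whole line is the definition.
            yield None, line
        elif fmt == "column":
            # Two columns: the word itself, then its definition.
            # Raises ValueError if the line contains no tab.
            word, definition = line.split("\t", 1)
            yield word, definition
        else:
            raise NotImplementedError(f"format {fmt!r} is not covered by this sketch")
```

For example, `list(read_definitions(["dog\tsome definition\n"], "column"))` gives a single `("dog", "some definition")` pair.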

## Dependencies

You need to install _ply_ (Python Lex-Yacc), which the parser is built on:
```
pip install ply
```

## Usage

To parse the _700.tsv_ file, just run:
```
python def_ply_parser.py -i 700.tsv -o output
```

The _output_ folder will contain the following processed files:
- __4lang_def_correct__: contains all the correctly parsed lines
- __4lang_def_correct_filtered__: if `--clause` is provided, contains the lines filtered by that clause
- __4lang_def_correct_substituted__: contains the substituted definitions
- __4lang_def_correct_substituted_top_level__: contains the top-level definitions, one per line
- __4lang_def_errors__: contains the lines with parser errors
- __top_level_clauses__: lists the top-level clauses
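To get a quick overview of a run, you can count the lines in each of these files. The hypothetical `count_output_lines` below is only a sketch: it assumes the files are written without an extension and hold one entry per line, and it skips any file that is missing (for example `4lang_def_correct_filtered`, which only matters when `--clause` is given).

```python
import os
from typing import Dict

# Output file names as listed above (assumed to have no extension).
OUTPUT_FILES = [
    "4lang_def_correct",
    "4lang_def_correct_filtered",
    "4lang_def_correct_substituted",
    "4lang_def_correct_substituted_top_level",
    "4lang_def_errors",
    "top_level_clauses",
]


def count_output_lines(output_dir: str) -> Dict[str, int]:
    """Count the non-empty lines of every output file that exists."""
    counts: Dict[str, int] = {}
    for name in OUTPUT_FILES:
        path = os.path.join(output_dir, name)
        if not os.path.isfile(path):
            continue  # e.g. no filtered file when --clause was not used
        with open(path, encoding="utf-8") as f:
            counts[name] = sum(1 for line in f if line.strip())
    return counts
```

Comparing the `4lang_def_correct` and `4lang_def_errors` counts gives a rough idea of how much of the input the parser accepted.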

The __4lang_def_correct_substituted_top_level__ file contains only the split definitions. You can also parse this file itself, using the `def` format:
```
python def_ply_parser.py -i output/4lang_def_correct_substituted_top_level -o output -f def
```
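If you run both passes regularly, a small driver can chain them. This is a hypothetical sketch: `two_pass_commands` and `run_two_passes` are not part of the repository, it assumes it is started from the `V2` directory so that `def_ply_parser.py` resolves, and it writes the second pass to a separate directory (`output_def` is just an example name) so the two runs cannot overwrite each other's files.

```python
import os
import subprocess
import sys
from typing import List

# Hypothetical driver, not part of the repository.
# Assumes the working directory is V2, so the script path resolves.
SCRIPT = "def_ply_parser.py"


def two_pass_commands(input_file: str, output_dir: str, second_output_dir: str) -> List[List[str]]:
    """Return the argument lists for the full pass and the follow-up def pass."""
    first = [sys.executable, SCRIPT, "-i", input_file, "-o", output_dir]
    # The second pass reads the split top-level definitions written by the first.
    top_level = os.path.join(output_dir, "4lang_def_correct_substituted_top_level")
    second = [sys.executable, SCRIPT, "-i", top_level, "-o", second_output_dir, "-f", "def"]
    return [first, second]


def run_two_passes(input_file: str, output_dir: str, second_output_dir: str) -> None:
    for directory in (output_dir, second_output_dir):
        # Created up front in case the parser does not create them itself.
        os.makedirs(directory, exist_ok=True)
    for cmd in two_pass_commands(input_file, output_dir, second_output_dir):
        # check=True raises CalledProcessError, so a failing first pass stops the chain.
        subprocess.run(cmd, check=True)
```

Calling `run_two_passes("700.tsv", "output", "output_def")` runs the two commands in order.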
File renamed without changes.
34 changes: 29 additions & 5 deletions Reform/def_ply_parser.py → V2/def_ply_parser.py
@@ -398,11 +398,35 @@ def get_args():
     parser = argparse.ArgumentParser(
         description="def_ply_parser.py -i <inputfile> -o <outputdir> -f <format> -c <clause>"
     )
-    parser.add_argument("-i", "--input-file", type=str, required=True)
-    parser.add_argument("-o", "--output-dir", type=str, required=True)
-    parser.add_argument("-f", "--format", type=str, default="4lang")
-    parser.add_argument("-c", "--clause", type=str, default=None)
-    # parser.add_argument("-b", "--binaries", type=str, required=True)
+    parser.add_argument(
+        "-i",
+        "--input-file",
+        type=str,
+        required=True,
+        help="The input file, should be a tsv",
+    )
+    parser.add_argument(
+        "-o",
+        "--output-dir",
+        type=str,
+        required=True,
+        help="The output directory, where the processed files will be stored",
+    )
+    parser.add_argument(
+        "-f",
+        "--format",
+        type=str,
+        default="4lang",
+        choices=["4lang", "def", "column"],
+        help="Choose the process mode. 4lang expects the full column list, def expects only a single column with the definitions, column expects 2 columns: the words themselves and the definitions",
+    )
+    parser.add_argument(
+        "-c",
+        "--clause",
+        type=str,
+        default=None,
+        help="The clause you want to filter the definitions with",
+    )
     return parser.parse_args()


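Because `get_args` calls `parser.parse_args()` on `sys.argv`, the new `choices` and defaults are easiest to check with a copy that takes an explicit argument list. The `argv` parameter below is an addition for illustration and is not part of the commit; the options themselves mirror the ones above. With `choices` set, an unknown format is now rejected by argparse itself (exit status 2) instead of being passed on to the rest of the script.

```python
import argparse
from typing import List, Optional


def get_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Same options as in the commit; argv is added here only so the parser
    # can be exercised without touching sys.argv (None means sys.argv[1:]).
    parser = argparse.ArgumentParser(
        description="def_ply_parser.py -i <inputfile> -o <outputdir> -f <format> -c <clause>"
    )
    parser.add_argument("-i", "--input-file", type=str, required=True,
                        help="The input file, should be a tsv")
    parser.add_argument("-o", "--output-dir", type=str, required=True,
                        help="The output directory, where the processed files will be stored")
    parser.add_argument("-f", "--format", type=str, default="4lang",
                        choices=["4lang", "def", "column"],
                        help="Choose the process mode")
    parser.add_argument("-c", "--clause", type=str, default=None,
                        help="The clause you want to filter the definitions with")
    return parser.parse_args(argv)


# Defaults apply when -f and -c are omitted.
args = get_args(["-i", "700.tsv", "-o", "output"])
print(args.input_file, args.output_dir, args.format, args.clause)
# prints: 700.tsv output 4lang None

try:
    get_args(["-i", "700.tsv", "-o", "output", "-f", "csv"])
except SystemExit as exc:
    # argparse reports "invalid choice" on stderr and exits with status 2.
    print("rejected with exit code", exc.code)
# prints: rejected with exit code 2
```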