Fixes issue 228 (NVIDIA#234)
* Fixes issue 228

Signed-off-by: Simon Zuberek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes issue 228

Signed-off-by: Simon Zuberek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Expands the whitelist and removes the redundant [0-9]+-[A-Za-z] (and reverse) pattern matching from ELECTRONIC

Signed-off-by: Simon Zuberek <[email protected]>

* Updates the cache

Signed-off-by: Simon Zuberek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removes unused imports and variables

Signed-off-by: Simon Zuberek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removes redundant abbreviation mappings

Signed-off-by: Simon Zuberek <[email protected]>

* Updates the cache

Signed-off-by: Simon Zuberek <[email protected]>

---------

Signed-off-by: Simon Zuberek <[email protected]>
Co-authored-by: Simon Zuberek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Namrata Gachchi <[email protected]>
3 people authored and ngachchi committed Dec 6, 2024
1 parent a8b8b48 commit 3f05501
Showing 4 changed files with 21 additions and 2 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -12,7 +12,7 @@ pipeline {
environment {

AR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/04-24-24-0'
-DE_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-14-24-0'
+DE_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-23-24-0'
EN_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/09-04-24-0'
ES_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/09-25-24-0'
ES_EN_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/08-30-24-0'
14 changes: 14 additions & 0 deletions nemo_text_processing/text_normalization/de/data/whitelist.tsv
@@ -5,3 +5,17 @@ Mr. mister
Mrs. misses
Ms. miss
Nr. nummer
+2D zwei-D
+2-D zwei-D
+3D drei-D
+3-D drei-D
+3-D-Mammogram drei-D-Mammogram
+3D-Mammogram drei-D-Mammogram
+2-D-Mammogram zwei-D-Mammogram
+2D-Mammogram zwei-D-Mammogram
+3-D-Mammographie drei-D-Mammographie
+3D-Mammographie drei-D-Mammographie
+2-D-Mammographie zwei-D-Mammographie
+2D-Mammographie zwei-D-Mammographie
+3-D-Drucker drei-D-Drucker
+3D-Drucker drei-D-Drucker
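The whitelist is a two-column, tab-separated mapping from a written token to its spoken form, and the rows added here list the hyphenated and unhyphenated spellings (3D and 3-D) separately. NeMo's grammars compile such files into pynini FSTs; the plain-Python dict below is only a minimal sketch of the exact-match lookup behavior, assuming tab-separated "written<TAB>spoken" rows. The helper names load_whitelist and normalize_token are hypothetical.

```python
# Minimal sketch of whitelist-style lookup (illustrative only; NeMo compiles
# whitelist.tsv into a pynini FST rather than a Python dict).
# load_whitelist and normalize_token are hypothetical helper names.
import csv
import io

# A few rows copied from the whitelist entries added in this commit,
# written as tab-separated "written<TAB>spoken" pairs.
WHITELIST_TSV = (
    "2D\tzwei-D\n"
    "2-D\tzwei-D\n"
    "3D\tdrei-D\n"
    "3-D\tdrei-D\n"
    "3D-Drucker\tdrei-D-Drucker\n"
    "3-D-Drucker\tdrei-D-Drucker\n"
)


def load_whitelist(text: str) -> dict[str, str]:
    """Parse tab-separated rows into a written -> spoken mapping."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if row}


def normalize_token(token: str, whitelist: dict[str, str]) -> str:
    """Return the spoken form on an exact match; otherwise leave the token unchanged."""
    return whitelist.get(token, token)


if __name__ == "__main__":
    whitelist = load_whitelist(WHITELIST_TSV)
    # Hyphenated and unhyphenated spellings reach the same spoken form.
    for token in ["3D-Drucker", "3-D-Drucker", "4D"]:
        print(f"{token} -> {normalize_token(token, whitelist)}")
```

Running it prints "3D-Drucker -> drei-D-Drucker", "3-D-Drucker -> drei-D-Drucker", and "4D -> 4D". Because the lookup is exact-match, each spelling variant has to appear as its own row, which is why the added entries come in pairs.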
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@

from nemo_text_processing.text_normalization.de.utils import get_abs_path
from nemo_text_processing.text_normalization.en.graph_utils import (
NEMO_ALPHA,
NEMO_NOT_QUOTE,
NEMO_SIGMA,
NEMO_SPACE,
Original file line number Diff line number Diff line change
@@ -12,4 +12,8 @@ w w w punkt amazon punkt com punkt de.~www.amazon.com.de.
h t t p s doppelpunkt slash slash w w w punkt a b c punkt com slash a b fragezeichen gleichheitszeichen drei bindestrich slash a b s slash eins~https://www.abc.com/ab?=3-/abs/1
at j e n s e n~@jensen
at j e n s e n punkt m [email protected]
-at w e z y r eins neun acht sechs~@wezyr1986
+at w e z y r eins neun acht sechs~@wezyr1986
+zwei-D-Mammogram~2D-Mammogram
+zwei-D-Mammogram~2-D-Mammogram
+drei-D-Drucker~3D-Drucker
+drei-D-Drucker~3-D-Drucker
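The new test lines use the same "~"-separated layout as the existing ones. Judging from the whitelist direction, the left side appears to be the expected spoken output and the right side the written input. The following is a hypothetical sketch of reading that layout and checking each pair, with a toy dictionary standing in for the real German normalizer. parse_test_cases, normalize, failing_cases, and STAND_IN are illustrative names, not NeMo's test harness.

```python
# Minimal sketch: read "<expected>~<input>" test-case lines and check them
# against a normalizer. parse_test_cases, normalize, failing_cases, and
# STAND_IN are hypothetical; STAND_IN is a toy dict, not NeMo's normalizer.
TEST_CASES = """\
zwei-D-Mammogram~2D-Mammogram
zwei-D-Mammogram~2-D-Mammogram
drei-D-Drucker~3D-Drucker
drei-D-Drucker~3-D-Drucker
"""

# Toy stand-in for the real grammar: whole-input exact matches only.
STAND_IN = {
    "2D-Mammogram": "zwei-D-Mammogram",
    "2-D-Mammogram": "zwei-D-Mammogram",
    "3D-Drucker": "drei-D-Drucker",
    "3-D-Drucker": "drei-D-Drucker",
}


def parse_test_cases(text: str) -> list[tuple[str, str]]:
    """Split each non-empty line at the first '~' into (expected, input)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        expected, test_input = line.split("~", 1)
        pairs.append((expected, test_input))
    return pairs


def normalize(text: str) -> str:
    """Look the whole input up in the toy table; unknown inputs pass through."""
    return STAND_IN.get(text, text)


def failing_cases(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (expected, input) pairs whose normalized input differs."""
    return [(exp, inp) for exp, inp in pairs if normalize(inp) != exp]


if __name__ == "__main__":
    cases = parse_test_cases(TEST_CASES)
    print(f"{len(cases)} cases, {len(failing_cases(cases))} failures")
```

This prints "4 cases, 0 failures". Splitting only at the first "~" keeps any later tilde inside the written input intact.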
