Future Implementations for classes - Measure, Money and Date #257

Open
wants to merge 288 commits into
base: main
288 commits
2f76a5c
post processor changes with minor fixes
ngachchi Nov 8, 2024
51bd4c2
removed unused imports and statements
ngachchi Nov 12, 2024
509fda6
Merge branch 'main' of https://github.com/ngachchi/NeMo-text-processi…
ngachchi Dec 3, 2024
9bc2682
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
72ae995
Implements support for minor currency denominations
zoobereq Dec 5, 2024
d0932e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
6ab1eca
added unit test cases and minor fixes
ngachchi Dec 5, 2024
39dde2e
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
730cd04
Updates the cache
zoobereq Dec 5, 2024
b0d3e63
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Dec 6, 2024
a8f1a57
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
2b9f657
removed unused english whitelist files
ngachchi Dec 6, 2024
45e5d3b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
b5d887d
reverted to previous logic
ngachchi Dec 6, 2024
a8b8b48
Jp tn 20241017 (#240)
BuyuanCui Oct 18, 2024
3f05501
Fixes issue 228 (#234)
zoobereq Oct 23, 2024
8dc2d32
Hindi TN changes
ngachchi Oct 30, 2024
72611ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
a48787c
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
a2a5acf
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
b2d4dea
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
add48be
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
98d657f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
2249724
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
ngachchi Dec 6, 2024
62a6e19
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
5829bbf
[pre-commit.ci] auto fixes from pre-commit.com hooks
ngachchi Dec 6, 2024
8fc3a0c
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
fa41100
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
f41357e
Whitelist and Word class changes
ngachchi Nov 7, 2024
54b9014
post processor changes with minor fixes
ngachchi Nov 8, 2024
8b1e7a6
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
edad18f
minor fixes for measure class
ngachchi Nov 11, 2024
d5ae67d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
2a3ac37
Updated Jenkinsfile
ngachchi Nov 12, 2024
c27f2bb
removed unused imports and statements
ngachchi Nov 12, 2024
f845a2b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
b07e41e
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
36bd6d6
Updates the cache
zoobereq Nov 13, 2024
4e3b377
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
8ce2f6a
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
5428913
Adds missing inits
zoobereq Nov 13, 2024
7609c43
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
8ce9ac1
added new graph for symbols
ngachchi Nov 18, 2024
403bc7a
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Dec 6, 2024
a25bedc
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
f10a464
working on measure class
ngachchi Nov 26, 2024
0fe9889
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
16846e6
Hindi TN changes
ngachchi Oct 30, 2024
dbf4bae
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
b920706
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
62c6d18
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
10f4062
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
b55232f
Whitelist and Word class changes
ngachchi Nov 7, 2024
ab32266
post processor changes with minor fixes
ngachchi Nov 8, 2024
5343fe1
removed unused imports and statements
ngachchi Nov 12, 2024
cd8143b
Hindi ITN - Addition of Whitelist and Word (#248)
ngachchi Dec 6, 2024
c5e5829
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
be34788
Implements support for minor currency denominations
zoobereq Dec 5, 2024
df7ade8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
4f756ce
added unit test cases and minor fixes
ngachchi Dec 5, 2024
80b01fd
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
3ba2602
Updates the cache
ngachchi Dec 6, 2024
f20981f
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
376c681
removed unused english whitelist files
ngachchi Dec 6, 2024
0643176
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
4c0c01d
reverted to previous logic
ngachchi Dec 6, 2024
d0d6fcc
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Dec 6, 2024
4fafb67
Updates the cache
zoobereq Dec 6, 2024
8e15172
Updates the cache again
zoobereq Dec 6, 2024
0d12f29
dedh and dhai implementation approach
ngachchi Dec 16, 2024
8ee005b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
1c5815e
Merge branch 'main' of https://github.com/NVIDIA/NeMo-text-processing…
ngachchi Dec 16, 2024
6688b0c
Unpin setuptools (#106)
pplantinga Oct 4, 2023
befd13f
fixed warnings: File is not always closes. (#113)
XuesongYang Oct 10, 2023
b2d57c5
fix bug #111 (ar currencies) (#117)
mgrafu Oct 23, 2023
8c9ae9d
Logging clean up + IT TN fix (#118)
ekmb Oct 24, 2023
8fd5bf5
Time_IT_TN (#105)
GiacomoLeoneMaria Oct 25, 2023
7db8734
IT TN improvement on tests (#120)
mgrafu Oct 26, 2023
0236c3d
add single letter exception for roman numerals (#121)
mgrafu Oct 27, 2023
49a274b
fix broken path for nondet whitelist (#124)
mgrafu Nov 3, 2023
6a82c2c
Increase weights for serial (en TN) (#128)
anand-nv Nov 21, 2023
a4e26bc
add measures file for FR TN (#131)
mgrafu Dec 8, 2023
9ecf294
Sh jenkins (#127)
anand-nv Jan 19, 2024
04a1141
update isort - fix precommit (#138)
ekmb Feb 14, 2024
4e8272f
Armenian itn (#136)
davidks13 Feb 15, 2024
ed07944
Fix CI (#142)
ekmb Feb 29, 2024
6414dd7
Armenian TN (#137)
davidks13 Mar 13, 2024
dfe7e80
Marathi ITN (#134)
ChinmayPatil11 Mar 13, 2024
0685601
jenkins fix (#150)
tbartley94 Mar 13, 2024
8e0be0c
r0.3.0 release (#151)
ekmb Mar 13, 2024
dee2238
Fix text=line[text] to text=line[text_field] (#153)
ssh-meister Mar 19, 2024
d021146
use real string on docstring (#157)
kevsan4 Mar 30, 2024
d5e1b2b
Sh postprocess (#147)
anand-nv Apr 16, 2024
193b6da
update run_evaluate script for cased itn (#164)
mgrafu Apr 25, 2024
b9dc7a3
remove unused function from ar tn decimals (#165)
mgrafu Apr 25, 2024
661497e
ZH sentence-level TN (#112)
BuyuanCui Apr 30, 2024
ba75285
preparing release, updating change log (#168)
tbartley94 May 3, 2024
36e241d
hotfix (#169)
ekmb May 3, 2024
81a4d7a
hotfix (#170)
tbartley94 May 3, 2024
855f93c
DE TN Fixes (#177)
zoobereq Jun 6, 2024
2e78a75
Tts en tech terms (#167)
mgrafu Jun 7, 2024
1bc29f8
Normalizes the '%' sign (#180)
zoobereq Jun 7, 2024
39440f6
FR TN Fixes (#181)
zoobereq Jun 7, 2024
ebf0f76
EN TN fixes for Issue #166 (#185)
zoobereq Jul 17, 2024
1bcc0e3
IT TN Fixes for #166 (#183)
zoobereq Jul 17, 2024
90149a6
HU TN Fixes issue #166 (#184)
zoobereq Jul 18, 2024
d672c6b
Jp itn 20240221 (#141)
BuyuanCui Jul 19, 2024
42d4d68
update en tn folder to see if CI tests run - DO NOT MERGE (#199)
anand-nv Jul 24, 2024
3ccfbae
Reverts EN TN fixes for Issue #166 (#202)
zoobereq Aug 13, 2024
b6ab1cc
es and es_en changes for unified models (#143)
mgrafu Aug 14, 2024
d628098
ES TN Fixes for Issue #166 (#206)
zoobereq Aug 15, 2024
a1af523
Zh tn bug 240712 (#187)
BuyuanCui Aug 16, 2024
114538a
EN TN Fixes for Issue 166 (#207)
zoobereq Aug 19, 2024
5ea0470
Fix for nv-bug 4786175 (#213)
zoobereq Aug 21, 2024
363a145
Release commit r1.1.0 (#217)
tbartley94 Aug 21, 2024
985c582
EN TN Fixes for nv-bug 4786225 (#218)
zoobereq Aug 22, 2024
2cba565
Applies fixes for nv-bug 4786263 (#220)
zoobereq Aug 22, 2024
2423af7
Fix invalid escape sequences (#219)
TheKevJames Aug 23, 2024
0dd6f56
IT TN Fixes for Issue #166 (#221)
zoobereq Aug 26, 2024
68f1e4b
ES TN Fix for Issue #166 (#224)
zoobereq Sep 3, 2024
22fe854
Expands per/unit mappings and updates the cache (#227)
zoobereq Sep 11, 2024
4903c73
Cardinals up to a hundred trillions, timeFST and transliteration (#209)
kurt0cougar Sep 17, 2024
dfaf5b5
Fix for issue #211 (#232)
mgrafu Sep 27, 2024
de0a605
Jp itn update 240805 (#208)
BuyuanCui Oct 1, 2024
e77c696
DE TN Fix for Issue #228 (#237)
zoobereq Oct 17, 2024
0f1d1eb
Jp tn 20241017 (#240)
BuyuanCui Oct 18, 2024
8c11047
Fixes issue 228 (#234)
zoobereq Oct 23, 2024
5965050
Hindi TN changes
ngachchi Oct 30, 2024
b6e7098
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
a60c991
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
b1381d7
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
036bf50
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
c9a0f05
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
a67df60
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
ngachchi Dec 6, 2024
ac09813
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
0ea2f09
[pre-commit.ci] auto fixes from pre-commit.com hooks
ngachchi Dec 6, 2024
dc6c72d
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
527cde2
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
931d10c
Whitelist and Word class changes
ngachchi Nov 7, 2024
92e74b7
post processor changes with minor fixes
ngachchi Nov 8, 2024
82defee
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
76128e5
minor fixes for measure class
ngachchi Nov 11, 2024
ce46d25
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
3c0be49
Updated Jenkinsfile
ngachchi Nov 12, 2024
1c44a95
removed unused imports and statements
ngachchi Nov 12, 2024
69bfde1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
3a137d5
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
f558919
Updates the cache
zoobereq Nov 13, 2024
505d02b
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
5906450
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
dea3320
Adds missing inits
zoobereq Nov 13, 2024
0823d39
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
10c9b3a
added new graph for symbols
ngachchi Nov 18, 2024
1e548ff
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Dec 6, 2024
192cda2
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
6ab8363
working on measure class
ngachchi Nov 26, 2024
7f019fe
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
f7d1b20
Hindi TN changes
ngachchi Oct 30, 2024
887fad0
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
c5250cc
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
e14a4f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
0be3010
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
dbe4162
Whitelist and Word class changes
ngachchi Nov 7, 2024
478cfd6
post processor changes with minor fixes
ngachchi Nov 8, 2024
e2edcf3
removed unused imports and statements
ngachchi Nov 12, 2024
774601e
Hindi ITN - Addition of Whitelist and Word (#248)
ngachchi Dec 6, 2024
a684297
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
870d827
Implements support for minor currency denominations
zoobereq Dec 5, 2024
f0086ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
edeab44
added unit test cases and minor fixes
ngachchi Dec 5, 2024
ffb4ffc
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
990f44b
Updates the cache
ngachchi Dec 6, 2024
a449177
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
d5517ae
removed unused english whitelist files
ngachchi Dec 6, 2024
82f5a0e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
1efe029
reverted to previous logic
ngachchi Dec 6, 2024
dff622f
Jp tn 20241017 (#240)
ngachchi Dec 17, 2024
c8773fe
Hindi TN changes
ngachchi Oct 30, 2024
85dbc98
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
6627dcb
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
491a048
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
ee74145
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
bc81e2c
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
tarushi2k2 Oct 30, 2024
e5e54e9
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
3d74041
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
fa877f4
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
764db35
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
a8ee790
Whitelist and Word class changes
ngachchi Nov 7, 2024
c4cde0b
post processor changes with minor fixes
ngachchi Nov 8, 2024
2910c15
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
6255b80
minor fixes for measure class
ngachchi Nov 11, 2024
8274d9b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
38aa425
Updated Jenkinsfile
ngachchi Nov 12, 2024
c96cf61
removed unused imports and statements
ngachchi Nov 12, 2024
3154e58
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
0113958
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
ad1d973
Updates the cache
zoobereq Nov 13, 2024
aa2d4c8
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
c095e98
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
c3e8a3d
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
01d204b
added new graph for symbols
ngachchi Nov 18, 2024
b0c0153
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Nov 18, 2024
3396ec1
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
00d95f7
working on measure class
ngachchi Nov 26, 2024
c26ab14
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
c045a64
Hindi TN changes
ngachchi Oct 30, 2024
fe8cac1
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
d6627b4
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
9fa5e2f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
09c806c
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
10b8bee
Whitelist and Word class changes
ngachchi Nov 7, 2024
2cd6908
post processor changes with minor fixes
ngachchi Nov 8, 2024
24f58f1
removed unused imports and statements
ngachchi Nov 12, 2024
1c29ed7
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
5c67f22
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
b91f47b
Implements support for minor currency denominations
zoobereq Dec 5, 2024
ea96b7a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
7d4a896
added unit test cases and minor fixes
ngachchi Dec 5, 2024
1f2f34e
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
3d10cb6
Updates the cache
zoobereq Dec 5, 2024
f8d5c8b
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
8c94a6e
removed unused english whitelist files
ngachchi Dec 6, 2024
3adc6ff
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
e7d0fe8
reverted to previous logic
ngachchi Dec 6, 2024
faecf16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
7f10a30
Updates the cache
zoobereq Dec 6, 2024
6e8e045
Updates the cache again
zoobereq Dec 6, 2024
166773a
dedh and dhai implementation approach
ngachchi Dec 16, 2024
1cddd06
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
ab94f8f
Fix space issue with ZH ITN (#244)
zoobereq Dec 10, 2024
ff881b6
contributing update (#251)
tbartley94 Dec 11, 2024
738214e
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Dec 17, 2024
1465ec1
reverted code and added zero to the hour tsv file
ngachchi Dec 18, 2024
5851784
reverted to previous logic
ngachchi Dec 6, 2024
4025556
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
483f1f4
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
ef54e78
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
c3dfcff
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2024
c54a7ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2024
0ca2774
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Dec 19, 2024
999ef8c
Date further implementation (BC, B.C.) added
ngachchi Dec 19, 2024
844ac7e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2024
d4fd55b
added date range implementation
ngachchi Dec 23, 2024
d92f502
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2024
6e248fb
working unit test cases
ngachchi Jan 9, 2025
791ae63
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 9, 2025
afb28f5
removed the conflicted test case for the instance
ngachchi Jan 13, 2025
94f3a2d
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Jan 13, 2025
6ba9ed3
Merge branch 'NVIDIA:main' into hi_tn
ngachchi Jan 15, 2025
92ce37e
updated Jenkins file
ngachchi Jan 15, 2025
4 changes: 2 additions & 2 deletions Jenkinsfile
@@ -27,7 +27,7 @@ pipeline {
     HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
     MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
     JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-17-24-1'
-    HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/11-29-24-1'
+    HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/01-15-25-0'
     DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
   }
   stages {
@@ -104,7 +104,7 @@ pipeline {
       parallel {
         stage('L0: Hi TN grammars') {
           steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=hi --text="१" --cache_dir ${HI_TN_CACHE}'
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --text="१" --cache_dir ${HI_TN_CACHE}'
           }
         }
         stage('L0: Hi ITN grammars') {
@@ -0,0 +1,3 @@
+ई. पू. ईसा पूर्व
+ई. ईसवी
+तक तक
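The three entries above pair a written era marker with its spoken form. As a minimal sketch of how such a table could be applied, the block below uses a plain dictionary rather than the FST that the PR builds with `pynini.string_file`. The helper name `expand_era` is hypothetical and not part of the PR, and the split of the first row into "ई. पू." and "ईसा पूर्व" is inferred, since the column separator did not survive in this view.

```python
# Written -> spoken pairs, copied from the three-line table above.
# Assumption: the first row splits as "ई. पू." -> "ईसा पूर्व".
ERA_SUFFIXES = {
    "ई. पू.": "ईसा पूर्व",
    "ई.": "ईसवी",
    "तक": "तक",
}


def expand_era(text: str) -> str:
    """Replace a trailing written era marker with its spoken form (hypothetical helper)."""
    # Try longer keys first so "ई. पू." is matched before a shorter marker.
    for written in sorted(ERA_SUFFIXES, key=len, reverse=True):
        if text.endswith(written):
            return text[: -len(written)] + ERA_SUFFIXES[written]
    return text


print(expand_era("५०० ई. पू."))
print(expand_era("१९४७ ई."))
```

Unlike the grammar in the PR, this sketch leaves the digits untouched; it only illustrates the suffix lookup.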
@@ -141,14 +141,16 @@ month महीना
months महीने
ct कैरेट
pH पीएच
km/h किलोमीटर प्रति घंटा
km/hr किलोमीटर प्रति घंटा
km/min किलोमीटर प्रति मिनट
m/h मीटर प्रति घंटा
m/hr मीटर प्रति घंटा
mi/s मील प्रति सेकंड
mi/h मील प्रति घंटा
mi/hr मील प्रति घंटा
mi/min मील प्रति मिनट
₹/ac रुपए प्रति एकड़
x बाई
X बाई
* बाई
- से
@@ -1,10 +1,9 @@
 ₹ रुपए
-P पैसे
 £ पाउंड
 ₩ वॉन
 $ डॉलर
 ₺ लीरा
 ৳ टका
 ¥ येन
 ₦ नाइरा
-€ यूरो
+€ यूरो
@@ -0,0 +1,11 @@
+major_minor_currencies = {
+    "रुपए": "पैसे",
+    "पाउंड": "पेंस",
+    "वॉन": "जिओन",
+    "डॉलर": "सेंट",
+    "लीरा": "कुरस",
+    "टका": "पैसे",
+    "येन": "सेन",
+    "नाइरा": "कोबो",
+    "यूरो": "सेंट",
+}
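The money tagger docstring in this PR describes `centiles` as a placeholder that the verbalizer replaces with the matching minor denomination. The verbalizer itself is not shown in this diff, so the following is only a sketch of that lookup under that assumption; `resolve_minor` and `PLACEHOLDER` are hypothetical names introduced for illustration.

```python
# Major -> minor denominations, copied from the dictionary added above.
major_minor_currencies = {
    "रुपए": "पैसे",
    "पाउंड": "पेंस",
    "वॉन": "जिओन",
    "डॉलर": "सेंट",
    "लीरा": "कुरस",
    "टका": "पैसे",
    "येन": "सेन",
    "नाइरा": "कोबो",
    "यूरो": "सेंट",
}

PLACEHOLDER = "centiles"  # placeholder string named in the tagger docstring


def resolve_minor(currency_maj: str, currency_min: str) -> str:
    """Swap the placeholder for the minor unit of the given major currency (hypothetical helper)."""
    if currency_min == PLACEHOLDER:
        return major_minor_currencies[currency_maj]
    return currency_min


print(resolve_minor("रुपए", "centiles"))
print(resolve_minor("डॉलर", "centiles"))
```

Keeping the tagger output currency-agnostic and resolving the minor unit late means a new currency only needs one new dictionary entry.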
@@ -1,3 +1,4 @@
+० शून्य
 १ एक
 २ दो
 ३ तीन
63 changes: 63 additions & 0 deletions nemo_text_processing/text_normalization/hi/graph_utils.py
@@ -21,6 +21,7 @@
 
 import pynini
 from pynini import Far
+from pynini.examples import plurals
 from pynini.export import export
 from pynini.lib import byte, pynutil, utf8
 
@@ -99,6 +100,30 @@ def generator_main(file_name: str, graphs: Dict[str, 'pynini.FstLike']):
     logging.info(f'Created {file_name}')
 
 
+def get_plurals(fst):
+    """
+    Given singular returns plurals
+
+    Args:
+        fst: Fst
+
+    Returns plurals to given singular forms
+    """
+    return SINGULAR_TO_PLURAL @ fst
+
+
+def get_singulars(fst):
+    """
+    Given plural returns singulars
+
+    Args:
+        fst: Fst
+
+    Returns singulars to given plural forms
+    """
+    return PLURAL_TO_SINGULAR @ fst
+
+
 def convert_space(fst) -> 'pynini.FstLike':
     """
     Converts space to nonbreaking space.
@@ -113,6 +138,44 @@ def convert_space(fst) -> 'pynini.FstLike':
     return fst @ pynini.cdrewrite(pynini.cross(NEMO_SPACE, NEMO_NON_BREAKING_SPACE), "", "", NEMO_SIGMA)
 
 
+def string_map_cased(input_file: str, input_case: str = INPUT_LOWER_CASED):
+    labels = load_labels(input_file)
+
+    if input_case == INPUT_CASED:
+        additional_labels = []
+        for written, spoken, *weight in labels:
+            written_capitalized = written[0].upper() + written[1:]
+            additional_labels.extend(
+                [
+                    [written_capitalized, spoken.capitalize()],  # first letter capitalized
+                    [
+                        written_capitalized,
+                        spoken.upper().replace(" AND ", " and "),
+                    ],  # # add pairs with the all letters capitalized
+                ]
+            )
+
+            spoken_no_space = spoken.replace(" ", "")
+            # add abbreviations without spaces (both lower and upper case), i.e. "BMW" not "B M W"
+            if len(spoken) == (2 * len(spoken_no_space) - 1):
+                logging.debug(f"This is weight {weight}")
+                if len(weight) == 0:
+                    additional_labels.extend(
+                        [[written, spoken_no_space], [written_capitalized, spoken_no_space.upper()]]
+                    )
+                else:
+                    additional_labels.extend(
+                        [
+                            [written, spoken_no_space, weight[0]],
+                            [written_capitalized, spoken_no_space.upper(), weight[0]],
+                        ]
+                    )
+        labels += additional_labels
+
+    whitelist = pynini.string_map(labels).invert().optimize()
+    return whitelist
+
+
 class GraphFst:
     """
     Base class for all grammar fsts.
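`string_map_cased` above decides whether a spoken form is a letter-by-letter abbreviation with a pure length test: a string of n single letters separated by single spaces has length 2n - 1. The sketch below isolates that test; the function name `looks_spelled_out` is hypothetical and does not appear in the PR.

```python
def looks_spelled_out(spoken: str) -> bool:
    """Length heuristic used in string_map_cased: n single letters joined by
    single spaces have total length 2 * n - 1 (hypothetical helper name)."""
    spoken_no_space = spoken.replace(" ", "")
    return len(spoken) == 2 * len(spoken_no_space) - 1


print(looks_spelled_out("b m w"))     # five characters, three letters
print(looks_spelled_out("new york"))  # one space only, so the lengths do not match
```

Because the test looks only at lengths, it does not verify that the spaces actually alternate with the letters; a stricter check could split on spaces and require every piece to be one character long.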
@@ -13,10 +13,10 @@
 # limitations under the License.
 
 import pynini
-from pynini.lib import pynutil
+from pynini.lib import pynutil, rewrite
 
-from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst
-from nemo_text_processing.text_normalization.hi.utils import get_abs_path
+from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst, insert_space
+from nemo_text_processing.text_normalization.hi.utils import apply_fst, get_abs_path
 
 
 class CardinalFst(GraphFst):
30 changes: 26 additions & 4 deletions nemo_text_processing/text_normalization/hi/taggers/date.py
@@ -14,7 +14,6 @@
 
 import pynini
 from pynini.lib import pynutil
-
 from nemo_text_processing.text_normalization.hi.graph_utils import (
     NEMO_HI_DIGIT,
     NEMO_HI_NON_ZERO,
@@ -26,6 +25,7 @@
 
 days = pynini.string_file(get_abs_path("data/date/days.tsv"))
 months = pynini.string_file(get_abs_path("data/date/months.tsv"))
+year_suffix = pynini.string_file(get_abs_path("data/date/year_suffix.tsv"))
 
 
 class DateFst(GraphFst):
@@ -62,12 +62,17 @@ def __init__(self, cardinal: GraphFst):
 
         years_graph = pynutil.insert("year: \"") + graph_year + pynutil.insert("\"") + insert_space
 
-        graph_dd_mm = days_graph + delete_dash + months_graph
+        graph_dd_mm = days_graph + (delete_dash | pynini.accep("")) + months_graph
 
-        graph_mm_dd = months_graph + delete_dash + days_graph
+        graph_mm_dd = months_graph + (delete_dash | pynini.accep("")) + days_graph
 
         graph_mm_dd += pynutil.insert(" preserve_order: true ")
 
+        # Graph for era
+        era_graph = pynutil.insert("era: \"") + year_suffix + pynutil.insert("\"") + insert_space
+
+        range_graph = pynini.cross("-", "से")
+
         graph_dd_mm_yyyy = (
             days_graph + (delete_dash | delete_slash) + months_graph + (delete_dash | delete_slash) + years_graph
         )
@@ -78,7 +83,22 @@ def __init__(self, cardinal: GraphFst):
 
         graph_mm_dd_yyyy += pynutil.insert(" preserve_order: true ")
 
-        graph_mm_yyyy = months_graph + delete_dash + years_graph
+        graph_mm_yyyy = (
+            months_graph + (delete_dash | pynini.accep("")) + years_graph + pynutil.insert(" preserve_order: true ")
+        )
+
+        graph_year_suffix = era_graph
+
+        graph_range = (
+            pynutil.insert("text: \"")
+            + (cardinal.final_graph | graph_year)
+            + insert_space
+            + range_graph
+            + insert_space
+            + (cardinal.final_graph | graph_year)
+            + pynutil.insert("\"")
+            + pynutil.insert(" preserve_order: true ")
+        )
 
         # default assume dd_mm_yyyy
 
@@ -88,6 +108,8 @@ def __init__(self, cardinal: GraphFst):
             | pynutil.add_weight(graph_dd_mm_yyyy, -0.001)
             | graph_mm_dd_yyyy
             | graph_mm_yyyy
+            | pynutil.add_weight(graph_year_suffix, -0.001)
+            | pynutil.add_weight(graph_range, -0.005)
         )
 
         self.final_graph = final_graph.optimize()
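The new `graph_range` branch rewrites the dash between two numbers as "से" and surrounds it with spaces. The sketch below imitates only that surface behaviour with a regular expression instead of an FST; it is an approximation, not the PR's method. In the PR both numbers are additionally verbalized by `cardinal.final_graph` or `graph_year`, which this sketch deliberately omits, and the names `RANGE_RE` and `rewrite_range` are hypothetical.

```python
import re

# Devanagari digits occupy the contiguous range U+0966 to U+096F.
RANGE_RE = re.compile(r"^([०-९]+)-([०-९]+)$")


def rewrite_range(token: str) -> str:
    """Rewrite 'number-number' as 'number से number' (hypothetical helper).
    Digits are left as written; the PR's grammar also verbalizes them."""
    match = RANGE_RE.match(token)
    if match is None:
        return token
    start, end = match.groups()
    return f"{start} से {end}"


print(rewrite_range("२०१०-२०२०"))
```

In the diff, `graph_range` carries a weight of -0.005, lower than the -0.001 on the competing branches; in the tropical semiring pynini uses by default, the lower-weight path is preferred when several branches accept the same input.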
@@ -15,7 +15,9 @@
 import pynini
 from pynini.lib import pynutil
 
-from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst
+from nemo_text_processing.text_normalization.hi.graph_utils import NEMO_NOT_QUOTE, GraphFst, insert_space
+from nemo_text_processing.text_normalization.hi.taggers.cardinal import CardinalFst
+from nemo_text_processing.text_normalization.hi.utils import apply_fst
 
 
 class FractionFst(GraphFst):
30 changes: 27 additions & 3 deletions nemo_text_processing/text_normalization/hi/taggers/measure.py
@@ -44,16 +44,20 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
         )
 
         # Define the unit handling
-        self.unit = pynutil.insert("units: \"") + unit_graph + pynutil.insert("\" ")
+        unit = pynutil.insert("units: \"") + unit_graph + pynutil.insert("\" ")
+
+        # Handling symbols like x, X, *
+        symbol_graph = pynini.string_map([("x", "बाई"), ("X", "बाई"), ("*", "बाई"),])
 
         graph_measurements = (
             pynutil.insert("decimal { ")
             + optional_graph_negative
             + decimal_graph
             + pynutil.insert(" }")
             + delete_space
-            + self.unit
+            + unit
         )
+
         graph_measurements |= (
             pynutil.insert("cardinal { ")
             + optional_graph_negative
@@ -62,7 +66,27 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
             + pynutil.insert("\"")
             + pynutil.insert(" }")
             + delete_space
-            + self.unit
+            + unit
+        )
+
+        # Handling cardinal clubbed with symbol as single token
+        graph_measurements |= (
+            pynutil.insert("cardinal { ")
+            + optional_graph_negative
+            + pynutil.insert("integer: \"")
+            + cardinal_graph
+            + pynutil.insert("\"")
+            + pynutil.insert(" }")
+            + pynutil.insert(" units: \"")
+            + symbol_graph
+            + pynutil.insert("\" ")
+            + pynutil.insert("} }")
+            + insert_space
+            + pynutil.insert("tokens { cardinal { ")
+            + optional_graph_negative
+            + pynutil.insert("integer: \"")
+            + cardinal_graph
+            + pynutil.insert("\"")
         )
 
         graph = graph_measurements
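The new branch for inputs such as a number, a symbol (x, X or *), and a second number emits text with deliberately unbalanced braces: it closes two groups with `} }` and reopens `tokens { cardinal { ` in the middle. A plausible reading, which this diff does not confirm, is that wrappers added outside this hunk supply the missing opening and closing braces, so one written token becomes two tagged tokens. The sketch below simply concatenates the same literal pieces in the same order, assuming that `insert_space` inserts a single space and that both numbers are already verbalized; `tag_dimension` is a hypothetical name.

```python
def tag_dimension(first: str, symbol: str, second: str) -> str:
    """Concatenate the literal pieces the new measure branch inserts, in order.
    `first` and `second` stand for already-verbalized cardinals (assumption)."""
    symbol_map = {"x": "बाई", "X": "बाई", "*": "बाई"}  # same pairs as symbol_graph
    return (
        "cardinal { "
        + 'integer: "' + first + '"'
        + " }"
        + ' units: "' + symbol_map[symbol] + '" '
        + "} }"
        + " "  # stands in for insert_space (assumed to insert one space)
        + "tokens { cardinal { "
        + 'integer: "' + second + '"'
    )


print(tag_dimension("दो", "x", "तीन"))
```

The printed string has two more closing braces than opening ones before `tokens {` and two unclosed groups after it, which is what one would expect if the enclosing wrappers contribute the remaining braces.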
40 changes: 18 additions & 22 deletions nemo_text_processing/text_normalization/hi/taggers/money.py
@@ -24,39 +24,35 @@
 class MoneyFst(GraphFst):
     """
     Finite state transducer for classifying money, suppletive aware, e.g.
-        ₹1 -> money { currency: "रुपए" integer_part: "एक" }
-        ₹1.2 -> money { currency: "रुपए" integer_part: "एक" fractional_part: "दो" }
+
+        ₹५० -> money { money { currency_maj: "रुपए" integer_part: "पचास" }
+        ₹५०.५० -> money { currency_maj: "रुपए" integer_part: "पचास" fractional_part: "पचास" currency_min: "centiles" }
+        ₹०.५० -> money { currency_maj: "रुपए" integer_part: "शून्य" fractional_part: "पचास" currency_min: "centiles" }
+        Note that the 'centiles' string is a placeholder to handle by the verbalizer by applying the corresponding minor currency denomination
 
     Args:
         cardinal: CardinalFst
-        decimal: DecimalFst
         deterministic: if True will provide a single transduction option,
             for False multiple transduction are generated (used for audio-based normalization)
     """
 
-    def __init__(self, cardinal: GraphFst, decimal: GraphFst):
+    def __init__(self, cardinal: GraphFst):
         super().__init__(name="money", kind="classify")
 
         cardinal_graph = cardinal.final_graph
 
-        optional_graph_negative = pynini.closure(
-            pynutil.insert("negative: ") + pynini.cross("-", "\"true\"") + insert_space, 0, 1,
-        )
-        self.currency = pynutil.insert("currency: \"") + currency_graph + pynutil.insert("\" ")
-        self.interger = pynutil.insert("integer_part: \"") + cardinal_graph + pynutil.insert("\" ")
-        self.fraction = pynutil.insert("fractional_part: \"") + cardinal_graph + pynutil.insert("\" ")
-
-        graph_currencies = optional_graph_negative + self.currency + insert_space + self.interger
-        graph_currencies |= (
-            optional_graph_negative
-            + self.currency
-            + insert_space
-            + self.interger
-            + pynutil.delete(".")
-            + insert_space
-            + self.fraction
+        currency_major = pynutil.insert('currency_maj: "') + currency_graph + pynutil.insert('"')
+        integer = pynutil.insert('integer_part: "') + cardinal_graph + pynutil.insert('"')
+        fraction = pynutil.insert('fractional_part: "') + cardinal_graph + pynutil.insert('"')
+        currency_minor = pynutil.insert('currency_min: "') + pynutil.insert("centiles") + pynutil.insert('"')
+
+        graph_major_only = currency_major + insert_space + integer
+        graph_major_and_minor = (
+            currency_major + insert_space + integer + pynini.cross(".", " ") + fraction + insert_space + currency_minor
         )
-        graph = graph_currencies
-        self.graph = graph.optimize()
+
+        graph_currencies = graph_major_only | graph_major_and_minor
+
+        graph = graph_currencies.optimize()
         final_graph = self.add_tokens(graph)
         self.fst = final_graph
4 changes: 2 additions & 2 deletions nemo_text_processing/text_normalization/hi/taggers/time.py
@@ -13,10 +13,10 @@
 # limitations under the License.
 
 import pynini
-from pynini.lib import pynutil
+from pynini.lib import pynutil, rewrite
 
 from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst, insert_space
-from nemo_text_processing.text_normalization.hi.utils import get_abs_path
+from nemo_text_processing.text_normalization.hi.utils import apply_fst, get_abs_path
 
 hours_graph = pynini.string_file(get_abs_path("data/time/hours.tsv"))
 minutes_graph = pynini.string_file(get_abs_path("data/time/minutes.tsv"))