🚀 Feature
Replace a random word with a phonetically similar one.
Alternatively, replace a random word with one that shares its part of speech (POS) or lemma (an adjective with another adjective, "run" with "ran" or "running", etc.).
Motivation
I'm training a transformer-based model to spell-check utterances (like a reversed AugLy).
For example: "Hello r u fin tdy" => "Hello are you fine today".
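In this reversed setup, each training pair has an augmented (noisy) utterance as the input and the original clean utterance as the target. A minimal sketch, assuming a hypothetical `CHAT_SHORTHAND` table and hypothetical helpers `shorthand` and `make_pairs` (none of these come from augly or any other library):

```python
import random
from typing import Callable

# Hypothetical shorthand table, chosen only to reproduce the example above.
CHAT_SHORTHAND = {"are": "r", "you": "u", "fine": "fin", "today": "tdy"}


def shorthand(sentence: str, rng: random.Random, p: float = 1.0) -> str:
    """Swap known words for chat shorthand, each with probability p."""
    out = []
    for word in sentence.split():
        short = CHAT_SHORTHAND.get(word.lower())
        out.append(short if short is not None and rng.random() < p else word)
    return " ".join(out)


def make_pairs(
    sentences: list[str],
    augment: Callable[[str, random.Random], str],
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Build (noisy input, clean target) pairs for a seq2seq spell checker."""
    rng = random.Random(seed)
    return [(augment(s, rng), s) for s in sentences]


print(make_pairs(["Hello are you fine today"], shorthand))
# [('Hello r u fin tdy', 'Hello are you fine today')]
```

Any other corruption, such as the phonetic and POS/lemma swaps this issue proposes, would slot in as a further `augment` function with the same signature.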
I realized that spelling errors quite often come from phonetically similar words.
Example (not a great one, but it serves the explanation): "I love jeans" vs. "I love gins".
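One way to implement the phonetic replacement is to transcribe words into phonemes and swap a word for a vocabulary word within a small phoneme edit distance. The sketch below assumes a tiny hand-written ARPAbet-style `LEXICON` (stress marks dropped) as a stand-in for a grapheme-to-phoneme tool; `LEXICON`, `phonetic_neighbors` and `phonetic_swap` are hypothetical names, not an existing API.

```python
import random

# Hypothetical ARPAbet-style transcriptions (stress marks dropped), standing in
# for a grapheme-to-phoneme tool such as epitran (which outputs IPA instead).
LEXICON = {
    "i": ["AY"],
    "love": ["L", "AH", "V"],
    "jeans": ["JH", "IY", "N", "Z"],
    "gins": ["JH", "IH", "N", "Z"],
}


def edit_distance(a, b) -> int:
    """Levenshtein distance over any two sequences (here: phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def phonetic_neighbors(word: str, max_dist: int = 1) -> list[str]:
    """Other lexicon words whose phonemes are within max_dist edits of `word`."""
    phones = LEXICON.get(word)
    if phones is None:
        return []
    return sorted(w for w, p in LEXICON.items()
                  if w != word and edit_distance(phones, p) <= max_dist)


def phonetic_swap(sentence: str, rng: random.Random, max_dist: int = 1) -> str:
    """Replace one random word that has a phonetic neighbor; else return as-is."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words)
                  if phonetic_neighbors(w.lower(), max_dist)]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    words[i] = rng.choice(phonetic_neighbors(words[i].lower(), max_dist))
    return " ".join(words)


print(phonetic_swap("I love jeans", random.Random(0)))  # I love gins
```

With a real transcriber and a larger vocabulary, the neighbor lists would be precomputed once rather than scanned on every call.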
Also, augmenting by replacing words with others of the same POS, or with other inflections of the same lemma, would help in the same direction (corrupting sentences more thoroughly to train a better spell-checking model).
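The POS/lemma variant can follow the same pattern. spaCy exposes a token's coarse POS tag and lemma as `token.pos_` and `token.lemma_`; to keep this sketch self-contained, the hypothetical `LEMMA_OF` and `POS_OF` tables below stand in for a tagger and lemmatizer, and `grammar_swap` is an illustrative name only.

```python
import random

# Hypothetical stand-ins for a lemmatizer and POS tagger (spaCy provides
# token.lemma_ and token.pos_); hard-coded so the sketch runs without models.
LEMMA_OF = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "cute": "cute", "beautiful": "beautiful", "pretty": "pretty",
}
POS_OF = {
    "run": "VERB", "runs": "VERB", "ran": "VERB", "running": "VERB",
    "cute": "ADJ", "beautiful": "ADJ", "pretty": "ADJ",
}


def same_lemma(word: str) -> list[str]:
    """Other known inflections sharing the word's lemma."""
    lemma = LEMMA_OF.get(word)
    if lemma is None:
        return []
    return sorted(w for w, l in LEMMA_OF.items() if l == lemma and w != word)


def same_pos(word: str) -> list[str]:
    """Other known words sharing the word's part of speech."""
    pos = POS_OF.get(word)
    if pos is None:
        return []
    return sorted(w for w, p in POS_OF.items() if p == pos and w != word)


def grammar_swap(sentence: str, rng: random.Random, mode: str = "lemma") -> str:
    """Replace one random word via its lemma (mode="lemma") or POS (mode="pos")."""
    if mode not in ("lemma", "pos"):
        raise ValueError(f"unknown mode: {mode!r}")
    pick = same_lemma if mode == "lemma" else same_pos
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if pick(w.lower())]
    if not candidates:
        return sentence  # nothing we know how to replace
    i = rng.choice(candidates)
    words[i] = rng.choice(pick(words[i].lower()))
    return " ".join(words)


rng = random.Random(0)
print(grammar_swap("I run every day", rng))         # one of: ran / running / runs
print(grammar_swap("she is cute", rng, mode="pos"))  # one of: beautiful / pretty
```

A dictionary lookup ignores context (e.g. "run" used as a noun), which a tagger run over the whole sentence would take into account.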
Having this kind of augmentation built in would help build better models.
Pitch
A built-in augmenter that creates mistakes using not only Levenshtein-like distances but also phonetics.
I built mine with epitran for phonetics and spaCy for POS, but other frameworks exist.
Alternatives
Implement my own augmenter (done).
Use only text-based distances, which cannot find pairs like "jean" vs. "gin", "cute" vs. "beautiful", or "run" vs. "running": they are textually too different, yet such substitutions are common in chats.
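The gap between text-based and phonetic distances can be made concrete with a plain Levenshtein distance applied once to characters and once to phoneme lists. A minimal sketch; the `PHONES` transcriptions are hypothetical ARPAbet-style entries (stress marks dropped), not output from epitran, which produces IPA.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Classic Levenshtein distance; works on strings or lists of phonemes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Hypothetical transcriptions standing in for a grapheme-to-phoneme tool.
PHONES = {"jean": ["JH", "IY", "N"], "gin": ["JH", "IH", "N"]}

print(levenshtein("jean", "gin"))                  # 3 (character edits)
print(levenshtein(PHONES["jean"], PHONES["gin"]))  # 1 (phoneme edit)
print(levenshtein("run", "running"))               # 4 (character edits)
```

An augmenter limited to a small character edit budget (say 1 or 2, an illustrative choice) would never produce these substitutions, while the phoneme distance puts "jean" and "gin" one edit apart; "run" vs. "running" is better reached through the shared lemma than through any distance.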
ierezell changed the title from "Phonetic similarity" to "Other similarities", then to "Phonetics similarities", then back to "Other similarities" (Jul 5, 2021).
Hi @ierezell! Thank you for all the awesome enhancements you're suggesting! This kind of augmentation is actually something we've talked about building internally, as these are very common misspellings that occur in the wild!
I'll take a look at the epitran library and see how we can support this!