Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(internet): userName, email and slugify return only ascii #1554

Merged

Conversation

matthewmayer
Copy link
Contributor

@matthewmayer matthewmayer commented Nov 13, 2022

tentative fix for #1105 and #1437

before this PR is applied:

Object.keys(faker.locales).forEach(locale=>{faker.setLocale(locale); console.log(`${locale}: ${faker.internet.email()}`)})

after this PR is applied:

@codecov
Copy link

codecov bot commented Nov 13, 2022

Codecov Report

Merging #1554 (d461e30) into next (5e51335) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #1554      +/-   ##
==========================================
- Coverage   99.64%   99.63%   -0.01%     
==========================================
  Files        2221     2222       +1     
  Lines      239460   239816     +356     
  Branches     1047     1056       +9     
==========================================
+ Hits       238604   238950     +346     
- Misses        835      845      +10     
  Partials       21       21              
Impacted Files Coverage Δ
src/modules/helpers/index.ts 98.55% <100.00%> (+<0.01%) ⬆️
src/modules/internet/char-mappings.ts 100.00% <100.00%> (ø)
src/modules/internet/index.ts 100.00% <100.00%> (ø)
src/modules/internet/user-agent.ts 90.54% <0.00%> (-2.71%) ⬇️

@ST-DDT ST-DDT added c: bug Something isn't working c: feature Request for new feature p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision labels Nov 13, 2022
@ST-DDT ST-DDT added this to the v8.0 - Module Re-Shuffling milestone Nov 13, 2022
…locales, revert slugify to ascii only - add test
…locales, revert slugify to ascii only - fix test
@ST-DDT ST-DDT added the s: on hold Blocked by something or frozen to avoid conflicts label Nov 14, 2022
@ST-DDT ST-DDT linked an issue Nov 20, 2022 that may be closed by this pull request
…locales, revert slugify to ascii only - allow stripping diacritics
@matthewmayer
Copy link
Contributor Author

matthewmayer commented Nov 20, 2022

I added a further refinement to slugify which strips simple diacritics. This helps in languages like French or Vietnamese, where you have names like Adélaïde or Ngân Hà. Previously those accented letters would get removed completed, leading to email addresses like:

[email protected]
[email protected]

now instead you'd get
[email protected]
[email protected]

which seems nicer. I also added some additional tests for slugify.

@Shinigami92
Copy link
Member

Team Decision:

  • We wont allow adding a dependency to faker to convert the given parameters to related outputs
    This can be done by the user in front of the method and then just pass the converted args to the method
  • We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way
    How that is done is an implementation detail and in the hand of the PR author
  • The method internet.userName should only return ascii characters
  • A new method internet.displayName should have (kinda) the same implementation as internet.userName has right now, but we can improve e.g. the options/parameter as we are already making breaking changes in v8.0

@ST-DDT ST-DDT removed s: needs decision Needs team/maintainer decision s: on hold Blocked by something or frozen to avoid conflicts labels Nov 24, 2022
@matthewmayer
Copy link
Contributor Author

matthewmayer commented Nov 25, 2022

"We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way. How that is done is an implementation detail and in the hand of the PR author"

So I tried a new approach in the latest commit, basically reimplementing the userName function with a few strategies:

Ascii names work as before

faker.internet.userName("Matt","Mayer")
'Matt_Mayer22'

Simple accents are stripped using Unicode NFD

faker.internet.userName("Hélene","Müller")
'Helene_Muller11'

A simple mapping can be use to transliterate alphabetical languages like Cyrillic (could also add say Greek and Thai).

faker.internet.userName("Фёдор","Достоевский")
'Fedor.Dostoevskii50'

As a final fallback, just concatenate the hex unicode char codes of each character.

faker.internet.userName("大羽","陳")
'59277fbd33'

This works tolerably well but leads to some very long usernames in the final fallback scenario

af_ZA: [email protected]
ar: [email protected]
az: [email protected]
cz: [email protected]
de: [email protected]
de_AT: [email protected]
de_CH: [email protected]
dv: [email protected]
el: [email protected]
en: [email protected]
en_AU: [email protected]
en_AU_ocker: [email protected]
en_BORK: [email protected]
en_CA: [email protected]
en_GB: [email protected]
en_GH: {{person.male_first_name}}[email protected]
en_IE: [email protected]
en_IN: [email protected]
en_NG: [email protected]
en_US: [email protected]
en_ZA: [email protected]
es: [email protected]
es_MX: [email protected]
fa: [email protected]
fi: [email protected]
fr: [email protected]
fr_BE: [email protected]
fr_CA: [email protected]
fr_CH: [email protected]
ge: 10dc10dd10d310d010e0_10d110d010e010d310d010d510d410da10d810eb10d4@posta.ge
he: [email protected]
hr: [email protected]
hu: [email protected]
hy: [email protected]
id_ID: [email protected]
it: [email protected]
ja: [email protected]
ko: [email protected]
lv: [email protected]
mk: [email protected]
nb_NO: [email protected]
ne: [email protected]
nl: [email protected]
nl_BE: [email protected]
pl: [email protected]
pt_BR: [email protected]
pt_PT: [email protected]
ro: [email protected]
ru: [email protected]
sk: [email protected]
sv: [email protected]
tr: [email protected]
uk: [email protected]
ur: [email protected]
vi: [email protected]
zh_CN: [email protected]
zh_TW: [email protected]
zu_ZA: [email protected]

Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl wise this looks good to me.
Could you add an example to the jsdocs that uses non ascii input?
Also I think it might be better if you move the char mapping to separate file and import it. Not sure whether it should be ts or json. But that is optional for now IMO.

We could do a substring if it gets too long but that's a different issue.

@Shinigami92
Copy link
Member

We also discusses that we want displayName with (for now) the old behavior.
Can we please do this in this PR as well? Otherwise I'm afraid of missing it out.
If you explicitly not want to do this in this PR, please leave a reason and also open a new issue so we can track it and not miss it before v8.0 release.

ST-DDT
ST-DDT previously approved these changes Nov 27, 2022
ejcheng
ejcheng previously approved these changes Nov 30, 2022
ST-DDT
ST-DDT previously approved these changes Nov 30, 2022
@matthewmayer matthewmayer dismissed stale reviews from ST-DDT and ejcheng via c9caafc December 1, 2022 09:41
@ST-DDT ST-DDT changed the title fix(internet): make faker.internet.userName, faker.internet.email and faker.helpers.slugify return only ascii, add faker.internet.displayName fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii Dec 2, 2022
@ST-DDT ST-DDT requested review from Shinigami92 and a team December 2, 2022 15:29
@ST-DDT ST-DDT changed the title fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii fix(internet): userName, email and slugify return only ascii Dec 3, 2022
@ST-DDT ST-DDT merged commit 4ed45fa into faker-js:next Dec 3, 2022
@ST-DDT ST-DDT added the s: accepted Accepted feature / Confirmed bug label Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: bug Something isn't working c: feature Request for new feature p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Weird email and username in Chinese locale package
4 participants