
Treatment of the "tussenvoegsel", the family name affix in the Dutch locale #1777

Closed
9 of 10 tasks
amarillion opened this issue Jan 24, 2023 · 1 comment · Fixed by #1778
Labels
c: bug Something isn't working c: locale Permutes locale definitions p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision s: on hold Blocked by something or frozen to avoid conflicts
Comments

@amarillion
Contributor

Pre-Checks

Describe the bug

TL;DR:

  1. Dutch affixes are currently broken
  2. It doesn't make sense to generate random(affix) + random(family name without affix)
  3. The current PR, which uses the middle name to track the affix, has advantages and disadvantages.
  4. Proposal: keep affix + family names together, and remove the concept entirely from faker.

Longer version:

I'm following up on a discussion on Discord and have created this issue as requested.

Dutch family names often have an affix, a.k.a. "tussenvoegsel" (https://en.wikipedia.org/wiki/Tussenvoegsel), such as "van", "de", "van der" and a few other variants. Affixes are very common: in this list of the 100 most common Dutch family names, roughly 30% have one (see: https://nl.wikipedia.org/wiki/Lijst_van_meest_voorkomende_achternamen_van_Nederland).

There used to be support for this in faker, but it no longer works. If I set the locale to 'nl' and generate names with faker.name.fullName(), none of the generated names has an affix.

The way faker used to do this wasn't really realistic either. Affixes are part of the family name, and merging a random affixless family name with a random affix, as the old version of faker did, doesn't produce realistic names. "van der Meer" (lit: from the lake) is a valid Dutch name, and so is "Kuipers" (cooper). "van der Kuipers" (from the cooper???) is nonsensical.

You're probably wondering why the affix matters at all, if it's really just part of the family name. There is one simple answer: sorting. In a list of names, the affix is ignored when sorting. Dutch administrative systems (from governmental systems to the junior football club spreadsheet) track the affix separately solely for this reason.
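As a rough illustration of that sorting rule, here is a minimal TypeScript sketch. It assumes a hypothetical record type that stores the affix in its own field, the way the administrative systems above do, and sorts on the family name without the affix. The `DutchName` interface, the helper functions, the sample people and the first-name tie-break are illustrative assumptions, not part of faker.

```typescript
// Hypothetical record shape: the affix ("tussenvoegsel") is stored in its
// own field, separate from the rest of the family name.
interface DutchName {
  firstName: string;
  affix: string; // e.g. "van der", or "" when there is no affix
  surname: string; // family name without the affix, e.g. "Meer"
}

// Sort on the family name proper, ignoring the affix.
// Ties are broken on the first name (an assumption for this sketch).
function sortDutchNames(names: DutchName[]): DutchName[] {
  return [...names].sort(
    (a, b) =>
      a.surname.localeCompare(b.surname, 'nl') ||
      a.firstName.localeCompare(b.firstName, 'nl'),
  );
}

// Render the display form, e.g. "Jan van der Meer", skipping an empty affix.
function displayName(n: DutchName): string {
  return [n.firstName, n.affix, n.surname].filter((part) => part !== '').join(' ');
}

const people: DutchName[] = [
  { firstName: 'Jan', affix: 'van der', surname: 'Meer' },
  { firstName: 'Anna', affix: '', surname: 'Kuipers' },
  { firstName: 'Piet', affix: 'de', surname: 'Jong' },
];

console.log(sortDutchNames(people).map(displayName));
// -> [ 'Piet de Jong', 'Anna Kuipers', 'Jan van der Meer' ]
```

"Jan van der Meer" sorts under M, not under V, which is exactly why the affix is kept apart.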

New code being prepared (https://github.com/matthewmayer/faker/tree/chore/fullname-name-patterns) treats the affix as a middle name. This approach has one advantage: the affix can be extracted separately, which makes it easier to write custom sorting code that follows the Dutch sorting rules. But it also has disadvantages. Firstly, Dutch people can have middle names too. Secondly, the random affix still doesn't necessarily match the random family name.

Proposal:

We can take a much simpler route: make the affix part of the family name, and put the combined names together in locales/nl/person/last_name.ts.
I'm preparing a PR to this effect.

The upside is that it's really simple.
The downside is that you need to extract the affix with a regular expression if you want to sort those names.
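A minimal TypeScript sketch of that extraction step, assuming the family names are stored with the affix included (e.g. "van der Meer"). The affix list is a hypothetical, non-exhaustive sample, and `splitFamilyName` / `compareFamilyNames` are illustrative helpers, not faker APIs. Only lowercase affixes are matched, as they appear after a first name.

```typescript
// Hypothetical, non-exhaustive list of Dutch affixes. Longer variants come
// first so that "van der" is preferred over plain "van".
const AFFIXES = ['van der', 'van den', 'van de', 'van', 'de', 'den', 'ter', 'ten', 'te'];

// Optional leading affix followed by whitespace, then the rest of the name.
// Requiring whitespace keeps names such as "Dekker" from matching "de".
const AFFIX_PATTERN = new RegExp(`^(?:(${AFFIXES.join('|')})\\s+)?(.+)$`);

function splitFamilyName(familyName: string): { affix: string; surname: string } {
  const match = AFFIX_PATTERN.exec(familyName);
  // The pattern matches any non-empty string; fall back defensively otherwise.
  if (match === null) {
    return { affix: '', surname: familyName };
  }
  return { affix: match[1] ?? '', surname: match[2] ?? familyName };
}

// Compare combined family names on the part after the affix.
function compareFamilyNames(a: string, b: string): number {
  return splitFamilyName(a).surname.localeCompare(splitFamilyName(b).surname, 'nl');
}

const familyNames = ['van der Meer', 'Kuipers', 'de Jong', 'Dekker'];
console.log(familyNames.map(splitFamilyName));
console.log([...familyNames].sort(compareFamilyNames));
// -> [ 'Dekker', 'de Jong', 'Kuipers', 'van der Meer' ]
```

The regex has to know every affix variant in advance, which is the extra work this proposal pushes onto anyone who needs Dutch sorting.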

Alternative:

The fully proper solution would be to create an extra field, called 'affix' or 'tussenvoegsel', for Dutch names, and pick values from an array of affix + surname pairs. I think this solution would be overkill for the purposes of Faker.
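For comparison, the alternative could look roughly like this TypeScript sketch: each entry is an affix + surname pair, so a single random pick always yields a consistent name, and the affix stays available for sorting without a regex. The `FamilyName` shape, the sample data and both helpers are hypothetical; faker does not define them.

```typescript
// Hypothetical data shape: affix and family name are kept together as one
// pair, so an affix is never combined with an unrelated name.
interface FamilyName {
  affix: string; // "" when the name has no affix
  surname: string;
}

const FAMILY_NAMES: readonly FamilyName[] = [
  { affix: 'van der', surname: 'Meer' },
  { affix: 'de', surname: 'Jong' },
  { affix: '', surname: 'Kuipers' },
];

// Pick one pair. The random source is injectable so tests can be deterministic.
function pickFamilyName(random: () => number = Math.random): FamilyName {
  const index = Math.floor(random() * FAMILY_NAMES.length);
  return FAMILY_NAMES[index];
}

// Join first name, affix and surname, skipping an empty affix.
function formatFullName(firstName: string, name: FamilyName): string {
  return [firstName, name.affix, name.surname].filter((part) => part !== '').join(' ');
}

console.log(formatFullName('Jan', pickFamilyName()));
```

This is the extra structure the issue considers overkill: every locale would need pairs instead of a flat list of strings.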

Minimal reproduction code

import { faker } from '@faker-js/faker';

faker.locale = 'nl';
let affixCount = 0;
for (let i = 0; i < 1000; ++i) {
  const name = faker.name.fullName();
  // Affixes are written in lowercase after a first name, e.g. "Jan van der Meer".
  if (name.includes(' van ') || name.includes(' de ')) {
    affixCount++;
  }
}
console.log(affixCount); // -> prints 0

Additional Context

Tested with faker 7.6.0

Environment Info

System:
    OS: Linux 5.15 Ubuntu 22.04.1 LTS 22.04.1 LTS (Jammy Jellyfish)
    CPU: (16) x64 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    Memory: 18.21 GB / 31.08 GB
    Container: Yes
    Shell: 5.1.16 - /bin/bash
  Binaries:
    Node: 16.15.1 - ~/.n/bin/node
    npm: 8.11.0 - ~/.n/bin/npm
  Browsers:
    Chromium: 109.0.5414.74
    Firefox: 109.0

Which module system do you use?

  • CJS
  • ESM

Used Package Manager

npm

@amarillion amarillion added c: bug Something isn't working s: pending triage Pending Triage labels Jan 24, 2023
@ejcheng ejcheng added s: needs decision Needs team/maintainer decision c: locale Permutes locale definitions and removed s: pending triage Pending Triage labels Jan 25, 2023
@matthewmayer
Contributor

"van der Meer" (lit: from the lake) is a valid Dutch name, "Kuipers" (cooper) as well. "van der Kuipers" (from the cooper???) is nonsensical.

This seems to be the critical part: since only certain combinations are logical or common, it makes sense to put the affixes directly in the surname file. So this PR seems like the best strategy.

Can I suggest we merge #1637 first, and then follow up with additional PRs for locale-specific corrections like this one?

Projects
No open projects
Status: Done

4 participants