Treatment of the "tussenvoegsel", the family name affix in the Dutch locale #1777
Status: Closed
Labels: c: bug, c: locale, p: 1-normal, s: needs decision, s: on hold
Describe the bug
TL;DR:
Longer version:
I'm following up from a discussion on Discord and creating an issue as requested.
Dutch family names often have an affix, a.k.a. "tussenvoegsel" (https://en.wikipedia.org/wiki/Tussenvoegsel), such as "van", "de", "van der" and a few other variants. Affixes are very common: roughly 30% of the 100 most common Dutch family names have one (see: https://nl.wikipedia.org/wiki/Lijst_van_meest_voorkomende_achternamen_van_Nederland).
There used to be support for this in faker, but it no longer works. If I set the locale to 'nl' and generate names with faker.name.fullName(), I never get a name with an affix.
The way faker used to do this wasn't realistic, though. Affixes are part of the family name, but the old version of faker merged a random affix-less family name with a random affix. "van der Meer" (lit: from the lake) is a valid Dutch name, and so is "Kuipers" (cooper). "van der Kuipers" (from the cooper???) is nonsensical.
You're probably wondering why the affix is important anyway, if they're really just part of the family name. There is one simple answer: sorting. In a list of names, the affix is ignored when sorting. Dutch administrative systems (from governmental systems to the junior football club spreadsheet) track the affix separately, solely for this reason.
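To make the sorting rule concrete, here is a minimal TypeScript sketch that assumes a record keeping the affix in its own field, the way the administrative systems described above do. The DutchName shape and the displayName/compareDutch helpers are hypothetical illustrations, not part of faker. The affix is shown when displaying a name but skipped when sorting.

```typescript
// Hypothetical record shape: the affix ("tussenvoegsel") lives in its own
// field, separate from the part of the family name used for sorting.
interface DutchName {
  firstName: string;
  affix?: string; // e.g. "van", "de", "van der"; absent when there is none
  surname: string; // sort key, e.g. "Meer" in "van der Meer"
}

// Display form: first name, optional affix, surname.
function displayName(n: DutchName): string {
  return [n.firstName, n.affix, n.surname].filter(Boolean).join(" ");
}

// Sort on the surname alone (affix ignored), tie-breaking on the first name.
function compareDutch(a: DutchName, b: DutchName): number {
  return (
    a.surname.localeCompare(b.surname, "nl") ||
    a.firstName.localeCompare(b.firstName, "nl")
  );
}

const people: DutchName[] = [
  { firstName: "Jan", affix: "van der", surname: "Meer" },
  { firstName: "Eva", surname: "Kuipers" },
  { firstName: "Piet", affix: "de", surname: "Jong" },
];

// Copy before sorting so the original array keeps its order.
console.log([...people].sort(compareDutch).map(displayName).join(", "));
// prints "Piet de Jong, Eva Kuipers, Jan van der Meer"
```

"van der Meer" ends up last because it is filed under M, not under V.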
The new code being prepared (https://github.com/matthewmayer/faker/tree/chore/fullname-name-patterns) treats the affix as a middle name. This approach has a certain advantage: the affix can be extracted separately, which makes it easier to write custom sorting code according to the Dutch sorting rules. But it also has disadvantages. Firstly, Dutch people can have middle names too. Secondly, you still have the problem that the random affix doesn't necessarily match the random family name.
Proposal:
We can take a much simpler route: make the affix part of the family name and put all the names together in locales/nl/person/last_name.ts. I'm preparing a PR to this effect.
Upside is that it's really simple.
Downside is that you need to extract the affix with a regular expression if you want to sort those names.
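Under this proposal, a consumer who needs Dutch sort order would split the affix off with a regular expression. The TypeScript sketch below shows one way to do that. The affix word list is an assumption and deliberately non-exhaustive, and splitLastName/compareByLastName are hypothetical helpers, not faker APIs. Multi-word affixes such as "van der" or "in 't" are matched as a run of single affix words.

```typescript
// Assumed, non-exhaustive list of Dutch affix words; multi-word affixes
// ("van der", "van den", "in 't") are matched as a run of these words.
const AFFIX_WORDS = ["van", "de", "der", "den", "het", "'t", "te", "ter", "ten", "in", "op"];

// Group 1: leading run of affix words, each followed by whitespace (may be empty).
// Group 2: the remaining surname. Case-insensitive so "De Vries" also splits.
const AFFIX_RE = new RegExp(`^((?:(?:${AFFIX_WORDS.join("|")})\\s+)*)(.+)$`, "i");

function splitLastName(lastName: string): { affix: string; surname: string } {
  const match = AFFIX_RE.exec(lastName.trim());
  // Only an empty string fails to match; return it unchanged in that case.
  if (!match) return { affix: "", surname: lastName };
  return { affix: (match[1] ?? "").trim(), surname: match[2] };
}

// Compare on the surname without its affix, then on the affix as a tie-breaker.
function compareByLastName(a: string, b: string): number {
  const x = splitLastName(a);
  const y = splitLastName(b);
  return x.surname.localeCompare(y.surname, "nl") || x.affix.localeCompare(y.affix, "nl");
}

// Sample entries as they might appear in the combined last-name list (illustrative only).
const lastNames = ["van der Meer", "Kuipers", "de Jong", "Bakker", "van den Berg"];
console.log([...lastNames].sort(compareByLastName).join(", "));
// prints "Bakker, van den Berg, de Jong, Kuipers, van der Meer"
```

Because each affix word must be followed by whitespace, names that merely start with an affix-like syllable, such as "Dekker" or "Terpstra", are left intact.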
Alternative:
The fully proper solution is to create an extra field, called 'affix' or 'tussenvoegsel' for Dutch names, and pick them from an array of affix + surname pairs. I think this solution would be overkill for the purposes of Faker.
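For comparison, the alternative could look roughly like the TypeScript sketch below. The DutchLastName shape, the sample entries and the helper names are hypothetical. Math.random stands in for randomness; a real faker implementation would presumably use faker's own seedable generator instead. The point is that each affix is stored together with the surname it belongs to, so a combination like "van der Kuipers" can never be produced.

```typescript
// Hypothetical shape for the "affix + surname pairs" alternative.
interface DutchLastName {
  affix: string; // "" when the name has no affix
  surname: string; // the part used for sorting
}

// Illustrative entries: each affix is paired with the surname it belongs to.
const LAST_NAMES: readonly DutchLastName[] = [
  { affix: "van der", surname: "Meer" },
  { affix: "de", surname: "Jong" },
  { affix: "", surname: "Kuipers" },
  { affix: "van den", surname: "Berg" },
];

// Pick one whole pair at random (Math.random used only for this sketch).
function randomLastName(): DutchLastName {
  return LAST_NAMES[Math.floor(Math.random() * LAST_NAMES.length)];
}

// Join affix and surname for display; skip the affix when it is empty.
function formatLastName(n: DutchLastName): string {
  return n.affix ? `${n.affix} ${n.surname}` : n.surname;
}

const picked = randomLastName();
console.log(formatLastName(picked), "| sort key:", picked.surname);
```

The sort key comes for free here, which is why this is the "fully proper" option. The cost is an extra field that every consumer has to know about.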
Minimal reproduction code
import { faker } from '@faker-js/faker';
// Switch to the Dutch locale (faker v7 API).
faker.locale = 'nl';
// Count generated full names that contain a common affix.
let affixCount = 0;
for (let i = 0; i < 1000; ++i) {
  const name = faker.name.fullName();
  if (name.includes(' van ') || name.includes(' de ')) {
    affixCount++;
  }
}
console.log(affixCount); // -> prints 0
Additional Context
Tested with faker 7.6.0
Environment Info
Used package manager: npm