Title-case fake names #124

Merged
merged 3 commits into from
Apr 24, 2023

zmbc commented Apr 21, 2023

Title-case fake names

Description

  • Category: bugfix
  • JIRA issue: none

Fake names were previously all-caps, which made them easy to distinguish from our "real" simulated names. This change title-cases them so they blend in more.
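For illustration, the kind of transformation described above can be sketched in plain Python. This is not the code from this PR: `title_case_name` is a hypothetical helper, and the sample inputs are upper-cased versions of names that appear in the test output.

```python
import string


def title_case_name(name: str) -> str:
    """Hypothetical helper: title-case an all-caps fake name.

    string.capwords capitalizes the first letter of each
    whitespace-separated word and lower-cases the rest.
    """
    return string.capwords(name)


# Assumed all-caps inputs, for illustration only.
fake_names = ["LADY OF HOUSE", "DONT KNOW", "MALE CHILD"]
title_cased = [title_case_name(name) for name in fake_names]
print(title_cased)  # ['Lady Of House', 'Dont Know', 'Male Child']
```

`string.capwords` is used here rather than `str.title()` because `str.title()` also capitalizes the letter after an apostrophe (`"DON'T".title()` gives `"Don'T"`). Which behavior the actual change uses is not stated in this PR.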

Testing

Manually tested noising a Census with fully fake names:

>>> psp.generate_decennial_census(config={'decennial_census': {'column_noise': {'first_name': {'use_fake_name': {'cell_probability': 1}}, 'last_name': {'use_fake_name': {'cell_probability': 1}}}}})
      simulant_id     first_name middle_initial      last_name  age date_of_birth street_number           street_name unit_number     city state zipcode relation_to_reference_person     sex race_ethnicity                                                                
0             0_2          Child              L              H   26    08/05/1993         10233  north burgher avenue         NaN  Anytown    US   00000             Reference person  Female          White
1             0_3              W              C  Lady Of House   26    12/29/1993         10233  north burgher avenue         NaN  Anytown    US   00000               Other relative  Female          White
2           0_923             Mr              E              T   77    06/29/1942       147-153          browning ave         NaN  Anytown    US   00000             Reference person    Male          Black
3          0_2641              A              T            NaN   59    10/70/1960        109              stallion st         NaN  Anytown    US   00000             Reference person  Female          White
4          0_2801         Person              A              A   73    12/05/1946           214           s vine lane         NaN  Anytown    US   00000             Reference person    Male          White
...           ...            ...            ...            ...  ...           ...           ...                   ...         ...      ...   ...     ...                          ...     ...            ...
10265     0_19008            Mom              G              C   56    06/12/1963          1113     times square blvd         NaN  Anytown    US   00000             Reference person    Male          Black
10266     0_20161     Male Child              D             Na  NaN    11/09/1958          4123           nw 13th ave     no 207r  Anytown    US   00000             Reference person  Female          White
10267     0_20162         Sister              L      Dont Know   65    01/20/1955          4123           nw 13th ave     no 207r  Anytown    US   00000              Same-sex spouse  Female          White
10268     0_19669        Chiod F              B            Boy   59    10/06/1960         84101        inkberry drive         NaN  Anytown    US   00000                     Roommate    Male          Black
10269     0_20160  Granddaughter              A              Y   26    10/24/1993           NaN    nth indiana avenue         NaN  Anytown    US   00000                      Sibling  Female          White

[10270 rows x 15 columns]
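The manual check above could also be expressed as a small assertion. The sketch below is an assumption, not part of pseudopeople: `find_all_caps` is a hypothetical helper, and the sample values are copied from the `last_name` column in the output. Single letters are skipped because they are upper-case either way.

```python
def find_all_caps(values):
    """Return multi-letter string values that are entirely upper-case.

    Non-string entries (such as NaN) and single letters are ignored.
    """
    return [
        value
        for value in values
        if isinstance(value, str) and len(value) > 1 and value.isupper()
    ]


# A few last_name values copied from the output above, including a NaN.
sample_last_names = ["H", "Lady Of House", "T", float("nan"), "Na", "Dont Know", "Boy"]
leftovers = find_all_caps(sample_last_names)
assert leftovers == [], f"all-caps names remain: {leftovers}"
```

In practice the same function could be applied to the `first_name` and `last_name` columns of the generated DataFrame, for example via `df["last_name"].tolist()`.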

zmbc changed the base branch from main to develop April 21, 2023 23:12
zmbc merged commit 71927a2 into develop Apr 24, 2023
zmbc deleted the bugfix/fake-names-capitalization branch April 24, 2023 16:13