Normalize diacritics in player lookup #138

ap-ack · 2024-04-28T23:02:55Z

statsapi.lookup_player('Acuna') returns an empty list, not including Ronald Acuña Jr. because it is not a direct text match due to the diacritic on the n. This effectively makes the player lookup feature unusable for search queries coming from user input, as it may not be obvious to an end user that they need to use characters like this and they may not have easy access to these characters on their keyboard. Additionally, some MLB players like Emmanuel Ramirez don't use diacritics in their names, while others like José Ramírez do, and users won't know when they have to use the diacritics to get their expected search results.

toddrob99 · 2024-04-29T00:37:07Z

I am considering normalizing the accented and diacritical characters using a method I found here. This seems to slow down the lookup process considerably, because it has to check and normalize every character in every field of every player record. I am going to think about this and do some more testing.

ap-ack · 2024-04-29T02:19:26Z

If that is not performant enough how about adding an attribute to the player object like 'normalizedFullName' so the result would be pre-computed? (edit: or does that object come from the underlying API?)

toddrob99 · 2024-04-29T12:39:16Z

I realized last night that the API call is only requesting specific fields, so I checked if there's another field that includes the ascii version of the name, and there is one: 'nameSlug': 'ronald-acuna-jr-660670'. I'll add that field to the API call in the next version, and the issue will be resolved (because the current code checks all included fields for the keyword).

Full record, for reference:

{'id': 660670, 'fullName': 'Ronald Acuña Jr.', 'link': '/api/v1/people/660670', 'firstName': 'Ronald', 'lastName': 'Acuña', 'primaryNumber': '13', 'birthDate': '1997-12-18', 'currentAge': 26, 'birthCity': 'La Guaira', 'birthCountry': 'Venezuela', 'height': '6' 0"', 'weight': 205, 'active': True, 'currentTeam': {'id': 144, 'link': '/api/v1/teams/144'}, 'primaryPosition': {'code': '9', 'name': 'Outfielder', 'type': 'Outfielder', 'abbreviation': 'RF'}, 'useName': 'Ronald', 'useLastName': 'Acuña Jr.', 'middleName': 'Jose', 'boxscoreName': 'Acuña Jr.', 'nickName': 'El De La Sabana', 'gender': 'M', 'nameMatrilineal': 'Blanco', 'isPlayer': True, 'isVerified': True, 'pronunciation': 'ah-cuhn-YA', 'mlbDebutDate': '2018-04-25', 'batSide': {'code': 'R', 'description': 'Right'}, 'pitchHand': {'code': 'R', 'description': 'Right'}, 'nameFirstLast': 'Ronald Acuña Jr.', 'nameTitle': 'Jr.', 'nameSuffix': 'Jr.', 'nameSlug': 'ronald-acuna-jr-660670', 'firstLastName': 'Ronald Acuña Jr.', 'lastFirstName': 'Acuña Jr., Ronald', 'lastInitName': 'Acuña Jr., R', 'initLastName': 'R Acuña Jr.', 'fullFMLName': 'Ronald Jose Acuña Jr.', 'fullLFMName': 'Acuña Jr., Ronald Jose', 'strikeZoneTop': 3.49, 'strikeZoneBottom': 1.58}

toddrob99 · 2024-04-29T12:56:40Z

I forgot to mention this issue in the commit/PR but this is fixed in v1.7.2

ap-ack · 2024-04-29T13:10:06Z

Great, thanks. 'Acuna' now works but 'Ronald Acuna' still doesn't. Would it make sense to split the input and search for the individual words?

ap-ack · 2024-05-07T23:47:01Z

If you agree on splitting the search query, want me to put in a PR? Would you also want a parameter for the function allowing the caller to choose what should qualify as a match? Options could be all words, any words, or full string.

The "match all words" case would be just about as performant as current, since in most cases the first word will not match and we will move on, but the "match any words" case would require more loops linearly with the number of words (again assuming most players are not matches).

Split search input by word #138

toddrob99 self-assigned this Apr 29, 2024

toddrob99 added the enhancement New feature or request label Apr 29, 2024

toddrob99 closed this as completed Apr 29, 2024

toddrob99 reopened this Apr 29, 2024

ap-ack mentioned this issue Jun 28, 2024

Split search input by word #143

Merged

toddrob99 added a commit that referenced this issue Oct 26, 2024

Merge pull request #143 from ap-ack/issue-138/splitSearch

4587c35

Split search input by word #138

toddrob99 mentioned this issue Oct 26, 2024

v1.8 #147

Merged

toddrob99 linked a pull request Oct 26, 2024 that will close this issue

v1.8 #147

Merged

toddrob99 closed this as completed in #147 Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalize diacritics in player lookup #138

Normalize diacritics in player lookup #138

ap-ack commented Apr 28, 2024

toddrob99 commented Apr 29, 2024

ap-ack commented Apr 29, 2024 •

edited

Loading

toddrob99 commented Apr 29, 2024

toddrob99 commented Apr 29, 2024

ap-ack commented Apr 29, 2024

ap-ack commented May 7, 2024

Normalize diacritics in player lookup #138

Normalize diacritics in player lookup #138

Comments

ap-ack commented Apr 28, 2024

toddrob99 commented Apr 29, 2024

ap-ack commented Apr 29, 2024 • edited Loading

toddrob99 commented Apr 29, 2024

toddrob99 commented Apr 29, 2024

ap-ack commented Apr 29, 2024

ap-ack commented May 7, 2024

ap-ack commented Apr 29, 2024 •

edited

Loading