Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize diacritics in player lookup #138

Closed
ap-ack opened this issue Apr 28, 2024 · 6 comments · Fixed by #147
Closed

Normalize diacritics in player lookup #138

ap-ack opened this issue Apr 28, 2024 · 6 comments · Fixed by #147
Assignees
Labels
enhancement New feature or request

Comments

@ap-ack
Copy link
Contributor

ap-ack commented Apr 28, 2024

statsapi.lookup_player('Acuna') returns an empty list, not including Ronald Acuña Jr. because it is not a direct text match due to the diacritic on the n. This effectively makes the player lookup feature unusable for search queries coming from user input, as it may not be obvious to an end user that they need to use characters like this and they may not have easy access to these characters on their keyboard. Additionally, some MLB players like Emmanuel Ramirez don't use diacritics in their names, while others like José Ramírez do, and users won't know when they have to use the diacritics to get their expected search results.

@toddrob99
Copy link
Owner

I am considering normalizing the accented and diacritical characters using a method I found here. This seems to slow down the lookup process considerably, because it has to check and normalize every character in every field of every player record. I am going to think about this and do some more testing.

@toddrob99 toddrob99 self-assigned this Apr 29, 2024
@toddrob99 toddrob99 added the enhancement New feature or request label Apr 29, 2024
@ap-ack
Copy link
Contributor Author

ap-ack commented Apr 29, 2024

If that is not performant enough how about adding an attribute to the player object like 'normalizedFullName' so the result would be pre-computed? (edit: or does that object come from the underlying API?)

@toddrob99
Copy link
Owner

I realized last night that the API call is only requesting specific fields, so I checked if there's another field that includes the ascii version of the name, and there is one: 'nameSlug': 'ronald-acuna-jr-660670'. I'll add that field to the API call in the next version, and the issue will be resolved (because the current code checks all included fields for the keyword).

Full record, for reference:

{'id': 660670, 'fullName': 'Ronald Acuña Jr.', 'link': '/api/v1/people/660670', 'firstName': 'Ronald', 'lastName': 'Acuña', 'primaryNumber': '13', 'birthDate': '1997-12-18', 'currentAge': 26, 'birthCity': 'La Guaira', 'birthCountry': 'Venezuela', 'height': '6' 0"', 'weight': 205, 'active': True, 'currentTeam': {'id': 144, 'link': '/api/v1/teams/144'}, 'primaryPosition': {'code': '9', 'name': 'Outfielder', 'type': 'Outfielder', 'abbreviation': 'RF'}, 'useName': 'Ronald', 'useLastName': 'Acuña Jr.', 'middleName': 'Jose', 'boxscoreName': 'Acuña Jr.', 'nickName': 'El De La Sabana', 'gender': 'M', 'nameMatrilineal': 'Blanco', 'isPlayer': True, 'isVerified': True, 'pronunciation': 'ah-cuhn-YA', 'mlbDebutDate': '2018-04-25', 'batSide': {'code': 'R', 'description': 'Right'}, 'pitchHand': {'code': 'R', 'description': 'Right'}, 'nameFirstLast': 'Ronald Acuña Jr.', 'nameTitle': 'Jr.', 'nameSuffix': 'Jr.', 'nameSlug': 'ronald-acuna-jr-660670', 'firstLastName': 'Ronald Acuña Jr.', 'lastFirstName': 'Acuña Jr., Ronald', 'lastInitName': 'Acuña Jr., R', 'initLastName': 'R Acuña Jr.', 'fullFMLName': 'Ronald Jose Acuña Jr.', 'fullLFMName': 'Acuña Jr., Ronald Jose', 'strikeZoneTop': 3.49, 'strikeZoneBottom': 1.58}

@toddrob99
Copy link
Owner

I forgot to mention this issue in the commit/PR but this is fixed in v1.7.2

@ap-ack
Copy link
Contributor Author

ap-ack commented Apr 29, 2024

Great, thanks. 'Acuna' now works but 'Ronald Acuna' still doesn't. Would it make sense to split the input and search for the individual words?

@toddrob99 toddrob99 reopened this Apr 29, 2024
@ap-ack
Copy link
Contributor Author

ap-ack commented May 7, 2024

If you agree on splitting the search query, want me to put in a PR? Would you also want a parameter for the function allowing the caller to choose what should qualify as a match? Options could be all words, any words, or full string.

The "match all words" case would be just about as performant as current, since in most cases the first word will not match and we will move on, but the "match any words" case would require more loops linearly with the number of words (again assuming most players are not matches).

toddrob99 added a commit that referenced this issue Oct 26, 2024
@toddrob99 toddrob99 mentioned this issue Oct 26, 2024
@toddrob99 toddrob99 linked a pull request Oct 26, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants