Add script for scraping Smogon movesets #363
Open
This script scrapes competitive Pokemon data from Smogon and saves it to smogon-movesets.csv in the folder /pokedex/data/csv. The file begins with a header line naming the broad categories the data covers. For each moveset of each Pokemon in each generation, the following data are available:
Data scraping takes me around eight minutes on my network, but your mileage may vary.
I would include a copy of the data directly, but the CSV file produced contains 5232 lines and is over 100 MB, which is GitHub's file size limit for repositories without Git Large File Storage. A copy is thus available here.
The script is intended to be run once per user in an interactive environment, but it could easily be modified to omit the print statements and the user confirmation prompt if desired.
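As a rough illustration of that modification, here is a minimal sketch of how the prompt and the progress output could be made optional. The `--yes` and `--quiet` flags and the helper names `confirm`, `build_parser`, and `log` are hypothetical; they are not taken from the script in this PR.

```python
import argparse


def confirm(prompt, assume_yes=False, ask=input):
    """Return True if the user agrees, or immediately when assume_yes is set."""
    if assume_yes:
        return True
    answer = ask(f"{prompt} [y/N] ").strip().lower()
    return answer in ("y", "yes")


def build_parser():
    """Build a parser with two hypothetical flags for unattended runs."""
    parser = argparse.ArgumentParser(description="Scrape Smogon movesets (sketch).")
    # Assumed flags; the script in this PR does not define them.
    parser.add_argument("-y", "--yes", action="store_true",
                        help="skip the confirmation prompt")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress progress output")
    return parser


def log(message, quiet=False):
    """Print a progress message unless quiet mode is on."""
    if not quiet:
        print(message)
```

With something like this in place, an unattended run would parse the flags once and pass `args.yes` and `args.quiet` to the helpers, while the default interactive behaviour stays unchanged.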
I'm unsure how this data could be used by the rest of this project, but it opens the door to future CLI features built on Smogon's data by abstracting the process of fetching and cleaning their JSON.
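To make the "cleaning" step concrete, the sketch below flattens nested, JSON-like records into CSV rows with a header line. The record layout, the column names in `FIELDS`, and the sample values are assumptions made up for illustration; Smogon's actual JSON and the columns this script writes may differ.

```python
import csv
import io

# Hypothetical column names, not the header the script in this PR produces.
FIELDS = ["generation", "pokemon", "moveset", "moves"]


def flatten(records):
    """Yield one flat row per moveset from nested, JSON-like records."""
    for record in records:
        for moveset in record.get("movesets", []):
            yield {
                "generation": record["generation"],
                "pokemon": record["pokemon"],
                "moveset": moveset["name"],
                # Join the move list into a single cell.
                "moves": "/".join(moveset.get("moves", [])),
            }


def write_csv(records, handle):
    """Write a header line followed by one row per moveset."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(flatten(records))


if __name__ == "__main__":
    # Made-up sample record, only to show the shape of the output.
    sample = [{
        "generation": "rb",
        "pokemon": "Pikachu",
        "movesets": [{"name": "Example", "moves": ["Thunderbolt", "Surf"]}],
    }]
    buffer = io.StringIO()
    write_csv(sample, buffer)
    print(buffer.getvalue())
```

Keeping the flattening separate from the writing means a later CLI command could reuse `flatten` without touching the file output.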