Hong Kong Horse Race Predictor

  • Authors: Derek Kruszewski, Yi Liu, Rob Blumberg, Carlina Kim

Data analysis project by Group 302 for DSCI 522 (Data Science Workflows), a Master of Data Science course at the University of British Columbia.

About

This project attempts to build a regression model to answer the research question:

Given a set of features related to racing horses, can we predict the outcome of a race?

The resulting model predicts finish times with an R^2 score of 0.909.
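For readers unfamiliar with the metric, R^2 (the coefficient of determination) compares the model's squared error with the error of always predicting the mean finish time. The sketch below is illustrative only: the finish times are made-up numbers, not data or results from this project, and the function name `r_squared` is hypothetical rather than taken from the project's scripts.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up finish times in seconds (assumption, for illustration only)
actual = [70.0, 72.0, 74.0, 76.0]
predicted = [71.0, 72.0, 73.0, 76.0]
score = r_squared(actual, predicted)
print(round(score, 3))
```

A score of 1.0 means the predictions match the observed values exactly, and a score of 0 means the model does no better than predicting the mean.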

The dataset used to answer this question is the Hong Kong Horse Racing Dataset for Experts, publicly available through Kaggle (HorseBaby 2018). The data has been rehosted on GitHub for use with this project's scripts:

https://raw.githubusercontent.com/v5y8/horse_race_data/master

Please ensure the Makefile downloads the data from the GitHub repository above.
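As a rough sketch of how a file could be fetched from the rehosted location, the snippet below builds a download URL from the base address above. It is not the project's download script: the helper names `data_url` and `download` and the file name `example.csv` are hypothetical, and the actual file names in the data repository are not listed here.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# Base address of the rehosted data, as given above
BASE_URL = "https://raw.githubusercontent.com/v5y8/horse_race_data/master"


def data_url(filename, base_url=BASE_URL):
    """Join the base address and a file name into a full URL."""
    return f"{base_url.rstrip('/')}/{quote(filename)}"


def download(filename, dest):
    """Download one file to `dest` (requires network access)."""
    urlretrieve(data_url(filename), dest)
    return dest


# "example.csv" is a placeholder, not a real file in the repository
print(data_url("example.csv"))
```

Calling `download` is left out of the example because it needs network access and a real file name.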

Final Report

The final report can be found here.

Usage

There are two ways to replicate the analysis on your local machine.

Method 1: Using Docker

Note - the instructions below depend on running the commands in a Unix shell (e.g., terminal or Git Bash). If you are using Windows Command Prompt, replace /$(pwd) with PATH_ON_YOUR_COMPUTER.

  1. Install and run Docker

  2. Clone this GitHub repository and run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 all

  3. To reset the repo to a clean slate, run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 clean
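The two Docker commands above differ only in the final make target. For anyone scripting them, or working in a shell where /$(pwd) is not available, the following is a minimal sketch that assembles the same argument list from an explicit project path. The helper name `docker_make_command` and the example path are hypothetical; the image name, container directory and make targets are the ones shown above.

```python
import shlex

IMAGE = "v5y8/group_302_environment"
CONTAINER_DIR = "/home/DSCI_522_Group_302"


def docker_make_command(target, project_dir):
    """Build the docker argument list for `make all` or `make clean`."""
    if target not in ("all", "clean"):
        raise ValueError("target must be 'all' or 'clean'")
    return [
        "docker", "run", "--rm",
        # Mount the local project directory into the container
        "-v", f"{project_dir}:{CONTAINER_DIR}",
        IMAGE,
        "make", "-C", CONTAINER_DIR, target,
    ]


# Placeholder path; substitute the location of your local clone
cmd = docker_make_command("all", "/path/to/DSCI_522_Group_302")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, assuming Docker is installed and running.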

Method 2: Using Make

This method requires all the dependencies listed below to be installed before running the analysis. Run the following command in the terminal at the root directory of this project (the script takes 15-20 minutes to fully execute):

make all

To reset this repository to a clean state, run the following command in the terminal at the root directory of this project:

make clean

Dependencies diagram of Makefile

The relationships between the scripts, data files and final outputs are summarised in the dependency diagram below.

Makefile_diagram

Dependencies

Python 3.7.5 and Python Packages:

R version 3.6.1 and R packages:

Contributions

We welcome all contributions to this project! If you notice a bug, or have a feature request, please open up an issue here. If you'd like to contribute a feature or bug fix, you can fork our repo and submit a pull request. We will review pull requests within 7 days. All contributors must abide by our code of conduct.

References

HorseBaby. 2018. “Horse Racing Dataset for Experts (Hong Kong).” https://www.kaggle.com/hrosebaby/horse-racing-dataset-for-experts-hong-kong.
