Hong Kong Horse Race Predictor

  • Authors: Derek Kruszewski, Yi Liu, Rob Blumberg, Carlina Kim

Data analysis project by Group 302 for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

About

This project attempts to build a regression model to answer the research question:

Given a set of features related to racing horses, can we predict the outcome of a race?

The resulting model predicts finish times with an R^2 (coefficient of determination) of 0.909.
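For readers unfamiliar with the metric, below is a minimal Python sketch of the standard coefficient of determination, R^2 = 1 - SS_res / SS_tot. It is not taken from this project's scripts; the function name `r_squared` and the finish times in the example are hypothetical and only illustrate the calculation.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean value
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical finish times in seconds, for illustration only
    actual = [83.0, 84.5, 85.2, 86.1]
    predicted = [83.4, 84.1, 85.6, 85.9]
    print(f"R^2 = {r_squared(actual, predicted):.3f}")
```

Under this definition, an R^2 of 0.909 means the residual sum of squares is about 9% of the total sum of squares, i.e. the predictions leave roughly 9% of the variance in finish times unexplained compared with always guessing the mean. scikit-learn's `sklearn.metrics.r2_score(y_true, y_pred)` computes the same quantity; which implementation the project's scripts actually use is not stated here.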

The dataset used to answer this question is the Hong Kong Horse Racing Dataset for Experts, publicly available through Kaggle (HorseBaby 2018). This data has been rehosted on GitHub for use with this project's scripts:

https://raw.githubusercontent.com/v5y8/horse_race_data/master

Please make sure the GitHub repository above is the source used when downloading the data with the Makefile.

Final Report

The final report can be found here.

Usage

There are two ways to replicate the analysis on your local machine.

Method 1: Using Docker

Note: the instructions below assume a Unix shell (e.g., Terminal or Git Bash). If you are using the Windows Command Prompt, replace /$(pwd) with PATH_ON_YOUR_COMPUTER.

  1. Install and run Docker

  2. Clone this GitHub repository and run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 all

  3. To reset the repository to a clean state, run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 clean

Method 2: Using Make

This method requires all of the dependencies listed below to be installed before running the analysis. Run the following command in the terminal from the root directory of this project (the full pipeline takes about 15-20 minutes to run):

make all

To reset this repository to a clean state, run the following command in the terminal at the root directory of this project:

make clean

Makefile dependency diagram

The relationships between the scripts, data files and final outputs are summarised in the dependency diagram below.

[Figure: Makefile dependency diagram]

Dependencies

Python 3.7.5 and Python packages:

R version 3.6.1 and R packages:

Contributions

We welcome all contributions to this project! If you notice a bug or have a feature request, please open an issue here. If you'd like to contribute a feature or bug fix, you can fork our repo and submit a pull request. We will review pull requests within 7 days. All contributors must abide by our code of conduct.

References

HorseBaby. 2018. “Horse Racing Dataset for Experts (Hong Kong).” https://www.kaggle.com/hrosebaby/horse-racing-dataset-for-experts-hong-kong.