mds522_305: Blood Donation Predictions

Contributors:
- Evelyn Moorhouse
- Tao Guo
- Lise Braaten
- Xinwen Wang

Introduction

We have attempted to build a classification model to determine whether a person donated blood on a specific date based on their past donation history. The goal of working with the data set is to see whether the features Recency (months since last donation), Frequency (total number of donations) and Monetary (total blood donated) have any effect on whether blood was donated on a given date. If these features do affect donation, we can infer that they would likely also affect future donations.
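To make these features concrete, the sketch below derives them for one donor from a list of past donation dates. It also computes the time since first donation, which the analysis uses as a fourth feature. The helper names (months_between, rfm_features), the example dates and the fixed volume of 250 c.c. per donation (CC_PER_DONATION) are hypothetical assumptions for illustration, not code from this repository.

```python
from datetime import date
from typing import Dict, List

# Hypothetical constant: assumes every donation has the same fixed volume.
CC_PER_DONATION = 250


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def rfm_features(donation_dates: List[date], as_of: date) -> Dict[str, int]:
    """Hypothetical helper: Recency, Frequency, Monetary and Time
    (months since first donation) for one donor, as of a given date."""
    # Only donations on or before the reference date count as history.
    past = sorted(d for d in donation_dates if d <= as_of)
    if not past:
        raise ValueError("donor has no donations on or before as_of")
    return {
        "recency": months_between(past[-1], as_of),   # months since last donation
        "frequency": len(past),                       # total number of donations
        "monetary": len(past) * CC_PER_DONATION,      # total blood donated (c.c.)
        "time": months_between(past[0], as_of),       # months since first donation
    }


history = [date(2007, 1, 15), date(2007, 6, 3), date(2008, 2, 20)]
print(rfm_features(history, as_of=date(2008, 5, 1)))
# → {'recency': 3, 'frequency': 3, 'monetary': 750, 'time': 16}
```

Under the fixed-volume assumption, Monetary is simply a multiple of Frequency, so the two carry the same information.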

The question we aimed to answer was: do Recency (months since last donation), Frequency (total number of donations) and Monetary (total blood donated) have any effect on whether blood was donated on a specific date?

We performed classification with three different models: Decision Tree, Random Forest and Logistic Regression. Our best validation result came from the Random Forest model, with a validation error of 0.30. However, our Logistic Regression model, with a validation error of 0.32, had a much smaller gap between train and validation errors. Since our errors were so high, we would not recommend using a model based on these features to predict blood donation from past donors. If a model is needed nonetheless, we would recommend the Logistic Regression classifier, since its smaller gap between train and validation error suggests it generalizes better to new data.
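The three-model comparison can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual script: it uses randomly generated placeholder data instead of the blood donation data set, default hyperparameters, an assumed 80/20 train/validation split, and a hypothetical results dictionary. Error here means 1 minus accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 4 random features standing in for Recency, Frequency,
# Monetary and Time. This is NOT the project's data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(scale=1.5, size=500) > 0).astype(int)

# Assumed 80/20 train/validation split.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, .score() returns accuracy, so error = 1 - accuracy.
    train_err = 1 - model.score(X_train, y_train)
    valid_err = 1 - model.score(X_valid, y_valid)
    results[name] = (train_err, valid_err)
    print(f"{name}: train={train_err:.2f} valid={valid_err:.2f} "
          f"gap={valid_err - train_err:.2f}")
```

An unconstrained decision tree usually reaches zero training error on data like this, so its train/validation gap is large. That gap, rather than the validation error alone, is what the recommendation of Logistic Regression rests on.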

Based on our results, we infer that the features 1) time since last donation, 2) total number of donations, 3) total blood donated, and 4) time since first donation together have some predictive power for whether a donor will donate blood. However, since our accuracy and cross-validation scores were low, their combined predictive power is low. We therefore suggest that these features do not strongly influence whether a donor gives blood, and that other factors may better predict whether a past donor donates again.
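One way to make "low predictive power" concrete is to compare a model's cross-validated accuracy with that of a majority-class baseline that ignores the features entirely. The sketch below is an assumption on our part about how such a check could look, not a step taken from the report: it uses scikit-learn's DummyClassifier, 5-fold cross-validation and synthetic, imbalanced placeholder data rather than the real data set.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced placeholder data standing in for the four features.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + rng.normal(scale=2.0, size=600) > 1.5).astype(int)

# Baseline that always predicts the most common class in the training fold.
baseline = DummyClassifier(strategy="most_frequent")
model = LogisticRegression()

# Mean 5-fold cross-validated accuracy for each estimator.
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
model_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"majority-class baseline accuracy: {baseline_acc:.2f}")
print(f"logistic regression accuracy:     {model_acc:.2f}")
print(f"lift over baseline:               {model_acc - baseline_acc:.2f}")
```

When the classes are imbalanced, a seemingly reasonable accuracy can sit close to the baseline; the lift over the baseline is a more direct measure of how much the features contribute.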

Report

The final report can be found here

Dependency Diagram

Usage

1. Using Docker

Note: the instructions in this section also depend on running them in a Unix shell (e.g., Terminal or Git Bash).

To replicate the analysis, install Docker. Then clone this GitHub repository and run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v $(pwd):/home/mds522_305 tguo3/mds make -C '/home/mds522_305' all

To reset the repo to a clean state, with no intermediate or result files, run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v $(pwd):/home/mds522_305 tguo3/mds make -C '/home/mds522_305' clean

2. Without using Docker

To replicate the analysis, clone this GitHub repository, install the dependencies listed below, and run the following command at the command line/terminal from the root directory of this project:

make all

To clean the repo and reset it to a state with no intermediate files or results, run the following command at the command line/terminal from the root directory of this project:

make clean

Dependencies

Python 3.7.3 and Python packages:
scikit-learn==0.22.1
pandas==0.24.2
altair==3.2.0
datapackage==1.11.0
docopt==0.6.2
requests==2.22.0

R version 3.6.1 and R packages:
tidyverse==1.2.1
ggplot2==3.2.1
kableExtra==1.1.0
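The Python pins above can be checked against an environment with a short script. This is a minimal sketch with hypothetical helper names (parse_pins, installed_version, check_pins); it covers only the Python packages, not R. It relies on importlib.metadata, which requires Python 3.8 or later; on Python 3.7 the importlib_metadata backport provides the same version and PackageNotFoundError names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Python pins from the list above. The PyPI distribution name for sklearn
# is "scikit-learn".
PINS = """\
scikit-learn==0.22.1
pandas==0.24.2
altair==3.2.0
datapackage==1.11.0
docopt==0.6.2
requests==2.22.0
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "==" in line:
            name, ver = line.split("==", 1)
            pins[name.strip()] = ver.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version
               ) -> Dict[str, Optional[str]]:
    """Return {name: installed version or None} for every unmet pin."""
    return {name: lookup(name) for name, wanted in pins.items()
            if lookup(name) != wanted}


if __name__ == "__main__":
    pins = parse_pins(PINS)
    for name, found in check_pins(pins).items():
        print(f"{name}: expected {pins[name]}, found {found}")
```

The lookup parameter lets the comparison logic be tested without depending on what happens to be installed.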

References

Armitage, C. J., & Conner, M. (2001). Social cognitive determinants of blood donation. Journal of Applied Social Psychology, 31(7), 1431-1457.

de Jonge, E. (2018). docopt: Command-line interface specification language. https://CRAN.R-project.org/package=docopt

Gillespie, T. W., & Hillyer, C. D. (2002). Blood donors and factors impacting the blood donation decision. Transfusion Medicine Reviews, 16(2), 115-130.

Keleshev, V. (2014). docopt: Command-line interface description language. https://github.com/docopt/docopt

McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 51-56.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace, Scotts Valley, CA.

VanderPlas, J., et al. (2018). Altair: Interactive statistical visualizations for Python. Journal of Open Source Software.

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.

Wickham, H. (2017). tidyverse: Easily install and load the 'tidyverse'. https://CRAN.R-project.org/package=tidyverse

Xie, Y. (2014). knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595

Yeh, I. C., Yang, K. J., & Ting, T. M. (2009). Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems with Applications, 36(3), 5866-5871.
