[Last update 17/12/2020]
Notebooks re-run on 24/10/2022 (except TPOT) to update the Heroku build to Python 3.10.8 and updated dependencies.
In this repo I use previous data from a larger project to develop a machine learning model that predicts whether farmers reach the living income benchmark value, using only 9 indicators.
This is a continuation of the work presented in Living Income Analysis. The work is advanced in two ways:
- The model is simplified to use only 9 variables, which makes it very accessible to (applied) researchers and decision makers
- The model is made directly available in a web app
The web app is inspired by the web app developed for the project https://github.com/mtyszler/Disaster-Response-Project
The key files in this repo are:
- LivingIncomeAnalysis.ipynb : notebook where I wrangle the data, compute indicators and analyze whether farmers reach a living income
- LivingIncome_MachineLearning.ipynb : notebook where I tune and fit the broad machine learning model
- LivingIncome_Model.ipynb : notebook where I tune and fit the restricted machine learning model with the top 10 features
The model can be downloaded here.
To use it in Python:
import joblib
import numpy as np
import pandas as pd
model = joblib.load("LI_simplified_model.pkl")
## create X (one row per household) with these columns, in this order:
# Estimated income percentage from sale of cocoa (between 0-1)
# Cocoa production (kg/ha) (= Cocoa production (kg) / Productive land under cocoa (ha) )
# Cocoa production (kg)
# Productive land under cocoa (ha)
# Cocoa land owned (ha)
# Hectares with trees between 5 and 25 years old
# Estimated income percentage from own small business or trading (between 0-1)
# Head: age
# Estimated income percentage from sale of other crops (between 0-1)
# Land used to cultivate all crops (ha)
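# predicted class: 1 = reaches the living income benchmark, 0 = does not
classification_label = model.predict(X)
# probability (in %) of reaching the benchmark, i.e. of class 1
classification_prob = model.predict_proba(X)[:,1]*100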
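The script above leaves the construction of X open. Below is a minimal pandas sketch, assuming the pickled model expects exactly the 10 columns listed in the comments, in that order (9 raw inputs plus the derived cocoa yield in kg/ha). Using the comment text as column names and the helper name `build_X` are illustrative assumptions, not part of the repo.

```python
import pandas as pd

# Column order as listed in the comments of the usage script.
# Assumption: the model expects exactly these 10 columns, in this order.
COLUMNS = [
    "Estimated income percentage from sale of cocoa (between 0-1)",
    "Cocoa production (kg/ha)",
    "Cocoa production (kg)",
    "Productive land under cocoa (ha)",
    "Cocoa land owned (ha)",
    "Hectares with trees between 5 and 25 years old",
    "Estimated income percentage from own small business or trading (between 0-1)",
    "Head: age",
    "Estimated income percentage from sale of other crops (between 0-1)",
    "Land used to cultivate all crops (ha)",
]


def build_X(households):
    """Build the model input from a list of dicts holding the 9 raw inputs.

    Each dict has every column in COLUMNS except the derived kg/ha yield.
    """
    X = pd.DataFrame(households)
    # Derived column, as defined in the usage script:
    # Cocoa production (kg) / Productive land under cocoa (ha)
    X["Cocoa production (kg/ha)"] = (
        X["Cocoa production (kg)"] / X["Productive land under cocoa (ha)"]
    )
    # Reorder (and select) columns to match the expected model input
    return X[COLUMNS]
```

If the model is a scikit-learn estimator that was fitted without column names, scikit-learn may warn about feature names; passing `X.to_numpy()` to `predict` and `predict_proba` avoids that.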
The web app has 4 pages:
This page has a form to input 9 farming household characteristics. After clicking Calculate chance, the app returns the prediction and the probability of achieving the living income benchmark.
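Before a prediction can be made, the 9 form inputs arrive as text and need converting and checking. The sketch below shows that step in plain Python. The field names are hypothetical (the app's real form names are not given here), and the range checks are assumptions about sensible validation rather than a description of what the app does. It also derives the cocoa yield (kg/ha) that the model expects as a tenth column.

```python
# Hypothetical form field names (assumption): the app's real names may differ.
SHARE_FIELDS = (
    "cocoa_income_share",        # income share from sale of cocoa (0-1)
    "business_income_share",     # income share from own small business or trading (0-1)
    "other_crops_income_share",  # income share from sale of other crops (0-1)
)
AMOUNT_FIELDS = (
    "cocoa_production_kg",       # cocoa production (kg)
    "cocoa_land_ha",             # productive land under cocoa (ha)
    "cocoa_land_owned_ha",       # cocoa land owned (ha)
    "trees_5_25_years_ha",       # hectares with trees between 5 and 25 years old
    "head_age",                  # age of the household head
    "all_crops_land_ha",         # land used to cultivate all crops (ha)
)


def parse_form(form):
    """Turn the 9 submitted form strings into floats, check ranges, add kg/ha."""
    values = {}
    for name in SHARE_FIELDS + AMOUNT_FIELDS:
        raw = str(form.get(name, "")).strip()
        if not raw:
            raise ValueError(f"missing value for {name}")
        values[name] = float(raw)  # raises ValueError on non-numeric input

    for name in SHARE_FIELDS:
        if not 0.0 <= values[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # The three shares are parts of one total income, so they cannot exceed 1
    # (small tolerance for floating-point rounding).
    if sum(values[name] for name in SHARE_FIELDS) > 1.0 + 1e-9:
        raise ValueError("income shares add up to more than 1")

    for name in AMOUNT_FIELDS:
        if values[name] < 0:
            raise ValueError(f"{name} cannot be negative")
    if values["cocoa_land_ha"] == 0:
        raise ValueError("productive land under cocoa must be > 0 to compute kg/ha")

    # Derived tenth model column: cocoa production (kg/ha)
    values["cocoa_yield_kg_ha"] = values["cocoa_production_kg"] / values["cocoa_land_ha"]
    return values
```

Rejecting bad input before calling the model keeps impossible combinations (for example, income shares above 100%) from silently producing a prediction.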
This is the main use of the app.
This page has some basic info about the training and testing datasets, as well as graphs of all the features. It contains a bar chart showing how many observations reach the living income benchmark in the training and testing sets, as well as histograms of all 9 features, comparing those achieving and not achieving the living income benchmark in the training and testing sets.
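The counts behind the bar chart and the paired histograms can be computed along these lines. This is a sketch assuming labels are coded 1 = reaches the benchmark and 0 = does not; the function names are hypothetical and the app's own plotting code may differ.

```python
import numpy as np
import pandas as pd


def benchmark_counts(train_y, test_y):
    """Count households reaching (1) / not reaching (0) the benchmark per set."""
    df = pd.DataFrame({
        "set": ["training"] * len(train_y) + ["testing"] * len(test_y),
        "reaches_benchmark": list(train_y) + list(test_y),
    })
    # Rows: set name; columns: label (0/1); cells: counts for the bar chart
    return pd.crosstab(df["set"], df["reaches_benchmark"])


def feature_histograms(values, labels, bins=10):
    """Histogram one feature separately for each group, on shared bin edges."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Shared edges make the two groups' histograms directly comparable
    edges = np.histogram_bin_edges(values, bins=bins)
    reached, _ = np.histogram(values[labels == 1], bins=edges)
    not_reached, _ = np.histogram(values[labels == 0], bins=edges)
    return edges, reached, not_reached
```

`feature_histograms` would be called once per feature and per set (training and testing).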
This page has some info about the performance of the ML model. It shows a few metrics as well as a confusion matrix.
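For reference, the confusion table and the usual summary metrics for a binary classifier can be derived from the true and predicted labels as below. This is a plain-Python sketch; which metrics the page actually reports is not specified here, so accuracy, precision, recall and F1 are illustrative choices.

```python
def confusion_table(y_true, y_pred):
    """Count (tn, fp, fn, tp) for 0/1 labels, where 1 = reaches the benchmark."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # correctly predicted to reach the benchmark
        elif t == 0 and p == 1:
            fp += 1  # predicted to reach it, but does not
        elif t == 1 and p == 0:
            fn += 1  # reaches it, but predicted not to
        elif t == 0 and p == 0:
            tn += 1  # correctly predicted not to reach it
        else:
            raise ValueError(f"labels must be 0 or 1, got {t!r}, {p!r}")
    return tn, fp, fn, tp


def summary_metrics(y_true, y_pred):
    """Accuracy, plus precision, recall and F1 for the positive class (1)."""
    tn, fp, fn, tp = confusion_table(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With scikit-learn, `confusion_matrix(y_true, y_pred).ravel()` returns the same (tn, fp, fn, tp) order for 0/1 labels.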
On this page the user can download the model for their own use in Python. It contains a basic script explaining how to use the model.
To run the web app locally, from the root of the repo:
- type:
python app/run.py
then go to http://0.0.0.0:3001/ or localhost:3001
- Alternatively, on a Unix system, type:
gunicorn app.run:app -b 0.0.0.0:3001
to run a local gunicorn server.
A live version of the app can be seen at https://living-income-model.herokuapp.com/
This project uses Python 3.10.8, matching the Heroku build-22 on which it is deployed, and was developed on a Windows 10 system. Python package requirements for the web app can be found in requirements.txt.
There are 3 write-ups related to this project:
- An initial non-technical blog post about the findings of the initial analysis can be found on Medium
- A follow-up non-technical blog post about the findings of this analysis and the web app can be found on Medium
- A technical write-up about this project can be found on Medium
Marcelo Tyszler
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/.