- Clone this repository
- In a terminal, run `pip3 install -r requirements.txt`
- Initialize the database: `python3 database_setup.py`
- Install Jupyter Notebook: `pip3 install jupyter`
- Install the Mixpanel API: `pip install mixpanel_api`
- Install pandas: `pip3 install pandas`
For a good tutorial on auto-generating `requirements.txt`, see https://www.idiotinside.com/2015/05/10/python-auto-generate-requirements-txt/
- Create a file called `app-env` in the working directory: `touch app-env`
- Add the following three lines to this file, replacing the text between the quotes with your API keys:

      export api_secret="[INSERT MIXPANEL API SECRET]"
      export token="[INSERT MIXPANEL TOKEN]"
      export my_key="[INSERT GOOGLE MAPS GEOCODE API KEY]"

- In the terminal, run `source app-env` to activate your environment variables. Note: you must run the `source` command in the same terminal that you then use to launch Jupyter Notebook; otherwise the `geocode()` function will not work within the data-wrangling notebook.
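Environment variables exported with `source` are only visible to processes started from that same shell, which is why the order of these steps matters. The notebook's own code is not shown here; the sketch below assumes it reads the three variables through `os.environ`, and `load_api_keys` is a hypothetical helper that fails early with a clear message instead of letting `geocode()` fail later:

```python
import os

# Variable names exported by app-env.
REQUIRED_VARS = ("api_secret", "token", "my_key")


def load_api_keys(environ=os.environ):
    """Return the API keys as a dict (hypothetical helper).

    Raises RuntimeError if any variable is missing, which usually means
    `source app-env` was not run in the terminal that launched Jupyter.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Run `source app-env` before `jupyter notebook`."
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```

In a notebook cell, `keys = load_api_keys()` would then give `keys["my_key"]` for the geocoding calls. Passing a plain dict as `environ` makes the helper easy to check without touching the real environment.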
- Using Python 2, run `python get_data.py`
- Using Python 3, run `python3 server.py`
- Launch Jupyter Notebook by running `jupyter notebook` in the same terminal where you ran `source app-env`
- Open the `wrangleDataAndFillDb` notebook
- Run the entire notebook
- The second-to-last section of the notebook creates a random subset of the entire dataset. You can change the size of this subset by modifying the argument to `sample()` on this line: `sampleDf = df.sample(20)`
- Then run the last section of the notebook (the for-loop that calls the `geocode()` function to add these entries to the database).
- The `geocode()` function prints the entire JSON response from Google as well as the dictionary that it returns to the main script. You will see this output at the end of the notebook.
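The notebook's `geocode()` function is not reproduced in this README. As a rough illustration of the kind of function described, the sketch below queries the Google Maps Geocoding API's JSON endpoint, prints the full response, and reduces it to a small dictionary. The names `geocode_sketch` and `parse_geocode_response`, the keys of the returned dictionary, and the use of `urllib` are all assumptions, not the project's code; the API key is read from the `my_key` variable exported in `app-env`.

```python
import json
import os
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def parse_geocode_response(payload):
    """Reduce a Geocoding API JSON response to a small dict (hypothetical shape)."""
    if payload.get("status") != "OK" or not payload.get("results"):
        # e.g. ZERO_RESULTS, OVER_QUERY_LIMIT, REQUEST_DENIED
        return {"status": payload.get("status"), "lat": None, "lng": None, "address": None}
    best = payload["results"][0]  # the API lists the best match first
    location = best["geometry"]["location"]
    return {
        "status": "OK",
        "lat": location["lat"],
        "lng": location["lng"],
        "address": best.get("formatted_address"),
    }


def geocode_sketch(address, api_key=None):
    """Geocode one address (needs network access and a valid key)."""
    api_key = api_key or os.environ["my_key"]
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    with urllib.request.urlopen(GEOCODE_URL + "?" + query) as response:
        payload = json.load(response)
    # Print the full JSON response from Google...
    print(json.dumps(payload, indent=2))
    # ...and the dictionary handed back to the caller.
    result = parse_geocode_response(payload)
    print(result)
    return result
```

Keeping the parsing separate from the HTTP call means the extraction logic can be checked against a saved response without spending API quota.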
- In `get_data.py`, change the date on line 13 to the most recent date.
- Run `get_data.py` (using Python 2!).
- Run the R script `wrangleColumnsFix.R`. It takes the output of the previous step (`get_data.py`) as its input and produces a new version of that output in which certain fields have been modified. I have been running this script from RStudio.
- Open Jupyter Notebook and run `wrangleDataAndFillDb.ipynb`. Be sure to set up `sampleDf` with the columns that you want to populate.
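For the step that updates the date on line 13 of `get_data.py`, the date can be computed rather than typed by hand. A minimal sketch, assuming the script expects a `YYYY-MM-DD` string and that "most recent" means the last complete day (yesterday); both are assumptions to check against the script, and `most_recent_complete_day` is a hypothetical helper. It uses only `datetime`, so it also runs under Python 2:

```python
import datetime


def most_recent_complete_day(today=None):
    """Return yesterday's date as a YYYY-MM-DD string (hypothetical helper)."""
    today = today or datetime.date.today()
    return (today - datetime.timedelta(days=1)).isoformat()


print(most_recent_complete_day(datetime.date(2018, 3, 1)))  # 2018-02-28
```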
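The sampling line (`sampleDf = df.sample(20)`) and the column selection for `sampleDf` can be combined. In this minimal pandas sketch, a toy DataFrame stands in for the notebook's `df`, and the column names and the `columnsToPopulate` list are hypothetical. Passing `random_state`, a standard `DataFrame.sample` parameter, makes the subset reproducible across runs, so the same rows are geocoded each time:

```python
import pandas as pd

# Toy stand-in for the wrangled dataset; all column names are hypothetical.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "city": ["Paris", "Lyon", "Nice", "Lille", "Metz"],
    "country": ["FR"] * 5,
    "debug_info": [None] * 5,
})

# Columns that should end up in the database (hypothetical list).
columnsToPopulate = ["name", "city", "country"]

# Keep only those columns, then draw a reproducible random subset.
# Change n to control the subset size (the notebook uses 20).
sampleDf = df[columnsToPopulate].sample(n=3, random_state=0)

print(list(sampleDf.columns))  # ['name', 'city', 'country']
print(len(sampleDf))  # 3
```

Selecting columns before sampling keeps unused fields out of the loop that writes to the database; the sampled rows are the same either way for a given seed.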