
Uber Pickup Data For New York City

We explore and analyze Uber pickup data for New York City. The data comes from FiveThirtyEight and includes the following for each pickup:

Date/Time
Latitude
Longitude

Virtual Environment

The notebooks use statsmodels, which has a dependency on scipy 1.2.0, so we use a virtual environment to pin that version.

Installing the Virtual Environment

Make sure you have virtualenv installed; if not, run

pip install virtualenv

First, we install the proper version of scipy in the virtual environment and register a Jupyter kernel for it. From the project directory, do the following:

Run virtualenv --system-site-packages venv.
Activate the virtual environment. For example, on Windows, run venv\Scripts\activate; on Linux or macOS, run source venv/bin/activate.
From the virtual environment, run pip install scipy==1.2.0.
From the virtual environment, run python -m ipykernel install --user --name=uber_data.
Exit the virtual environment by running deactivate.
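
After the steps above, a quick check run from the uber_data kernel can confirm that the pinned scipy is the one being imported. This is a minimal sketch and not part of the repository; the helper name `check_pin` is hypothetical, and it assumes the package exposes a `__version__` attribute (scipy does).

```python
import importlib


def check_pin(module_name, required):
    """Return (ok, installed); installed is None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False, None
    installed = getattr(module, "__version__", None)
    return installed == required, installed


if __name__ == "__main__":
    ok, installed = check_pin("scipy", "1.2.0")
    print("scipy", installed, "matches the pin" if ok else "does not match the 1.2.0 pin")
```

If the check fails inside a notebook, the notebook is most likely running on the default kernel rather than uber_data.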

Using the Virtual Environment

Now, when you run jupyter notebook (inside or outside the virtual environment), make sure you are using the uber_data kernel (and NOT the default kernel, e.g. Python 3).

For example, using the menus in the notebook, go to Kernel > Change kernel > uber_data.
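
To double-check the kernel choice from inside a notebook, a cell along these lines may help. It is a sketch using only the standard library; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(sys.executable)            # path of the interpreter behind the kernel
print(in_virtual_environment())  # should be True under the uber_data kernel
```

Under the uber_data kernel, the printed interpreter path should point inside the project's venv directory.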

Notebooks

We have the following notebooks:

Hourly Pickup Counts for Different Days of the Week

`continuous_weekly_negative_binomial.ipynb`

Group the pickup data to get counts of how many pickups occur each hour (so 24 counts in a single day), then model these counts based on the day of the week. The model can inform strategic decisions about how busy drivers are at different points in the week (e.g. which times of the week are most popular).
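
The grouping step can be sketched with the standard library alone. This is an illustration rather than the notebook's code: the function names are hypothetical, and the timestamp layout in `TIME_FORMAT` is an assumption about how the Date/Time column is written.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "4/1/2014 0:11:00"; adjust if the CSV differs.
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def hourly_counts(timestamps, time_format=TIME_FORMAT):
    """Count pickups in each (date, hour) slot."""
    counts = Counter()
    for raw in timestamps:
        t = datetime.strptime(raw, time_format)
        counts[(t.date(), t.hour)] += 1
    return counts


def as_model_rows(counts):
    """Turn the counts into (day_of_week, hour, count) rows; 0 is Monday."""
    return [(day.weekday(), hour, n) for (day, hour), n in sorted(counts.items())]
```

Note that an hour with no pickups produces no row here; if such hours should count as zeros, they have to be filled in before modeling.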

Here is the resulting model:

The model was selected from several variations using cross-validation, where each hold-out set is scored by the mean log-likelihood of the model. The final model assumes that the conditional distribution of y = hourly pickup count given x = day of week (0 is Monday) is a negative binomial distribution.
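
The selection criterion can be illustrated with a small sketch. To keep it short, it swaps in a much simpler stand-in model (one Poisson mean per day of week) instead of the notebook's negative binomial variations; all function names are hypothetical. It assumes every day of week appears in each training fold with a positive mean count.

```python
import math


def poisson_logpmf(y, mu):
    """Log-probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def fit_day_means(rows):
    """Stand-in model: one Poisson mean per day of week, from (day, count) rows."""
    totals, sizes = {}, {}
    for day, count in rows:
        totals[day] = totals.get(day, 0) + count
        sizes[day] = sizes.get(day, 0) + 1
    return {day: totals[day] / sizes[day] for day in totals}


def mean_holdout_loglik(rows, n_folds=3):
    """Average hold-out log-likelihood per observation over n_folds folds."""
    fold_scores = []
    for k in range(n_folds):
        # Every n_folds-th row is held out; the rest is used for fitting.
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        test = [r for i, r in enumerate(rows) if i % n_folds == k]
        means = fit_day_means(train)
        logliks = [poisson_logpmf(count, means[day]) for day, count in test]
        fold_scores.append(sum(logliks) / len(logliks))
    return sum(fold_scores) / len(fold_scores)
```

Candidate models are compared on this score, and the one with the higher (less negative) value is preferred.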

The negative binomial distribution is a common model for Poisson-like data whose variance is too large for a Poisson model, i.e. over-dispersion. The idea is that it is a Poisson model whose mean is itself drawn from a latent gamma distribution.
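
The over-dispersion can be checked numerically. The sketch below uses the mean/dispersion form of the negative binomial, which is one common parameterization and not necessarily the one used in the notebook; the names `negbin_logpmf` and `moments` are hypothetical. It requires mu > 0 and alpha > 0.

```python
import math


def negbin_logpmf(y, mu, alpha):
    """Log-probability of count y under a negative binomial with mean mu.

    alpha > 0 is the dispersion: the variance is mu + alpha * mu**2, and the
    distribution approaches Poisson(mu) as alpha approaches 0.
    """
    r = 1.0 / alpha          # shape of the latent gamma distribution
    p = r / (r + mu)         # "success" probability in the classic form
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
    )


def moments(mu, alpha, y_max=2000):
    """Mean and variance computed by summing the pmf over 0..y_max."""
    probs = [math.exp(negbin_logpmf(y, mu, alpha)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

For example, `moments(10.0, 0.5)` gives a mean close to 10 and a variance close to 60, whereas a Poisson distribution with mean 10 would have variance 10.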

Hourly Pickup Exploration

`hourly_pickups_exploration.ipynb`

Group the pickup data by date and hour, and look at trends in the number of pickups for a given hour of the day. We also look at splitting the days by type (weekends, weekdays, etc.). For example, we have the following hourly counts for Monday through Thursday (Friday nights should behave differently).
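
A split like this can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the function name `mean_hourly_profile` is hypothetical, and the input is assumed to be a mapping from (date, hour) to a pickup count.

```python
from collections import defaultdict
from datetime import date


def mean_hourly_profile(counts, weekdays=(0, 1, 2, 3)):
    """Average pickups per hour of day over dates whose weekday is in weekdays.

    counts maps (date, hour) to a pickup count; 0 is Monday, so the default
    selects Monday through Thursday.
    """
    totals = defaultdict(int)
    days = set()
    for (day, hour), n in counts.items():
        if day.weekday() in weekdays:
            totals[hour] += n
            days.add(day)
    if not days:
        return {}
    # Dividing by the number of selected dates treats missing hours as zero.
    return {hour: totals[hour] / len(days) for hour in sorted(totals)}


counts = {
    (date(2014, 4, 1), 8): 30,   # Tuesday
    (date(2014, 4, 2), 8): 50,   # Wednesday
    (date(2014, 4, 5), 8): 10,   # Saturday, excluded by default
}
print(mean_hourly_profile(counts))  # {8: 40.0}
```

Passing `weekdays=(5, 6)` gives the weekend profile instead.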
