The MTA are kind enough to share statistics on the number of entries, and exits, at all stations and turnstiles. during certain time intervals. They CSV key to this file can be read here.
Thank you to the MTA for sharing this data.
An example of the data that can be produced via this application:
This Ruby on Rails 6 project will download the relevant files, and import them into the database.
This was created with Ruby 2.6.5, and Rails 6.0.2.2. The database is any version of PostgreSQL 9.5+.
Simply run ./bin/setup
, or bundle exec rake db:setup
to initialize the database and create the migrations.
One can download all the files from the MTA site, and then process each one individually via this rake task:
bundle exec rake populate:data
After all the data has been downloaded and populated into the database, some calculations on the cumulative entries and exits will have to be done:
bundle exec rake populate:totals
This will delete data in station_total_events
for re-population.
The FetchFileListings
class fetches the relevant files it sees from the MTA listing, and creates a record for each of the files in FileDownload
.
The FetchAndParse
class then iterates over the unprocessed FileDownload
objects, retrieving each one from the site, and then then parsing it using the CSV
library.
StationEvent
represents each entry from the downloaded file. This is linked to its source, FileDownload
, the Station
, and Division
.
The files are loaded in through Rail 6's built-in insert_all
method.
There are ample tests on the models, and service objects, using rspec
.
rubocop
is bundled to provide linting.
As an example, the raw data for the entries and exits for 59 St station, 02-00-00, looks like this:
Date Time | Entries | Exits |
---|---|---|
03/21/2020 12:00:00 | 7411969 | 2516000 |
03/21/2020 16:00:00 | 7412028 | 2516024 |
03/21/2020 20:00:00 | 7412053 | 2516040 |
03/21/2020 12:00:00 | 7411969 | 2516000 |
03/21/2020 16:00:00 | 7412028 | 2516024 |
03/21/2020 20:00:00 | 7412053 | 2516040 |
03/22/2020 00:00:00 | 7412067 | 2516049 |
03/22/2020 04:00:00 | 7412068 | 2516049 |
03/22/2020 08:00:00 | 7412071 | 2516059 |
03/22/2020 12:00:00 | 7412086 | 2516073 |
It should be noted that Entries
and Exits
are both cumulative.
One challenge was to break up the cumulative figure for each turnstile, for a running total. This was to carry out weekly total operations.
These running totals are calculated in the GenerateTotals
class.
Still being worked on but here is a way to return data for the totals for each week from 2019-12-01
to 2020-04-01
for the 59 ST
station.
> report_options = { begin_week: '2019-12-01', end_week: '2020-04-01' }
> Reports::StationTotals.new('59 ST', options).execute
[
{:week=>"2019-12-02", :total=>"57568"},
{:week=>"2019-12-09", :total=>"56993"},
{:week=>"2019-12-16", :total=>"54758"},
{:week=>"2019-12-23", :total=>"45795"},
{:week=>"2019-12-30", :total=>"46626"},
{:week=>"2020-01-06", :total=>"56258"},
{:week=>"2020-01-13", :total=>"55167"},
{:week=>"2020-01-20", :total=>"49727"},
{:week=>"2020-01-27", :total=>"55101"},
{:week=>"2020-02-03", :total=>"55789"},
{:week=>"2020-02-10", :total=>"55658"},
{:week=>"2020-02-17", :total=>"48882"},
{:week=>"2020-02-24", :total=>"56477"},
{:week=>"2020-03-02", :total=>"54014"},
{:week=>"2020-03-09", :total=>"44048"},
{:week=>"2020-03-16", :total=>"19694"},
{:week=>"2020-03-23", :total=>"6743"}
]