update README files
dmil committed Feb 9, 2018
1 parent 3dbd1d7 commit 2200622
Showing 19 changed files with 70 additions and 57 deletions.
2 changes: 1 addition & 1 deletion ahca-polls/README.md
@@ -1,3 +1,3 @@
# American Health Care Act Polls

The raw data behind the story [Why The GOP Is So Hell-Bent On Passing An Unpopular Health Care Bill](https://fivethirtyeight.com/features/why-the-gop-is-so-hell-bent-on-passing-an-unpopular-health-care-bill)
This folder contains the data behind the story [Why The GOP Is So Hell-Bent On Passing An Unpopular Health Care Bill](https://fivethirtyeight.com/features/why-the-gop-is-so-hell-bent-on-passing-an-unpopular-health-care-bill).
4 changes: 2 additions & 2 deletions airline-safety/README.md
@@ -1,6 +1,6 @@
### Airline safety data
# Airline Safety

The raw data behind the story [Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?](http://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/)
This folder contains the data behind the story [Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?](http://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/).

Header | Definition
---|---------
6 changes: 4 additions & 2 deletions alcohol-consumption/README.md
@@ -1,4 +1,6 @@
http://fivethirtyeight.com/datalab/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/
# Alcohol Consumption

Units: Average serving sizes per person
This folder contains the data behind the story [Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?](http://fivethirtyeight.com/datalab/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/).

Units: Average serving sizes per person
Source: World Health Organisation, Global Information System on Alcohol and Health (GISAH), 2010
10 changes: 5 additions & 5 deletions antiquities-act/README.md
@@ -1,16 +1,16 @@
### Actions under the Antiquities Act
# Actions under the Antiquities Act

Data behind the story [Trump Might Be The First President To Scrap A National Monument](http://fivethirtyeight.com/features/trump-might-be-the-first-president-to-scrap-a-national-monument/).
This folder contains the data behind the story [Trump Might Be The First President To Scrap A National Monument](http://fivethirtyeight.com/features/trump-might-be-the-first-president-to-scrap-a-national-monument/).

This data was compiled by the National Parks Conservation Association and includes national monuments that were created by presidents by under the Antiquities Act. It does not include national monuments created by Congress.
This data was compiled by the National Parks Conservation Association and includes national monuments that were created by presidents by under the Antiquities Act. It does not include national monuments created by Congress.

Header | Definition
---|---------
`current_name` | Current name of piece of land designated under the Antiquities Act
`states` | State(s) or territory where land is located
`original_name` | If included, original name of piece of land designated under the Antiquities Act
`current_agency` | Current land management agency. NPS = National Parks Service, BLM = Bureau of Land Management, USFS = US Forest Service, FWS = US Fish and Wildlife Service, NOAA = National Oceanic and National Oceanic and Atmospheric Administration
`action` | Type of action taken on land
`current_agency` | Current land management agency. NPS = National Parks Service, BLM = Bureau of Land Management, USFS = US Forest Service, FWS = US Fish and Wildlife Service, NOAA = National Oceanic and National Oceanic and Atmospheric Administration
`action` | Type of action taken on land
`date` | Date of action
`year` | Year of action
`pres_or_congress` | President or congress that issued action
6 changes: 3 additions & 3 deletions avengers/README.md
@@ -1,8 +1,8 @@
### Avengers
# Avengers

This directory contains the data behind the story [Joining The Avengers Is As Deadly As Jumping Off A Four-Story Building](http://fivethirtyeight.com/features/avengers-death-comics-age-of-ultron).
This folder contains the data behind the story [Joining The Avengers Is As Deadly As Jumping Off A Four-Story Building](http://fivethirtyeight.com/features/avengers-death-comics-age-of-ultron).

It includes the dataset `avengers.csv`, which details the deaths of Marvel comic book characters between the time they joined the Avengers and April 30, 2015, the week before Secret Wars #1.
`avengers.csv` details the deaths of Marvel comic book characters between the time they joined the Avengers and April 30, 2015, the week before Secret Wars #1.

Header | Definition
---|---------
2 changes: 1 addition & 1 deletion bachelorette/README.md
@@ -1,6 +1,6 @@
# Bachelorette / Bachelor

The raw data behind the stories:
This folder contains the data behind the stories:

- [How To Spot A Front-Runner On The ‘Bachelor’ Or ‘Bachelorette’](https://fivethirtyeight.com/features/the-bachelorette/)
- [Rachel’s Season Is Fitting Neatly Into ‘Bachelorette’ History](https://fivethirtyeight.com/features/rachels-season-is-fitting-neatly-into-bachelorette-history/)
4 changes: 2 additions & 2 deletions bad-drivers/README.md
@@ -1,10 +1,10 @@
### Bad drivers
# Bad drivers

Data from the article [Dear Mona, Which State Has The Worst Drivers?](http://fivethirtyeight.com/datalab/which-state-has-the-worst-drivers/)

Variable | Source
---|---------
`State` | N/A
`State` | N/A
`Number of drivers involved in fatal collisions per billion miles` | National Highway Traffic Safety Administration, 2012
`Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding` | National Highway Traffic Safety Administration, 2009
`Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired` | National Highway Traffic Safety Administration, 2012
3 changes: 3 additions & 0 deletions bechdel/README.md
@@ -0,0 +1,3 @@
# Bechdel

This folder contains data and code behind the story [The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women](http://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/).
6 changes: 4 additions & 2 deletions biopics/README.md
@@ -1,6 +1,8 @@
### Biopics
# Biopics

The directory contains the data behind the story ['Straight Outta Compton' Is The Rare Biopic Not About White Dudes](http://fivethirtyeight.com/features/straight-outta-compton-is-the-rare-biopic-not-about-white-dudes). The data file `biopics.csv` contains the following variables:
This folder contains the data behind the story ['Straight Outta Compton' Is The Rare Biopic Not About White Dudes](http://fivethirtyeight.com/features/straight-outta-compton-is-the-rare-biopic-not-about-white-dudes).

`biopics.csv` contains the following variables:

Variable | Definition
---|---------
8 changes: 4 additions & 4 deletions births/README.md
@@ -1,12 +1,12 @@
### U.S. births data
# U.S. Births

The raw data behind the story [Some People Are Too Superstitious To Have A Baby On Friday The 13th](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/)
This folder contains data behind the story [Some People Are Too Superstitious To Have A Baby On Friday The 13th](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).

There are two files:

`US_births_1994-2003_CDC_NCHS.csv` contains U.S. births data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics
`US_births_1994-2003_CDC_NCHS.csv` contains U.S. births data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.

`US_births_2000-2014_SSA.csv` contains U.S. births data for the years 2000 to 2014, as provided by the Social Security Administration
`US_births_2000-2014_SSA.csv` contains U.S. births data for the years 2000 to 2014, as provided by the Social Security Administration.

Both files have the following structure:

4 changes: 2 additions & 2 deletions bob-ross/README.md
@@ -1,3 +1,3 @@
# bob-ross
# Bob Ross

The raw data behind the story [A Statistical Analysis of the Work of Bob Ross](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)
This folder contains data behind the story [A Statistical Analysis of the Work of Bob Ross](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/).
4 changes: 2 additions & 2 deletions buster-posey-mvp/README.md
@@ -1,6 +1,6 @@
### Buster Posey MVP
# Buster Posey MVP

The code behind the story [Buster Posey’s Pitch Framing Makes Him A Potential MVP](http://fivethirtyeight.com/features/buster-poseys-pitch-framing-makes-him-a-potential-mvp/).
This folder contains the code behind the story [Buster Posey’s Pitch Framing Makes Him A Potential MVP](http://fivethirtyeight.com/features/buster-poseys-pitch-framing-makes-him-a-potential-mvp/).

File | Description
---|---------
6 changes: 4 additions & 2 deletions candy-power-ranking/README.md
@@ -1,8 +1,10 @@
# Candy Power Ranking

Data behind [The Ultimate Halloween Candy Power Ranking](http://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/)
This folder contains the data behind the story [The Ultimate Halloween Candy Power Ranking](http://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/).

`candy-data.csv` includes attributes for each candy along with its ranking. For binary variables, 1 means yes, 0 means no. The data contains the following fields:
`candy-data.csv` includes attributes for each candy along with its ranking. For binary variables, 1 means yes, 0 means no.

The data contains the following fields:

Header | Description
-------|------------
3 changes: 1 addition & 2 deletions chess-transfers/README.md
@@ -1,7 +1,6 @@
# Chess Transfers

The raw data behind the story [American Chess Is Great Again](https://fivethirtyeight.com/features/american-chess-is-great-again/).

This folder contains data behind the story [American Chess Is Great Again](https://fivethirtyeight.com/features/american-chess-is-great-again/).

Headers | Description
--------|-------------
27 changes: 14 additions & 13 deletions classic-rock/readme.md
@@ -1,31 +1,31 @@
# classic-rock
# Classic Rock

Data behind the story [Why Classic Rock Isn’t What It Used To Be](https://fivethirtyeight.com/features/why-classic-rock-isnt-what-it-used-to-be/)
This folder contains the data behind the story [Why Classic Rock Isn’t What It Used To Be](https://fivethirtyeight.com/features/why-classic-rock-isnt-what-it-used-to-be/).

Each line represents a play of a song on a radio station.
Each line represents a play of a song on a radio station.
- The first element, RAW_SONG, is the song text scraped from the radio station
- The second element, Song Clean, is the song's title. It's been made so that all versions
- The second element, Song Clean, is the song's title. It's been made so that all versions
of the RAW_SONG — be they (live) or spelled differently point to the same text in this \
field. So even if we scraped "{Don't Fear} The Reaper" or "(Don't Fear) The Reaper"
or merely "The Reaper" by Blue Oyster Cult, the text in Song Clean is always "(Don't Fear) The Reaper"
- The third element, RAW_ARTIST, is the artist text scraped from the radio station
- The fourth element, ARTIST CLEAN, is a unified version of Raw Artist. So even if we scraped
"Blue Öyster Cult" or "Blue Oyster Cult" or "Blue ?yster Cult", this field would always
read as "Blue Oyster Cult".
- The fourth element, ARTIST CLEAN, is a unified version of Raw Artist. So even if we scraped
"Blue Öyster Cult" or "Blue Oyster Cult" or "Blue ?yster Cult", this field would always
read as "Blue Oyster Cult".
- The fifth element is that station callsign of the song play
- The sixth element is time the song was pulled. Python measures time as seconds since January 1, 1970.
- The seventh element is a unique ID assigned to each play, formed by the callsign of the
station that played it and a four digit number, where 0001 is the last song played on the station
in our set and the highest number is the first song we pulled, if you want to order them.
- The eight element combines Song Clean and ARTIST CLEAN. It can be used for connecting
this data set to the dataset of unique songs.
- The ninth element is a zero or one used to find if this is the first mention of a given song,
it's pretty pointless.
- The ninth element is a zero or one used to find if this is the first mention of a given song,
it's pretty pointless.

classic-rock-song-list:

Each line represents one song in the set
- Song Clean is the name of the song
- Song Clean is the name of the song
- ARTIST CLEAN is the name of the artist
- Release Year is the release year, according to SongFacts. If there isn't a listed year, I couldn't
find an entry for the song on SongFacts
@@ -35,6 +35,7 @@ Each line represents one song in the set
- PlayCount is the number of plays of the song across all stations.
- F*G is the number of plays of the song across all stations, if a year was found.

radio.py is the program to scrape the data from radio sites

compiling_radio.py is the program to consolidate the output of radio.py into one file per station.
`radio.py` is the program to scrape the data from radio sites.

`compiling_radio.py` is the program to consolidate the output of radio.py into one file per station.
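
The classic-rock README above says the sixth element is a time stored as seconds since January 1, 1970, and that the seventh element is a play ID made of the station callsign plus a four-digit number, with 0001 being the last song played. A minimal Python sketch of reading both fields follows. It is not taken from `radio.py`: the helper names, the sample values, and the assumption that the ID is the callsign immediately followed by exactly four digits are all illustrative, and the timestamps are assumed to be in UTC.

```python
from datetime import datetime, timezone
from typing import Tuple


def play_time(epoch_seconds: int) -> datetime:
    """Convert seconds since 1970-01-01 (assumed UTC) to an aware datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def split_play_id(play_id: str) -> Tuple[str, int]:
    """Split a play ID into (callsign, sequence number).

    Assumption: the ID is the callsign directly followed by four digits.
    """
    callsign, digits = play_id[:-4], play_id[-4:]
    if not callsign or not digits.isdigit():
        raise ValueError(f"unexpected play ID: {play_id!r}")
    return callsign, int(digits)


# Hypothetical IDs, not taken from the dataset.
ids = ["WXYZ0001", "WXYZ0003", "WXYZ0002"]

# 0001 is the most recent play, so descending order is oldest first.
oldest_first = sorted(ids, key=lambda i: split_play_id(i)[1], reverse=True)

print(play_time(0).isoformat())
print(oldest_first)
```

Sorting on the numeric suffix rather than the raw string keeps the order correct only within one station, so group by callsign first if the file mixes stations.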
25 changes: 13 additions & 12 deletions college-majors/readme.md
@@ -1,30 +1,31 @@
This repo contains the data and code for FiveThirtyEight's story on earnings of college majors.
# College Majors

This folder contains the data and code behind the story [The Economic Guide To Picking A College Major](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/).

All data is from American Community Survey 2010-2012 Public Use Microdata Series.

Download data here: http://www.census.gov/programs-surveys/acs/data/pums.html

Documentation here: http://www.census.gov/programs-surveys/acs/technical-documentation/pums.html

college-majors-rscript.R:
- My R script for parsing the data.
`college-majors-rscript.R`
- Assumes you've already downloaded the data and selected only records with non-NA values for college major (FOD1P)
- Outputs data to csv

majors-list.csv:
`majors-list.csv`
- List of majors with their FOD1P codes and major categories.
- Major categories are from Carnevale et al, "What's It Worth?: The Economic Value of College Majors." Georgetown University Center on Education and the Workforce, 2011. http://cew.georgetown.edu/whatsitworth

Three main data files:
- all-ages.csv
- recent-grads.csv (ages <28)
- grad-students.csv (ages 25+)
All contain basic earnings and labor force information.
recent-grads contains more detailed breakdown, including by sex and by the type of job they got. Full headers below.
grad-students contains details on graduate school attendees.
- `all-ages.csv`
- `recent-grads.csv` (ages <28)
- `grad-students.csv` (ages 25+)

All contain basic earnings and labor force information. `recent-grads.csv` contains a more detailed breakdown, including by sex and by the type of job they got. `grad-students.csv` contains details on graduate school attendees.

Additionally, women-stem.csv contains data for scatter plot in associated DataLab post on women in science/technology jobs. It is a subset of recent-grads.csv. (Small easter egg: Check out my related Shiny app: https://bencasselman.shinyapps.io/new-test/)
Additionally, `women-stem.csv` contains data for scatter plot in associated DataLab post on women in science/technology jobs. It is a subset of `recent-grads.csv`. (Small easter egg: Check out my related Shiny app: https://bencasselman.shinyapps.io/new-test/)

Headers for recent-grads.csv
Headers for `recent-grads.csv` are shown below:

Header | Description
---|---------
4 changes: 2 additions & 2 deletions comic-characters/README.md
@@ -1,6 +1,6 @@
## Comic characters data
# Comic characters data

The raw data behind the story [Comic Books Are Still Made By Men, For Men And About Men](http://fivethirtyeight.com/features/women-in-comic-books/).
This folder contains data behind the story [Comic Books Are Still Made By Men, For Men And About Men](http://fivethirtyeight.com/features/women-in-comic-books/).

The data comes from [Marvel Wikia](http://marvel.wikia.com/Main_Page) and [DC Wikia](http://dc.wikia.com/wiki/Main_Page). Characters were scraped on August 24. Appearance counts were scraped on September 2. The month and year of the first issue each character appeared in was pulled on October 6.

3 changes: 3 additions & 0 deletions comma-survey/README.md
@@ -0,0 +1,3 @@
# Comma Survey

This folder contains the data behind the story [Elitist, Superfluous, Or Popular? We Polled Americans on the Oxford Comma](https://fivethirtyeight.com/features/elitist-superfluous-or-popular-we-polled-americans-on-the-oxford-comma/).
File renamed without changes.
