0010_Marriage.Rmd

---
title: "Marriage Day 10"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
```

## Hypothesis

- Detect the deviation of demographic

- Getiing Married / Demographic
- Getiing Devorced / Demographic


`glimpse(men)`
`men %>% select(-(1:3))`

```
# # change text
# both_sexesT$demographic %>% unique %>% 
#   gsub("Black", "黑", .) %>% 
#   gsub("all", "全", .) %>%
#   gsub("HS", "高", .) %>%
#   gsub("SC", "部大", .) 


```


Get all data from current plot `ggplot_build(plot)$data`
```

plot = menNwomenL %>% filter(grepl('^(BA)', demographic)) %>%  ggplot(aes(x=year, y=value, colour=sex)) + geom_line() + facet_grid(~demographic)
ggplot_build(plot)$data

```

## TODO

- starting time 08:15


> Figures represent share of the relevant population that has never been married (MARST == 6 in the IPUMS data). Note that in the story, charts generally show the share that have ever been married, which is simply 1 - n.


```{r}
men = read.table('./data/data/marriage/men.csv', header = TRUE,sep = ",")
str(men)
colnames(men)
glimpse(men)
tail(men)
```
```{r}
women = read.table('./data/data/marriage/women.csv', header = TRUE,sep = ",")
str(women)
tail(women)
```


```{r}
both_sexes = read.table('./data/data/marriage/both_sexes.csv', header = TRUE,sep = ",")
str(both_sexes)
tail(both_sexes)
```


> In the divorce file, figures are share of the relevant population that is currently divorced, conditional on having ever been married.

```{r}
divorce = read.table('./data/data/marriage/divorce.csv', header = TRUE,sep = ",")
str(divorce)
tail(divorce)
```


```{r}

menR = men
menR[-(1:3)] = 1 - menR[-(1:3)]
# menR %>% melt(id )


womenR = women
womenR[-(1:3)] = 1 - womenR[-(1:3)]
womenR

both_sexesR = both_sexes
both_sexesR[-(1:3)] = 1 - both_sexesR[-(1:3)]
both_sexesR

```


```{r}

both_sexesT = both_sexesR %>% gather(demographic, value, -c(X, year, date))
menT = menR %>% gather(demographic, value, -c(X, year, date))
womenT = womenR %>% gather(demographic, value, -c(X, year, date))
```


TODO: Maybe auto sort by decreasing order


```{r fig.width=12, fig.height=2}

# # change text
# both_sexesT$demographic %>% unique %>% 
#   gsub("Black", "黑", .) %>% 
#   gsub("all", "全", .) %>%
#   gsub("HS", "高", .) %>%
#   gsub("SC", "部大", .) 


# change appearance order
high = c('kids_all_2534', 'kids_poor_2534', 'kids_HS_2534')
rest = setdiff(unique(both_sexesT$demographic), high)
rank= c(high, rest)
both_sexesT$demographicTmpRank = factor(both_sexesT$demographic, levels=rank)

both_sexesT %>% filter(grepl('^(kids)', demographic)) %>%  ggplot(aes(x=year, y=value)) + geom_line() + facet_grid(~demographicTmpRank) + scale_x_continuous(breaks = seq(min(both_sexesT$year), max(both_sexesT$year), by = 52)) + theme(panel.spacing = unit(1, "lines"))
```


TODO: Partial Highlight

```{r}


l = both_sexesT %>% filter(grepl('^(kids_)', demographic)) %>%  ggplot(aes(x=year, y=value, colour=demographic)) + geom_line() + scale_color_manual(values = c(rep('grey', 9) ) )
high = both_sexesT %>% filter(grepl('^(kids_mid_2534|kids_GD_2534)', demographic))
high
# l + geom_line(data=high, colour='red', shape=3)

```


```{r fig.width=5, fig.height=2}

menL = menT
menL$sex = factor('male')

womenL = womenT
womenL$sex = factor('female')

menNwomenL = rbind(menL, womenL)
menNwomenL %>% filter(grepl('^(all)', demographic)) %>%  ggplot(aes(x=year, y=value, colour=sex)) + geom_line() + facet_grid(~demographic)
  

```


Very useful
```{r fig.width=7, fig.height=2}
plot = menNwomenL %>% filter(grepl('^(BA)', demographic)) %>%  ggplot(aes(x=year, y=value, colour=sex)) + geom_line() + facet_grid(~demographic)
ggplot_build(plot)$data
```


TODO: good color mapping

How to do multiple line per facet?
- need to have both `male` and `female` data in the 

```{r fig.width=7, fig.height=2}

menNwomenL %>% filter(grepl('^(BA)', demographic)) %>%  ggplot(aes(x=year, y=value, colour=sex)) + geom_line() + facet_grid(~demographic) + scale_colour_manual(values=c(male="#00BFC4", female="#F8766D"))

```