GitHub - mlinegar/politicalRhetoric: Analysis of 2016 campaign speeches and poll dynamics

This repository contains the code for my senior thesis, which examines the relationship between speeches given by Hillary Clinton and Donald Trump in the 2016 election and their poll performance over the course of the election. The goal of this repository is to document my workflow so that the work is completely reproducible and an interested observer can pick it up at any single stage. To that end, I save the results as a CSV at each stage; these CSV files are included in the repository alongside the code.

I first assembled a dataset of political speeches from three sources using the web-scraping tools in rvest and the Internet Archive's Wayback Machine. The code used to gather this data is in speechScraper.R, which also removes duplicates using the RNewsflow package. The results for this stage can be read in from thesisSpeeches.csv.
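The de-duplication idea can be illustrated without RNewsflow. The following is a minimal sketch, not the code in speechScraper.R: it swaps in a plain cosine similarity over word counts in base R, and the function name `drop_near_duplicates`, the 0.9 threshold, and the example texts are all assumptions made for illustration.

```r
# Illustrative stand-in for the RNewsflow-based de-duplication in speechScraper.R.
# The function name, threshold, and example texts are assumptions for this sketch.
drop_near_duplicates <- function(texts, threshold = 0.9) {
  # Lowercase and split each text into word tokens.
  tokens <- lapply(tolower(texts), function(x) {
    words <- strsplit(x, "[^a-z']+")[[1]]
    words[nzchar(words)]
  })
  vocab <- sort(unique(unlist(tokens)))
  # Document-term count matrix: one row per text, one column per word.
  dtm <- do.call(rbind, lapply(tokens, function(w) {
    as.numeric(table(factor(w, levels = vocab)))
  }))
  # Cosine similarity between every pair of texts.
  norms <- sqrt(rowSums(dtm^2))
  sim <- (dtm %*% t(dtm)) / (norms %o% norms)
  # Keep a text only if it is not too similar to an earlier text that was kept.
  keep <- rep(TRUE, length(texts))
  for (i in seq_along(texts)[-1]) {
    earlier <- which(keep[seq_len(i - 1)])
    if (any(sim[i, earlier] >= threshold, na.rm = TRUE)) keep[i] <- FALSE
  }
  texts[keep]
}

speeches <- c(
  "Thank you all. We will win this election together.",
  "Thank you all! We will win this election together.",
  "Tonight I want to talk about jobs and trade."
)
unique_speeches <- drop_near_duplicates(speeches)
```

Here the second text differs from the first only in punctuation, so it is dropped, while the third is kept. A speech collected from several sources may differ in more than punctuation, so the threshold would need tuning on real data.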

Once the dataset is assembled, I use the code in corpusPreper.R to clean and organize the data. This code relies on the litMagModelling package I wrote for another project; it is a wrapper for MALLET, a set of text-analysis tools written in Java and available in R, but with syntax unfamiliar to most R users. This code also stems the collected speeches using koRpus, removes a list of stopwords, combines interesting terms into single tokens to aid analysis, and removes words that occur either too frequently or too infrequently to be useful in the analysis. The results for this stage can be read in from processedSpeeches.csv. Additionally, this file creates a dataframe of metadata for the speeches, which is saved as collectedSpeeches.metadata.csv.
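The cleaning steps described above can be sketched in base R. This is only an illustration under stated assumptions: corpusPreper.R itself relies on litMagModelling and koRpus, stemming is omitted here, and the function name `clean_corpus`, the stopword list, the phrase list, and the frequency thresholds are all hypothetical.

```r
# Illustrative sketch of the cleaning steps; not the code in corpusPreper.R.
# All names, word lists, and thresholds are assumptions. Stemming is omitted.
clean_corpus <- function(texts, stopwords, phrases, min_docs = 2, max_share = 0.9) {
  texts <- tolower(texts)
  # Join multi-word terms into single tokens, e.g. "health care" -> "health_care".
  for (p in phrases) {
    texts <- gsub(p, gsub(" ", "_", p, fixed = TRUE), texts, fixed = TRUE)
  }
  # Tokenize and drop stopwords.
  tokens <- lapply(strsplit(texts, "[^a-z_]+"), function(w) {
    w[nzchar(w) & !(w %in% stopwords)]
  })
  # Document frequency: the number of texts in which each token appears.
  doc_freq <- table(unlist(lapply(tokens, unique)))
  n_docs <- length(texts)
  # Keep tokens that are neither too rare nor too common.
  keep <- names(doc_freq)[doc_freq >= min_docs & doc_freq / n_docs <= max_share]
  lapply(tokens, function(w) w[w %in% keep])
}

texts <- c(
  "We will fix health care and create jobs",
  "Health care costs are too high",
  "Jobs and trade will come back",
  "We will protect the border"
)
cleaned <- clean_corpus(
  texts,
  stopwords = c("we", "will", "and", "are", "the", "too"),
  phrases = "health care",
  min_docs = 2,
  max_share = 0.75
)
```

In this toy input only `health_care` and `jobs` appear in at least two texts, so they are the only tokens that survive the frequency filter.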

The small file corpusMaker.R takes the output of corpusPreper.R and applies LDA. The results for this stage can be read in from overallCorpus.csv for the post-June 1 corpus or postSep1Corpus.csv for the post-September 1 corpus. Note that estimation can take a while, so I recommend skipping this step or reducing the number of runs in the makeCorpus function. Additionally, note that because these two corpora contain different speeches, LDA will generate different topics for each (though the topics are relatively robust to the choice between these two dates).
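To make the LDA step concrete, here is a toy collapsed Gibbs sampler in base R. It is a sketch of the general technique only, assuming symmetric priors; it is not the MALLET-based makeCorpus function, and the function name `lda_gibbs`, its parameters, and the example documents are all hypothetical.

```r
# Toy collapsed Gibbs sampler for LDA (illustrative only; not makeCorpus).
# docs is a list of character vectors of tokens; k is the number of topics.
lda_gibbs <- function(docs, k, n_iter = 200, alpha = 0.1, beta = 0.01, seed = 1) {
  set.seed(seed)
  vocab <- sort(unique(unlist(docs)))
  v <- length(vocab)
  d <- length(docs)
  word_ids <- lapply(docs, match, table = vocab)
  # Random initial topic assignment for every token.
  z <- lapply(word_ids, function(w) sample.int(k, length(w), replace = TRUE))
  doc_topic <- matrix(0, d, k)
  topic_word <- matrix(0, k, v)
  topic_total <- numeric(k)
  for (i in seq_len(d)) {
    for (j in seq_along(word_ids[[i]])) {
      topic <- z[[i]][j]
      w <- word_ids[[i]][j]
      doc_topic[i, topic] <- doc_topic[i, topic] + 1
      topic_word[topic, w] <- topic_word[topic, w] + 1
      topic_total[topic] <- topic_total[topic] + 1
    }
  }
  for (iter in seq_len(n_iter)) {
    for (i in seq_len(d)) {
      for (j in seq_along(word_ids[[i]])) {
        topic <- z[[i]][j]
        w <- word_ids[[i]][j]
        # Remove the token's current assignment from the counts.
        doc_topic[i, topic] <- doc_topic[i, topic] - 1
        topic_word[topic, w] <- topic_word[topic, w] - 1
        topic_total[topic] <- topic_total[topic] - 1
        # Conditional probability of each topic for this token.
        p <- (doc_topic[i, ] + alpha) * (topic_word[, w] + beta) /
          (topic_total + v * beta)
        topic <- sample.int(k, 1, prob = p)
        # Record the newly sampled assignment.
        z[[i]][j] <- topic
        doc_topic[i, topic] <- doc_topic[i, topic] + 1
        topic_word[topic, w] <- topic_word[topic, w] + 1
        topic_total[topic] <- topic_total[topic] + 1
      }
    }
  }
  # Smoothed estimates: topic shares per document and word shares per topic.
  theta <- (doc_topic + alpha) / rowSums(doc_topic + alpha)
  phi <- (topic_word + beta) / rowSums(topic_word + beta)
  dimnames(phi) <- list(paste0("topic", seq_len(k)), vocab)
  list(theta = theta, phi = phi)
}

docs <- list(
  c("jobs", "trade", "jobs", "economy"),
  c("trade", "economy", "jobs"),
  c("isis", "military", "isis", "border"),
  c("military", "border", "isis")
)
fit <- lda_gibbs(docs, k = 2, n_iter = 100)
round(fit$theta, 2)  # one row per document, one column per topic
```

Each sweep visits every token once, so the cost of a sampler like this grows with both the number of iterations and the size of the corpus. The topic labels themselves are arbitrary and depend on the random seed.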

The file pollTopicVARModeler.R applies a VAR model to the polls from the 2016 election and the topic-speeches generated by corpusMaker.R. Note that this is a discrete-time model, not a continuous-time model; the continuous-time model I was using has been failing to converge. I've contacted the package maintainer and hope to compare the outputs from each of the different models. In the meantime, I compare two different sets of models, each with daily observations. In one, days with no speeches are given a 0 for each topic; in the other, I use a Kalman filter to smooth over these missing observations. I believe that the second case may better approximate the way the speeches enter the popular consciousness. For example, the media might continue to refer to an old speech if no new information (speech) is presented. Of course, it would be far better to simply examine media discussion of the election, but I don't have access to that data at present. In either case, the results are almost indistinguishable. Finally, note that rather than use the post-July 1 corpus that I had used earlier, I now use a post-September 1 corpus, as the density of speeches given between July 1 and September 1 is far lower than after September 1. In the continuous-time case I don't believe that this is a problem, but it is more likely to bias results in the discrete-time case.
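The two ways of handling days without speeches can be sketched as follows. This is a minimal illustration, not the code in pollTopicVARModeler.R: the dates and topic shares are made up, and the local-level model fitted with `StructTS` from base R's stats package is an assumption about how the smoothing might be done.

```r
# Illustrative sketch of the two treatments of days with no speeches.
# The dates and topic shares below are invented for this example.
days <- seq(as.Date("2016-09-01"), by = "day", length.out = 14)
topic_share <- c(0.10, NA, 0.14, 0.12, NA, NA, 0.20,
                 0.18, NA, 0.15, 0.17, NA, 0.22, 0.19)

# Option 1: a day with no speech gets a topic share of zero.
zero_filled <- ifelse(is.na(topic_share), 0, topic_share)

# Option 2: fit a local-level state-space model and use the Kalman-smoothed
# level on the days with no observation. StructTS allows missing values,
# provided the first observation is present.
fit <- StructTS(ts(topic_share), type = "level")
smoothed_level <- as.numeric(tsSmooth(fit))
kalman_filled <- ifelse(is.na(topic_share), smoothed_level, topic_share)

data.frame(day = days, observed = topic_share,
           zero_filled = zero_filled, kalman_filled = kalman_filled)
```

The zero-filled series drops sharply on every day without a speech, while the smoothed series carries the recent level forward, which is closer to the idea that an old speech keeps circulating until a new one replaces it.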

A total of 40 bivariate VAR models were run, estimating the relationship between changes in the difference in poll performance between the candidates and each candidate's use of each topic. The patterns across the bivariate models are consistent: the first lag of the difference in poll support has an effect on changes in the difference in poll support (as a dependent variable) that is significant at the 5% level (the average coefficient is 0.319, which is equivalent to 89% of one standard deviation), while topic use has a significant effect on changes in the difference in poll support in only two models. It is likely that the significance of Clinton's use of this topic is due to chance, as only 0.4% of an average Clinton speech could be attributable to this topic. No other topic used by either candidate has an effect on changes in the difference in poll support that is significant at the 5% level. Of the topics significant at the 10% level, only one (Trump's use of the "ISIS Military" topic) makes up more than 10% of a representative speech by the corresponding candidate. The topics covered in candidates' speeches thus appear to have little to no effect on their performance in the polls.
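For readers unfamiliar with the setup, one such bivariate model can be sketched in base R. An unrestricted VAR can be estimated equation by equation with ordinary least squares, which is what the sketch does with `lm()`. Everything else is an assumption for illustration: the data are simulated, a single lag is used, and the variable names are not those in pollTopicVARModeler.R.

```r
# Illustrative bivariate VAR(1), estimated equation by equation with lm().
# Simulated data and hypothetical variable names; not the thesis data or code.
set.seed(42)
n <- 200
d_poll_gap <- numeric(n)  # change in the difference in poll support
topic_use <- numeric(n)   # demeaned daily share of one topic for one candidate
for (day in 2:n) {
  d_poll_gap[day] <- 0.3 * d_poll_gap[day - 1] + rnorm(1, sd = 0.3)
  topic_use[day] <- 0.5 * topic_use[day - 1] + rnorm(1, sd = 0.1)
}

# Pair each day's values with the previous day's values.
lagged <- data.frame(
  d_poll_gap = d_poll_gap[-1],
  topic_use = topic_use[-1],
  d_poll_gap_l1 = d_poll_gap[-n],
  topic_use_l1 = topic_use[-n]
)

# One regression per dependent variable, each on the first lag of both series.
poll_eq <- lm(d_poll_gap ~ d_poll_gap_l1 + topic_use_l1, data = lagged)
topic_eq <- lm(topic_use ~ d_poll_gap_l1 + topic_use_l1, data = lagged)

summary(poll_eq)$coefficients   # does lagged topic use predict poll changes?
summary(topic_eq)$coefficients  # do lagged poll changes predict topic use?
```

In the poll equation, the coefficient on `topic_use_l1` plays the role of the topic effect discussed above, and the coefficient on `d_poll_gap_l1` plays the role of the lagged poll effect. In the simulated data the two series are generated independently, so any apparent cross effect would be due to chance.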

Similarly, candidates' performance in the polls appears to have little effect on the topics they use in their speeches. Polls had a significant effect on topic expression in only a single model (Clinton's use of the "Win Build/Bad Deal" topic); again, this result is likely to be a fluke of the data, as Clinton uses the topic in only 2.7% of a representative speech. Lags of topics did consistently have a positive effect on present use of that same topic, with 19 of the 40 models having a lagged effect that was significant at the 5% level. In all cases this relationship was positive, indicating that once candidates begin using a topic they are likely to continue to do so in the immediate future.

Finally, I include some graphs of polls and topic-use by each candidate over time. The code used to generate these plots is included in the functions.R script, but the actual generation of these plots isn't very informative, and so is not included for now. All of these plots can be seen under the folder Plots.

Latest commit: mlinegar, "Corrected typo" (6cbce99), Mar 29, 2018. History: 4 commits.

Name	Last commit message	Last commit date
Plots	Added topicPollCombPlot example to Plots.	Mar 28, 2018
.gitignore	Initial commit	Mar 28, 2018
README.md	Corrected typo	Mar 29, 2018
collectedSpeeches.metadata.csv	Initial commit	Mar 28, 2018
corpusMaker.R	Added topicPollCombPlot example to Plots.	Mar 28, 2018
corpusPreper.R	Updated comments	Mar 28, 2018
functions.R	Added topicPollCombPlot example to Plots.	Mar 28, 2018
lemmaerrors.csv	Initial commit	Mar 28, 2018
overallCorpus.csv	Initial commit	Mar 28, 2018
politicalRhetoric.Rproj	Initial commit	Mar 28, 2018
pollTopicVARModeler.R	Initial commit	Mar 28, 2018
postSep1Corpus.csv	Initial commit	Mar 28, 2018
processedSpeeches.csv	Initial commit	Mar 28, 2018
speechScraper.R	Initial commit	Mar 28, 2018
thesisSpeeches.csv	Initial commit	Mar 28, 2018
twotokenlist2.csv	Initial commit	Mar 28, 2018
