nsfads

Introduction

This repository contains R code to munge, analyze, and publish (as a Shiny web app) summaries of word counts in National Science Foundation (NSF) grant abstracts from the Division of Mathematical Sciences (DMS). See mathtrends.ssk.im for an example.

The relevant files are:

xml_munge.R
gen_tdm.R
tdm_dms.R
ui.R
server.R

For any questions, bug reports, etc., contact Steven S. Kim via e-mail at [email protected].

Requirements

Required R packages include:

XML
plyr
data.table
tm
RWeka
ggplot2
stringr

The XML files containing abstract data were downloaded from the NSF website.
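
As a convenience, the package list above can be checked from an R session before running the scripts. This is a minimal sketch, not part of the repository; the variable names `required`, `is_installed`, and `missing_pkgs` are illustrative. Note that RWeka needs a working Java installation, so it may report as missing even after installation if Java is unavailable.

```r
# Packages listed in the Requirements section above.
required <- c("XML", "plyr", "data.table", "tm", "RWeka", "ggplot2", "stringr")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package cannot be loaded.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```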

Project Notes

  • This project was heavily influenced by the Google Ngram viewer.
  • The default constants cover the years 1990 to 2015. This range is arbitrary and can be changed by updating the YEARS constant in the code. Note, however, that many XML files from earlier years do not contain abstract data.
  • Key functionality is provided by the tm text-mining package in R.
  • The file tdm_dms.R sparsifies the TermDocumentMatrix so that it includes only terms that occur in at least 20% of the years analyzed.
  • A few example terms with interesting trends:
    • machine learning vs. data vs. statistics + statistical
    • biology + biological
    • underrepresented, minority + minorities
    • outreach
    • young researchers and undergraduate, graduate
    • develop, advance + advances
    • the project will
    • network + networks
    • control, partial differential
  • Some eventual TODOs:
    • smoothing the time series
    • a "shuffle" option incorporating list of sample queries
    • make the plot interactive with tool-tips on hover
    • look at all divisions and make comparisons across NSF
    • compare to NIH/DOD/NSERC funding priorities
    • use a Markov model to generate a "sample" abstract
    • map textual differences across corpora
    • a "dollar-weighted" count (weighting gram proportion in a given grant by dollars in grant)
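
To make two of the notes above concrete, here is a minimal base-R sketch of (a) keeping only terms that occur in at least 20% of the years and (b) summing counts for a combined query such as statistics + statistical. It is not the code in tdm_dms.R or server.R: the toy matrix, the shortened `YEARS` range, and the helpers `keep_frequent_terms` and `combine_terms` are all hypothetical. With the tm package, filtering of this kind on a TermDocumentMatrix is typically done with `removeSparseTerms()`; whether the repository uses it is an assumption.

```r
# Hypothetical stand-in for the YEARS constant (shortened for the example).
YEARS <- 1990:1999

# Toy term-by-year count matrix; the real project builds its counts
# from NSF abstract XML files, which are not reproduced here.
counts <- matrix(
  0, nrow = 4, ncol = length(YEARS),
  dimnames = list(
    c("statistics", "statistical", "outreach", "rareword"),
    as.character(YEARS)
  )
)
counts["statistics", ]  <- c(5, 6, 4, 7, 8, 9, 7, 8, 10, 12)
counts["statistical", ] <- c(2, 3, 3, 2, 4, 5, 4, 6, 5, 7)
counts["outreach", ]    <- c(0, 0, 0, 1, 0, 2, 3, 3, 4, 6)
counts["rareword", ]    <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)

# Keep only terms with a nonzero count in at least `min_share` of the years.
keep_frequent_terms <- function(m, min_share = 0.2) {
  share_of_years <- rowMeans(m > 0)
  m[share_of_years >= min_share, , drop = FALSE]
}

# Sum per-year counts over several terms, mimicking an "a + b" query.
# Terms that are absent from the matrix are silently ignored.
combine_terms <- function(m, terms) {
  present <- intersect(terms, rownames(m))
  colSums(m[present, , drop = FALSE])
}

dense <- keep_frequent_terms(counts, min_share = 0.2)
stat_total <- combine_terms(dense, c("statistics", "statistical"))
```

In this toy data, `rareword` appears in only one of ten years and is dropped, while `stat_total` holds one combined count per year that could be passed to a plotting function.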
