Writing download functions

Will Pearse edited this page Mar 16, 2017 · 4 revisions

Are you ready to write some functions?!1?

After writing a few of these download functions (in the downloads.R file), I've compiled a list of some notes and helpful tips to make the experience all the more pleasant. Any additional tips or tricks are welcome!

Getting started

  • After opening up R, there are a few things you need to load in before running any of the downloads.R code:

    • library(reshape2)
    • library(devtools)
    • install_github("willpearse/fulltext")
    • library(fulltext)
    • source('/path/to/natdb/R/utility.R')
  • Once you get settled, find a paper with data you want and download the data file to your computer. Then you can open the file in R to have a look at it:

      data <- read.delim("~/Desktop/PanTHERIA_1-0_WR05_Aug2008.txt")
    
  • Look through the metadata and begin to figure out which columns are useful, what the units are, etc.

  • You can use names(data) to pull out just the names for each column, which can make it easier to extract just the ones you'd like to keep.

  • Make sure that any meaningless info is removed, and that NAs are in place where data is absent.
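
As a rough sketch of the inspection and clean-up steps above, here is a toy example. The data frame and its column names are invented for illustration; the -999 placeholder is the one PanTHERIA uses for missing values, and your dataset may use something else.

```r
# Toy stand-in for a downloaded table (column names are hypothetical)
data <- data.frame(
    species   = c("Homo sapiens", "Mus musculus", "Canis lupus"),
    body_mass = c(70000, -999, 40000),
    notes     = c("checked", "checked", "checked"),
    stringsAsFactors = FALSE
)

# List the column names, then keep only the useful ones
names(data)
data <- data[, c("species", "body_mass")]

# Swap the -999 placeholder for NA, one column at a time;
# which() ignores existing NAs, so this is safe to run more than once
for(col in names(data)){
    column <- data[[col]]
    column[which(column == -999)] <- NA
    data[[col]] <- column
}

data
```

Running `summary(data)` afterwards is a quick way to confirm that no placeholder values are left in the numeric columns.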

How to use .df.melt

You cannot write a download function without using .df.melt. .df.melt turns your downloaded data into a format that natdb can work with. If you don't use this function, you may as well not have bothered writing a function at all! .df.melt takes four arguments, only two of which are required:

  • data - a data.frame containing all of the trait measurements you have downloaded.
  • species - the name of the column in data that contains the species names of the trait measurements
  • units - (optional, but recommended!) the units of each column in data, in the order they appear, with nothing (no NA - nothing!) for the species column
  • metadata - (optional, but recommended!) a data.frame containing the meta-data for everything in data. This could be things like the longitude and latitude at which a measurement was made, which is useful information but isn't a trait.
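
To make the four arguments concrete, here is a minimal sketch of preparing them from a toy table. All column names and values are invented, and it assumes metadata has one row per row of data, which this page does not state explicitly. The final call is left commented out because .df.melt only exists once natdb's utility.R has been sourced.

```r
# Toy table: a species column, two trait columns, and two columns
# describing where each measurement was made (all names are hypothetical)
raw <- data.frame(
    species   = c("Quercus robur", "Fagus sylvatica"),
    leaf_area = c(25.1, 18.4),
    height    = c(30, 35),
    latitude  = c(51.5, 48.1),
    longitude = c(-0.1, 11.6),
    stringsAsFactors = FALSE
)

# Split the non-trait columns off into their own data.frame
meta.cols <- c("latitude", "longitude")
metadata <- raw[, meta.cols]
data <- raw[, !(names(raw) %in% meta.cols)]

# One unit per trait column, in column order, with no entry for the species column
units <- c("cm^2", "m")
stopifnot(length(units) == ncol(data) - 1)

# With utility.R sourced, the call would then look something like:
# melted <- .df.melt(data, "species", units=units, metadata=metadata)
```

Checking the length of units against the number of trait columns before calling .df.melt catches the easy mistake of including an entry for the species column.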

An annotated example download function

Lines 13-56 in the downloads.R file contain sample functions for several different data repositories that you might be pulling data from. Before worrying too much about how to make the Best Function, look through those 5 functions to see if you can create your own from those examples.

    # name your function with the author and year; if there is more than one dataset from a paper, use a, b, c... after the author name/year
    .jones.2009 <- function(...){

        # read in your data from an internet source
        data <- read.delim(ft_get_si("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"))

        # do some things to make the dataset more useful - add in NAs where appropriate, remove meaningless columns, etc
        for(i in 1:ncol(data))
            data[data[,i]==-999 | data[,i]=="-999",i] <- NA

        # from the metadata, find the units for each column and add them in with c() - sample() just generates placeholder units for this example
        units <- sample(c("g","m^2"),length(names(data))-1,TRUE)

        # use the .df.melt() function to add in the data, species name column, and units
        data <- .df.melt(data, "MSW05_Binomial", units=units)

        # ta-da! data!
        return(data)
    }

Stream-lining your functions

  • Below is the step-by-step construction of one of these download functions, as written by Will during one of our meetings. It can be helpful to work through the process for the first few functions, but after a while you can write one using just the last step:

      # reading from your hard drive after manually unzipping
      # editing to include NAs, units
      .jones.2009 <- function(...){
          data <- read.delim("~/Desktop/PanTHERIA_1-0_WR05_Aug2008.txt")
          for(i in 1:ncol(data))
              data[data[,i]==-999 | data[,i]=="-999",i] <- NA
          units <- sample(c("g","m^2"),length(names(data))-1,TRUE)
          data <- .df.melt(data, "MSW05_Binomial", units=units)
          return(data)
      }
    
      # downloading .zip, unzipping
      .jones.2009 <- function(...){
          file <- .unzip("PanTHERIA_1-0_WR05_Aug2008.txt", "~/Downloads/ECOL_90_184.zip")
          data <- read.delim(file)
          for(i in 1:ncol(data))
              data[data[,i]==-999 | data[,i]=="-999",i] <- NA
          units <- sample(c("g","m^2"),length(names(data))-1,TRUE)
          data <- .df.melt(data, "MSW05_Binomial", units=units)
          return(data)
      }
    
      # downloading from the internet, unzipping
      .jones.2009 <- function(...){
          file <- tempfile()
          download.file("https://ndownloader.figshare.com/files/5604752", file)
          file <- .unzip("PanTHERIA_1-0_WR05_Aug2008.txt", file)
          data <- read.delim(file)
          for(i in 1:ncol(data))
              data[data[,i]==-999 | data[,i]=="-999",i] <- NA
          units <- sample(c("g","m^2"),length(names(data))-1,TRUE)
          data <- .df.melt(data, "MSW05_Binomial", units=units)
          return(data)
      }
    
      # downloading directly from ESA (in this case), selecting desired file to retrieve from database
      # this is specific to which database you're downloading from (ESA, Dryad, etc...)
      .jones.2009 <- function(...){
          data <- read.delim(ft_get_si("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"))
          for(i in 1:ncol(data))
              data[data[,i]==-999 | data[,i]=="-999",i] <- NA
          units <- sample(c("g","m^2"),length(names(data))-1,TRUE)
          data <- .df.melt(data, "MSW05_Binomial", units=units)
          return(data)
      }
    
      # test the bad boi
      test <- .jones.2009()