
2. Castor Framework

Lochhead edited this page Mar 13, 2024 · 2 revisions

Model Framework

Castor is built primarily in the R programming language, supplemented with some Java (used in the blockingCastor module to build homogeneous harvest units). One advantage of R is its large user community, which actively creates 'packages' that perform a wide range of analytical functions. The result is a strong library not only of statistical packages but also of data management and visualization packages, so working in R provides a great deal of flexibility. R packages are also typically well documented, so it is usually straightforward to learn how to use and apply the functions in a given package.

SpaDES R Package

The Castor model is implemented using the SpaDES package in R. SpaDES provides a modular structure for implementing discrete-event, spatial simulation models, similar to other discrete-event simulation software (e.g., SELES). Processes are programmed as 'modules', and each module's events are scheduled in discrete time within SpaDES. These modules are intended to represent sub-processes within a larger modelling system.
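To make the discrete-event idea concrete, here is a minimal Python sketch of an event queue in which two toy 'modules' schedule their own future events. It is not the SpaDES API (SpaDES is an R package); the `Simulation` class, the module names and the priority convention (lower values run first within a time step) are assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Handler = Callable[["Simulation", str], None]


class Simulation:
    """Toy discrete-event scheduler (hypothetical; not the SpaDES API)."""

    def __init__(self, end_time: int) -> None:
        self.end_time = end_time
        self.time = 0
        # Queue entries: (time, priority, sequence number, module, event name).
        self._queue: List[Tuple[int, int, int, str, str]] = []
        self._seq = 0
        self._handlers: Dict[str, Handler] = {}
        self.log: List[Tuple[int, str, str]] = []

    def register(self, module: str, handler: Handler) -> None:
        self._handlers[module] = handler

    def schedule(self, time: int, module: str, event: str, priority: int = 5) -> None:
        # Lower priority values run first among events at the same time;
        # the sequence number breaks remaining ties in scheduling order.
        heapq.heappush(self._queue, (time, priority, self._seq, module, event))
        self._seq += 1

    def run(self) -> None:
        # Pop events in time order until the queue is empty or end_time is passed.
        while self._queue and self._queue[0][0] <= self.end_time:
            time, _, _, module, event = heapq.heappop(self._queue)
            self.time = time
            self.log.append((time, module, event))
            self._handlers[module](self, event)


def growth_module(sim: Simulation, event: str) -> None:
    # Annual growth event that reschedules itself one time step later.
    if event == "grow":
        sim.schedule(sim.time + 1, "growth", "grow")


def harvest_module(sim: Simulation, event: str) -> None:
    # Harvest event every 5 time steps, run before growth within the same step.
    if event == "harvest":
        sim.schedule(sim.time + 5, "harvest", "harvest", priority=1)


sim = Simulation(end_time=10)
sim.register("growth", growth_module)
sim.register("harvest", harvest_module)
sim.schedule(0, "growth", "grow")
sim.schedule(0, "harvest", "harvest", priority=1)
sim.run()
print(sim.log[:3])  # [(0, 'harvest', 'harvest'), (0, 'growth', 'grow'), (1, 'growth', 'grow')]
```

Each module only knows how to handle and reschedule its own events; the scheduler interleaves them in time order, which is what allows independent sub-processes to be combined into one model.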

SpaDES modules typically consist of two main files: a .R file and a .Rmd file. The .R file contains the functions that carry out the 'events' within the module. It also defines the module's parameters and its expected inputs and outputs, which perform the "data hand-shake" among modules. The .Rmd file provides the text description of the module and, importantly, contains the code chunk needed to showcase the functions coded in the .R file. The .Rmd code chunk typically defines which modules to simulate and the parameters or objects needed to run them.
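The 'data hand-shake' can be pictured as a consistency check between what each module declares it needs and what earlier modules (or the user) provide. The Python sketch below is illustrative only: `ModuleSpec`, `check_handshake` and the object names (`castordb`, `harvestFlow`, `harvestReport`) are hypothetical, and modules are simply checked in the order listed, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ModuleSpec:
    """What a module declares: objects it expects and objects it creates (hypothetical)."""
    name: str
    expects_inputs: Set[str] = field(default_factory=set)
    creates_outputs: Set[str] = field(default_factory=set)


def check_handshake(modules: List[ModuleSpec], supplied: Set[str]) -> Dict[str, Set[str]]:
    """Return each module's expected inputs that are neither user-supplied nor created earlier."""
    available = set(supplied)
    missing: Dict[str, Set[str]] = {}
    for spec in modules:  # simplification: modules are assumed to run in list order
        gap = spec.expects_inputs - available
        if gap:
            missing[spec.name] = gap
        available |= spec.creates_outputs
    return missing


modules = [
    ModuleSpec("dataCastor", creates_outputs={"castordb"}),
    ModuleSpec("forestryCastor", {"castordb", "harvestFlow"}, {"harvestReport"}),
    ModuleSpec("uploaderCastor", {"harvestReport"}),
]
print(check_handshake(modules, supplied=set()))  # {'forestryCastor': {'harvestFlow'}}
print(check_handshake(modules, supplied={"harvestFlow"}))  # {}
```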

Another way to think of the relationship between these two files is that the .R file contains the functions that implement the desired model logic, and the .Rmd file is used to set the flexible parameters of the module. The code in the .R file should not need frequent modification; for example, you should not need to change it to run alternative simulation scenarios, because it contains the generalized, stable and consistent model logic. The .Rmd file, by contrast, is used to modify the flexible elements of the .R code, for example, the harvest flow or harvest priority.
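The split between stable logic and flexible settings can be sketched as a function whose behaviour is driven entirely by a parameter dictionary, with each scenario supplying only its overrides. This is a hypothetical Python illustration of the pattern, not Castor code; the parameter names `harvest_flow_m3` and `harvest_priority` and the stand data are invented.

```python
from typing import Any, Callable, Dict, List

# Stable logic (the role of the .R file): defaults and ranking rules defined once.
DEFAULT_PARAMS: Dict[str, Any] = {"harvest_flow_m3": 1000.0, "harvest_priority": "oldest_first"}
PRIORITY_KEYS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "oldest_first": lambda stand: -stand["age"],
    "highest_volume_first": lambda stand: -stand["volume"],
}


def select_stands(stands: List[Dict[str, float]], params: Dict[str, Any]) -> List[int]:
    """Take stands in priority order until cumulative volume reaches or exceeds the harvest flow."""
    chosen: List[int] = []
    total = 0.0
    for stand in sorted(stands, key=PRIORITY_KEYS[params["harvest_priority"]]):
        if total >= params["harvest_flow_m3"]:
            break
        chosen.append(int(stand["id"]))
        total += stand["volume"]
    return chosen


def run_scenario(stands: List[Dict[str, float]], overrides: Dict[str, Any]) -> List[int]:
    # Scenario settings (the role of the .Rmd file) override defaults; the logic is untouched.
    return select_stands(stands, {**DEFAULT_PARAMS, **overrides})


stands = [
    {"id": 1, "age": 120, "volume": 400.0},
    {"id": 2, "age": 80, "volume": 900.0},
    {"id": 3, "age": 150, "volume": 300.0},
    {"id": 4, "age": 60, "volume": 700.0},
]
print(run_scenario(stands, {}))  # [3, 1, 2]
print(run_scenario(stands, {"harvest_priority": "highest_volume_first"}))  # [2, 4]
print(run_scenario(stands, {"harvest_flow_m3": 500.0}))  # [3, 1]
```

Adding a new scenario means adding a new dictionary of overrides, not editing `select_stands`.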

It can take some time to become familiar with the SpaDES structure; further resources are available in the SpaDES getting started guide. If you are interested in using Castor, we recommend spending some time familiarizing yourself with SpaDES, including reviewing the example model.

Two-step Process

We developed Castor to run in two steps: 1. data set building/caching, and 2. simulation. Each step is 'anchored' by a module, i.e., dataCastor and forestryCastor, respectively. We separate these two steps to keep the simulation process fast and efficient, by handling the bulk of the data compilation outside of the simulation. Note that SpaDES has its own functionality for similar caching; however, we chose to store the resulting data in a database structure to allow for better dissemination and querying.
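A minimal Python sketch of the two-step pattern: an expensive build step writes an SQLite file once, and every subsequent simulation run reads only that file. The functions, the `pixels` table and its values are hypothetical stand-ins for dataCastor and forestryCastor, and "rebuild only if the file does not exist" is a deliberately simple caching rule.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def build_castordb(path: str) -> bool:
    """Step 1 (hypothetical): compile data into a SQLite file unless it already exists.

    Returns True if the file was built, False if an existing file was reused.
    """
    if os.path.exists(path):
        return False
    with closing(sqlite3.connect(path)) as conn:
        with conn:  # commit the inserts on success
            conn.execute("CREATE TABLE pixels (pixelid INTEGER PRIMARY KEY, age INTEGER)")
            conn.executemany("INSERT INTO pixels VALUES (?, ?)", [(1, 40), (2, 95), (3, 130)])
    return True


def simulate(path: str, years: int, min_harvest_age: int = 80) -> int:
    """Step 2 (hypothetical): read only the prepared file; count stands old enough to harvest."""
    with closing(sqlite3.connect(path)) as conn:
        ages = [row[0] for row in conn.execute("SELECT age FROM pixels")]
    return sum(1 for age in ages if age + years >= min_harvest_age)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "castordb.sqlite")
    builds, results = 0, []
    for years in (0, 10, 40):
        builds += int(build_castordb(db_path))  # only the first call builds the file
        results.append(simulate(db_path, years))
print(builds, results)  # 1 [2, 2, 3]
```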

Data Building Step

All Castor models require an SQLite database as input. This database houses the data needed to run the forestry modules (e.g., forest inventory characteristics, yield curves and zone constraints). We call this database 'castordb'; it is created by running the dataCastor module.

Castor was designed to model large spatial extents at a high resolution. To meet this objective, in step one (the data building process using the dataCastor module), Castor connects to a PostgreSQL database that has been designed to store the information required to parameterize a Castor model. This information includes, for example, data from the forest inventory, growth and yield models and road infrastructure. The information is then processed for the area of interest specified by the user. The output of this process is an SQLite database.
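The area-of-interest extraction can be sketched with Python's built-in `sqlite3` module. Here an in-memory SQLite database stands in for the networked PostgreSQL source, and the table names, columns and values are all hypothetical; the point is that only the stands inside the chosen area, plus the yield curves they reference, are copied into a standalone file.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def make_source() -> sqlite3.Connection:
    """In-memory stand-in for the networked PostgreSQL source (hypothetical tables/values)."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE inventory (stand_id INTEGER, tsa TEXT, yield_id INTEGER, age INTEGER)")
    src.execute("CREATE TABLE yields (yield_id INTEGER, age INTEGER, volume REAL)")
    src.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?)", [
        (1, "TSA_A", 10, 60), (2, "TSA_A", 11, 120), (5, "TSA_A", 10, 45),
        (3, "TSA_B", 10, 90), (4, "TSA_B", 12, 30),
    ])
    src.executemany("INSERT INTO yields VALUES (?, ?, ?)", [
        (10, 60, 150.0), (10, 120, 320.0), (11, 60, 110.0),
        (11, 120, 260.0), (12, 60, 90.0), (12, 120, 210.0),
    ])
    src.commit()
    return src


def build_castordb(src: sqlite3.Connection, aoi: str, out_path: str) -> tuple:
    """Copy only the stands in one area of interest, plus the yield curves they use."""
    stands = src.execute(
        "SELECT stand_id, yield_id, age FROM inventory WHERE tsa = ?", (aoi,)
    ).fetchall()
    yield_ids = sorted({yield_id for _, yield_id, _ in stands})
    curves = []
    if yield_ids:
        marks = ",".join("?" * len(yield_ids))
        curves = src.execute(
            f"SELECT yield_id, age, volume FROM yields WHERE yield_id IN ({marks})", yield_ids
        ).fetchall()
    with closing(sqlite3.connect(out_path)) as out:
        with out:  # commit the inserts on success
            out.execute("CREATE TABLE pixels (stand_id INTEGER PRIMARY KEY, yield_id INTEGER, age INTEGER)")
            out.execute("CREATE TABLE yields (yield_id INTEGER, age INTEGER, volume REAL)")
            out.executemany("INSERT INTO pixels VALUES (?, ?, ?)", stands)
            out.executemany("INSERT INTO yields VALUES (?, ?, ?)", curves)
    return len(stands), len(curves)


with tempfile.TemporaryDirectory() as tmp, closing(make_source()) as src:
    counts_a = build_castordb(src, "TSA_A", os.path.join(tmp, "castordb_tsa_a.sqlite"))
    counts_b = build_castordb(src, "TSA_B", os.path.join(tmp, "castordb_tsa_b.sqlite"))
print(counts_a, counts_b)  # (3, 4) (2, 4)
```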

The dataCastor process is illustrated in a simplified way below. A PostgreSQL database is currently hosted on a networked computer. The dataCastor module connects to this database to build the datasets needed to run the forest harvest simulator for the area of interest (for example, a timber supply area). It may also be linked to other modules that connect to the PostgreSQL database to build the datasets needed to model other processes. For example, if there is interest in pre-defining homogeneous harvest units, dataCastor can be linked to the blockingCastor module. The blockingCastor module then evaluates the similarity between neighboring forest stands in the forest inventory, based on selected stand characteristics, and 'groups' stands with similar characteristics under the same identifier. These identifiers are then stored in the SQLite database. Similarly, if there is interest in modeling road development associated with forest harvest, the roadCastor module can be linked to dataCastor to pre-define road pathways to harvest blocks and output potential simulated roads to the SQLite database.
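The idea of grouping neighboring stands with similar characteristics can be illustrated with a simple union-find (disjoint-set) grouping on a raster. This is a swapped-in technique for illustration, not necessarily the algorithm blockingCastor uses: it links 4-connected cells whose single attribute (here, a hypothetical stand age) differs by at most a tolerance, then numbers the resulting groups in raster order.

```python
from typing import List


def group_similar_neighbours(grid: List[List[float]], tolerance: float) -> List[List[int]]:
    """Label 4-connected cells whose values differ by <= tolerance, using union-find."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Compare each cell with its right and lower neighbour.
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            if c + 1 < cols and abs(grid[r][c] - grid[r][c + 1]) <= tolerance:
                union(idx, idx + 1)
            if r + 1 < rows and abs(grid[r][c] - grid[r + 1][c]) <= tolerance:
                union(idx, idx + cols)

    # Relabel group roots as consecutive block ids 1, 2, 3, ... in raster order.
    labels: dict = {}
    out: List[List[int]] = []
    for r in range(rows):
        row = []
        for c in range(cols):
            root = find(r * cols + c)
            labels.setdefault(root, len(labels) + 1)
            row.append(labels[root])
        out.append(row)
    return out


ages = [
    [40, 42, 120],
    [41, 45, 118],
    [200, 205, 119],
]
print(group_similar_neighbours(ages, tolerance=5))  # [[1, 1, 2], [1, 1, 2], [3, 3, 2]]
```

Because similarity is tested only between adjacent pairs, chains of small differences can join stands whose values differ by more than the tolerance; a real blocking approach may need extra rules, such as a maximum block size, to control this.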

The SQLite database is an efficient way to store the information needed to run the forest harvest simulator and associated modules (typically less than 1 GB, but up to 8 GB for a timber supply area). This provides a relatively portable and computationally fast framework for completing simulation analyses, as the database can easily be shared with other analysts to run simulations without connecting to the original source data, such as the networked PostgreSQL database. However, in this framework it is important to define in dataCastor all the data you think you may need for the simulation analysis. For example, if a spatial layer or parameter needed by blockingCastor (say, the zones in which to apply the blocking) was not included in dataCastor, the blockingCastor module will need to connect to the PostgreSQL database on every simulation run, which unnecessarily increases run time. Nevertheless, if such changes are needed, the database can easily be re-created later with the same parameters plus the additional module parameters.
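One way to catch missing data before it triggers repeated PostgreSQL connections is a pre-flight check that the prepared database already holds every table the chosen modules read. The Python sketch below queries SQLite's `sqlite_master` catalog; the module-to-table mapping is hypothetical and would need to reflect the actual castordb schema.

```python
import sqlite3
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from module to the castordb tables it reads.
REQUIRED_TABLES: Dict[str, Set[str]] = {
    "forestryCastor": {"pixels", "yields", "zoneconstraints"},
    "blockingCastor": {"pixels", "blocks"},
    "roadCastor": {"pixels", "roads"},
}


def missing_tables(conn: sqlite3.Connection, modules: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per module, the required tables that are absent from the database."""
    present = {name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    gaps: Dict[str, List[str]] = {}
    for module in modules:
        missing = REQUIRED_TABLES[module] - present
        if missing:
            gaps[module] = sorted(missing)
    return gaps


conn = sqlite3.connect(":memory:")
for table in ("pixels", "yields", "zoneconstraints"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
gaps_both = missing_tables(conn, ["forestryCastor", "blockingCastor"])
gaps_forestry = missing_tables(conn, ["forestryCastor"])
print(gaps_both)  # {'blockingCastor': ['blocks']}
```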

Simulation Step

The simulation step is illustrated in a simplified way below. Forest harvest simulation events are a core piece of Castor, and thus forestryCastor is a central module. The 'castordb' SQLite database is a key input to forestryCastor, as it contains the landscape and growth and yield data needed to simulate forest stands over time. The dataCastor module is also an important input, because some parameters needed for forestryCastor simulations are defined through it. For example, you can change which castordb is used, or which zone constraints to include in a scenario, using the dataCastor parameters. Similar to dataCastor, forestryCastor can be connected to other modules to parameterize and simulate other forestry-related components of the simulation. For example, you can connect it to roadCastor to define how roads associated with forestry are simulated.
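The core of a harvest simulation loop can be sketched as: age every stand, then harvest the oldest eligible stands until an annual volume target is met, skipping stands in excluded zones. This Python sketch is a hypothetical simplification, not forestryCastor's algorithm; the stand records, the step-function yield curve, the target and the zone names are invented, and in practice the inputs would come from castordb.

```python
from typing import Dict, List, Set, Tuple


def volume_at(curve: Dict[int, float], age: int) -> float:
    """Step-function yield lookup (m3/ha): value at the largest curve age <= stand age."""
    ages = [a for a in curve if a <= age]
    return curve[max(ages)] if ages else 0.0


def simulate_harvest(
    stands: List[dict],
    curves: Dict[int, Dict[int, float]],
    years: int,
    target_m3: float,
    min_age: int,
    excluded_zones: Set[str],
) -> List[Tuple[int, float]]:
    """Each year: age all stands, then harvest the oldest eligible stands until the target is met."""
    report = []
    for year in range(1, years + 1):
        for stand in stands:
            stand["age"] += 1
        eligible = [s for s in stands if s["age"] >= min_age and s["zone"] not in excluded_zones]
        harvested = 0.0
        for stand in sorted(eligible, key=lambda s: s["age"], reverse=True):
            if harvested >= target_m3:
                break
            harvested += volume_at(curves[stand["yield_id"]], stand["age"]) * stand["area_ha"]
            stand["age"] = 0  # harvested stand regenerates at age 0
        report.append((year, harvested))
    return report


curves = {1: {60: 150.0, 100: 300.0}}
stands = [
    {"id": "A", "age": 99, "area_ha": 2.0, "zone": "timber", "yield_id": 1},
    {"id": "B", "age": 79, "area_ha": 1.0, "zone": "timber", "yield_id": 1},
    {"id": "C", "age": 150, "area_ha": 5.0, "zone": "park", "yield_id": 1},
    {"id": "D", "age": 59, "area_ha": 3.0, "zone": "timber", "yield_id": 1},
]
report = simulate_harvest(stands, curves, years=3, target_m3=500.0, min_age=80, excluded_zones={"park"})
print(report)  # [(1, 600.0), (2, 150.0), (3, 0.0)]
```

In this toy run whole stands are harvested, so year 1 overshoots the target and later years fall short once the older stands have been cut; the stand in the excluded "park" zone is never harvested.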

Outputs from forestryCastor (and associated modules) can be saved using the uploaderCastor module. Essentially, uploaderCastor contains functions for saving various outputs from a forest harvest simulation, for example, the annual volume harvested. These outputs are typically tables of information and are identified as 'reports' within uploaderCastor. Currently these reports are uploaded to a cloud-based PostgreSQL database. The reports are then connected to web applications built with 'shiny' that support data summarization and visualization (e.g., harvest flow figures). Alternatively, they can be downloaded from the database for additional analysis or visualization.
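Saving a 'report' amounts to appending a tidy table keyed by scenario and year, which a dashboard or analyst can later query. In this hypothetical Python sketch, an in-memory SQLite connection stands in for the cloud-based PostgreSQL database, and the `harvest_report` table, its columns and the scenario values are invented. `INSERT OR REPLACE` makes re-uploading a scenario overwrite its earlier rows instead of duplicating them.

```python
import sqlite3
from typing import List, Tuple


def upload_report(conn: sqlite3.Connection, scenario: str, rows: List[Tuple[int, float]]) -> None:
    """Write one scenario's annual harvest report; re-uploads replace rows for the same (scenario, year)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvest_report ("
        "scenario TEXT, year INTEGER, volume_m3 REAL, PRIMARY KEY (scenario, year))"
    )
    with conn:  # commit the inserts as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO harvest_report VALUES (?, ?, ?)",
            [(scenario, year, volume) for year, volume in rows],
        )


def total_by_scenario(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    # A typical summary query a harvest-flow dashboard might run.
    return conn.execute(
        "SELECT scenario, SUM(volume_m3) FROM harvest_report GROUP BY scenario ORDER BY scenario"
    ).fetchall()


conn = sqlite3.connect(":memory:")
upload_report(conn, "scenario_a", [(1, 600.0), (2, 150.0), (3, 0.0)])
upload_report(conn, "scenario_b", [(1, 400.0), (2, 350.0), (3, 100.0)])
upload_report(conn, "scenario_a", [(1, 600.0), (2, 150.0), (3, 0.0)])  # re-upload: no duplicates
totals = total_by_scenario(conn)
print(totals)  # [('scenario_a', 750.0), ('scenario_b', 850.0)]
```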