wolfderby edited this page Sep 1, 2022 · 44 revisions

Getting Started

Instructions for running the Pentaho ETL process that standardizes FAERS data and generates safety signals

System Prerequisites

  • PostgreSQL 9.1+ (needed for CREATE TABLE IF NOT EXISTS)
  • Pentaho Spoon (built with 8.4, then 9.2)
  • AWS S3 bucket and credentials
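
Before going further, it can help to confirm that the command-line tools used in this guide are installed. The following is a minimal sketch, not part of the repo: `check_tool` is a hypothetical helper, and the tool list (psql, aws, git) is an assumption based on the prerequisites above. Pentaho Spoon is not checked because its launcher location depends on your install.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check each tool; keep going even if one is missing.
for tool in psql aws git; do
    check_tool "$tool" || true
done
```

Note that psql only tells you about the client. To confirm the server version, run `SELECT version();` against your database.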

Reference Data Prerequisites

  • A license for the latest available OHDSI CDM V5 Vocabulary tables from the OHDSI Athena website (www.ohdsi.org/web/athena/)

1. Clone the repo

  • git clone https://github.com/dbmi-pitt/faersdbstats.git

2. Set up your config file

  • Open stage_0_set_pentaho/example_config.config
  • Add your values
  • Save it as faers_config.config in the repo's parent directory (BASE_FILE_DIR)
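
The copy-and-rename part of this step can be sketched in shell. This is an illustration under assumptions: `install_config` is a hypothetical helper, not something shipped in the repo, and it only copies the example file; you still need to open the copy and add your values.

```shell
# Hypothetical helper: copy the example config into the repo's parent
# directory (BASE_FILE_DIR) under the name faers_config.config.
install_config() {
    repo_dir=$1
    # Resolve the repo's parent directory to an absolute path.
    base_file_dir=$(cd "$repo_dir/.." && pwd)
    cp "$repo_dir/stage_0_set_pentaho/example_config.config" \
       "$base_file_dir/faers_config.config"
    # Print where the copy was written.
    echo "$base_file_dir/faers_config.config"
}

# Usage, from the directory where you cloned the repo:
#   install_config ./faersdbstats
```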

Overview of sections:

  1. LOCAL DATA HANDLING - where your files will be stored locally
    • REBUILD_ALL_TIME_DATA_LOCALLY=1 deletes the local data (data_from_s3) and redownloads it via ./s3_data_download.sh
  2. TIMEFRAME - LOAD_ALL_TIME=1, LOAD_NEW_QUARTER=Q1, and LOAD_NEW_YEAR=2022 are defined here; the quarter takes the form Q1 and the year the form YYYY
  3. LOCAL LOG - give your local log a name in the suggested format LOG-MONTH-YEAR-load.txt, e.g. LOG-July-2022-load.txt
  4. ORANGE BOOK - download link for the Orange Book data
  5. AWS - S3 bucket information (the aws configure list command should return the same values)
  6. DATABASE - main PostgreSQL database config variables
  7. PENTAHO LOG DATABASE - name of the log table that will reside in your main database's public schema
  8. COMPARISON DATABASE - if you have an older database to compare against, define it here; otherwise repeat the main database's configuration to prevent errors
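
To make the TIMEFRAME and LOCAL LOG sections concrete, here is a minimal shell sketch. It assumes the config uses plain KEY=VALUE lines, as the variable names above suggest. `valid_timeframe` and `log_name` are hypothetical helpers for illustration, not part of the repo, and the checks reflect only the Q1 and YYYY forms described above.

```shell
# Values as they might appear in faers_config.config (KEY=VALUE form assumed).
LOAD_NEW_QUARTER=Q1
LOAD_NEW_YEAR=2022

# Hypothetical helper: accept only Q1..Q4 and a four-digit year.
valid_timeframe() {
    case $1 in
        Q[1-4]) ;;
        *) echo "LOAD_NEW_QUARTER must be Q1..Q4, got: $1" >&2; return 1 ;;
    esac
    case $2 in
        [0-9][0-9][0-9][0-9]) ;;
        *) echo "LOAD_NEW_YEAR must be a four-digit year, got: $2" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build a name in the LOG-MONTH-YEAR-load.txt format.
log_name() {
    echo "LOG-$1-$2-load.txt"
}

# Validate the timeframe, then print a log name for a July load.
valid_timeframe "$LOAD_NEW_QUARTER" "$LOAD_NEW_YEAR" && log_name July "$LOAD_NEW_YEAR"
# prints: LOG-July-2022-load.txt
```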

For example, if the repo lives at '/path/to/your/repo', then ${BASE_FILE_DIR} is '/path/to/your' (the repo's parent directory).

3. Open ./meta.kjb in Pentaho

You can right-click the entries in meta.kjb to open the other stages easily


4. Run the stage_# jobs in order
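
Because the order matters, it can be handy to list the stages numerically before you start. The sketch below assumes every stage in your checkout is named stage_<number>_<name>, like stage_0_set_pentaho; the wiki only confirms that pattern for stage 0. `list_stages` is a hypothetical helper and does not run anything in Pentaho.

```shell
# Hypothetical helper: print stage_* entries in a directory, sorted by
# the number that follows "stage_" (so stage_10 comes after stage_2).
list_stages() {
    (
        cd "$1" || exit 1
        for entry in stage_*; do
            # Skip the literal pattern when nothing matches.
            [ -e "$entry" ] && echo "$entry"
        done | sort -t_ -k2,2n
    )
}

# Usage, from the directory where you cloned the repo:
#   list_stages ./faersdbstats
```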