forked from ltscomputingllc/faersdbstats
Home
wolfderby edited this page Sep 1, 2022 · 44 revisions
Instructions for running the Pentaho ETL process that standardizes FAERS data and generates safety signals
- Postgres 9.1+ (needed for CREATE TABLE IF NOT EXISTS)
- Pentaho Spoon (built with 8.4, then 9.2)
- AWS s3 bucket and credentials
- a license for the latest available OHDSI CDM V5 Vocabulary tables from the OHDSI [Athena website](www.ohdsi.org/web/athena/)
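A quick way to confirm that the command-line prerequisites are on the PATH is sketched below. The tool names `psql`, `aws`, and `git` are assumptions about how Postgres, the AWS CLI, and git are installed locally, and `check_tools` is a hypothetical helper, not part of the repo. Pentaho Spoon is not checked because its launcher location varies by install.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool names; adjust to match the local install.
check_tools psql aws git || echo "Install the missing tools before continuing."
```

This only checks that the commands exist; it does not verify the Postgres version or the AWS credentials.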
git clone https://github.com/dbmi-pitt/faersdbstats.git
- Open stage_0_set_pentaho/example_config.config
- Add your values
- Save as faers_config.config in repo's parent directory (BASE_FILE_DIR)
- LOCAL DATA HANDLING - where your files will be stored locally
- REBUILD_ALL_TIME_DATA_LOCALLY=1 deletes the local data (data_from_s3) and re-downloads it via ./s3_data_download.sh
- TIMEFRAME - LOAD_ALL_TIME, LOAD_NEW_QUARTER, and LOAD_NEW_YEAR are defined here, e.g. LOAD_ALL_TIME=1, LOAD_NEW_QUARTER=Q1, and LOAD_NEW_YEAR=2022 (the year in YYYY format)
- LOCAL LOG - Give your local log a name in the suggested format LOG-MONTH-YEAR-load.txt, e.g. LOG-July-2022-load.txt
- ORANGE BOOK - Orange book data download link location
- Navigate to https://www.fda.gov/drugs/drug-approvals-and-databases/orange-book-data-files
- Get CEM_ORANGE_BOOK_DOWNLOAD_URL value:
- Download the file and set CEM_ORANGE_BOOK_DOWNLOAD_FILENAME to the name of the downloaded file
- AWS - s3 bucket information (the `aws configure list` command should return the same values)
- DATABASE - main postgres database config vars
- PENTAHO LOG DATABASE - name the table that will reside in your main database's public schema
- COMPARISON DATABASE - if you have an older database to compare against, define it here; otherwise repeat the main database's configuration to prevent errors
Note: ${BASE_FILE_DIR} is the repo's parent directory, e.g. if the repo is at '/path/to/your/repo', then ${BASE_FILE_DIR} is '/path/to/your'
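Two of the values above can be derived rather than typed by hand. The sketch below shows one way to do that in shell; `repo_parent_dir` and `log_name` are hypothetical helper names for illustration and are not part of the repo.

```shell
#!/bin/sh
# Hypothetical helpers illustrating two values used in the config file.

# BASE_FILE_DIR is the repo's parent directory:
# '/path/to/your/repo' -> '/path/to/your'
repo_parent_dir() {
    dirname "$1"
}

# Suggested log name format: LOG-MONTH-YEAR-load.txt
log_name() {
    printf 'LOG-%s-%s-load.txt\n' "$1" "$2"
}

repo_parent_dir /path/to/your/repo     # prints /path/to/your
log_name July 2022                     # prints LOG-July-2022-load.txt
log_name "$(date +%B)" "$(date +%Y)"   # name for the current month and year
```

The printed values would still need to be copied into faers_config.config by hand; the month name produced by `date +%B` depends on the system locale.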
3. Open ./meta.kjb in Pentaho
- Follow wiki pages for additional stage documentation
- Stage 0 Wiki