Config, databases, and SFTP
There are two ARAX config files: config_secrets.json and config_dbs.json.
config_secrets.json is auto-downloaded to machines running ARAX code (approximately every 24 hours, using the same auto-download system as the old configv2.json) from the 'master' copy on araxconfig.rtx.ai at /home/araxconfig/config_secrets.json.
This config file is meant to contain things that should never be checked into the repo or shared publicly (usernames/passwords, etc.).
To override the 'master' config_secrets.json, create a local copy of it, rename the copy config_secrets_local.json, and edit its contents however you like. If a config_secrets_local.json is present, it is always used in preference to config_secrets.json.
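The precedence rule above can be sketched in a few lines of Python. This is only an illustration, not the actual RTXConfiguration code: the function name pick_secrets_file and its config_dir argument are hypothetical, and the directory holding these files is whatever your checkout uses.

```python
from pathlib import Path


def pick_secrets_file(config_dir: str) -> Path:
    """Return the secrets file to load (hypothetical helper, not ARAX code)."""
    local = Path(config_dir) / "config_secrets_local.json"
    master = Path(config_dir) / "config_secrets.json"
    # A local override, when present, always wins over the downloaded copy.
    return local if local.exists() else master
```

Deleting config_secrets_local.json is then enough to fall back to the auto-downloaded copy.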
config_dbs.json lives in the RTX repo (at RTX/code/config_dbs.json). It essentially contains the paths for the 'master' copies of the various databases on arax-databases.rtx.ai that are auto-downloaded to machines running ARAX code (by ARAX_database_manager.py), as well as the paths of the current KG2pre/KG2c Neo4j endpoints.
NOTE: The root of the paths in config_dbs.json (i.e., /translator/data/orangeboard/databases/) is not the current root path for databases on arax-databases.rtx.ai (which is actually /home/rtxconfig/); these are legacy paths from our old database storage location. ITRB could not adjust their scripts to work with the new root paths when we moved to arax-databases.rtx.ai, so we left the root paths in config_dbs.json as they were, and RTXConfiguration instead maps from the old root paths to the current ones as appropriate. It's silly, but it works. When you upload a database to arax-databases.rtx.ai, just put it under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
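To make the root-path mapping concrete, here is a minimal sketch of the idea. It is not the code RTXConfiguration actually uses; the helper name to_current_path and the file name example.sqlite are made up for illustration, and only the two root paths come from the note above.

```python
LEGACY_ROOT = "/translator/data/orangeboard/databases/"
CURRENT_ROOT = "/home/rtxconfig/"


def to_current_path(legacy_path: str) -> str:
    """Swap the legacy root for the current one (hypothetical helper)."""
    if not legacy_path.startswith(LEGACY_ROOT):
        return legacy_path  # not a legacy-rooted path; leave it untouched
    return CURRENT_ROOT + legacy_path[len(LEGACY_ROOT):]


# example.sqlite is a placeholder name, not a real ARAX database
print(to_current_path("/translator/data/orangeboard/databases/KG2.8.0/example.sqlite"))
# prints /home/rtxconfig/KG2.8.0/example.sqlite
```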
Note: Before pushing a change to config_dbs.json in master, ensure that any new databases it points to have already been uploaded to arax-databases.rtx.ai (in the proper KG2 directory) as well as to the ITRB SFTP server! If config_dbs.json (in the master branch) points to a database that does not exist in both of those places, things will break.
When updating a database, follow the steps in the section 'Steps when updating a database' to ensure nothing breaks!
RTXConfiguration dynamically determines a machine's 'maturity' (based on current branch and/or instance/domain name), which is used to select which Plover KG2 URLs to use. It also provides a mechanism for overriding that maturity. If, for example, you want your own machine to run as 'production' maturity, simply create a local one-line file called maturity_override.txt that contains that maturity:
echo "production" > RTX/code/maturity_override.txt
Remember to delete your local override file after you're done!
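As a rough illustration of how such a one-line override file might be consumed, the sketch below reads it and falls back to None when it is absent. The helper name read_maturity_override is hypothetical and this is not RTXConfiguration's real logic; it mainly shows that the trailing newline written by echo needs to be stripped.

```python
from pathlib import Path
from typing import Optional


def read_maturity_override(code_dir: str) -> Optional[str]:
    """Return the overriding maturity, or None if there is no override."""
    override_file = Path(code_dir) / "maturity_override.txt"
    if not override_file.exists():
        return None
    # `echo` appends a newline, so strip surrounding whitespace;
    # an empty file is treated the same as no override.
    return override_file.read_text().strip() or None
```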
By default, ARAX determines the correct URL for querying KG2 (hosted in Plover2.0) by looking at KG2's SmartAPI registration. If, however, you want ARAX to use a different KG2 URL (e.g., you're working on rolling out a new KG2 version), you can force it by putting that URL in the plover_url_override slot in RTX/code/config_dbs.json. For instance, that line in config_dbs.json might look like this:
"plover_url_override": "https://kg2cplover.rtx.ai:9990",
When you're done, be sure to set the plover_url_override slot back to null.
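The override behaviour can be pictured with the short sketch below. It assumes, for illustration only, that plover_url_override is a top-level key in config_dbs.json, and it takes the SmartAPI-derived URL as a plain argument (registered_url) rather than performing a real lookup; resolve_kg2_url is a hypothetical name, not an ARAX function.

```python
import json
from typing import Optional


def resolve_kg2_url(config_dbs_text: str, registered_url: str) -> str:
    """Pick the KG2 URL: the override if set, else the registered one."""
    config = json.loads(config_dbs_text)
    # Assumption: the slot sits at the top level of the JSON document.
    override: Optional[str] = config.get("plover_url_override")
    # JSON null parses to None, which means "no override".
    return override if override else registered_url
```

With the slot set back to null, the function returns whatever URL was registered, which is why resetting it matters once testing is finished.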
When you need to update one of the auto-downloaded databases listed in config_dbs.json, whether for a new KG2 version or for any other reason, follow these steps (order is important!):
1. Give the new/updated database a new (unique) name (e.g., bump v1.0 --> v1.1, or KG2.X.1 --> KG2.X.2, as appropriate).
2. Locally or in the branch you're working in (if applicable), update config_dbs.json to refer to the new database name.
3. Test the new database locally.
   - This includes running the ARAX pytest suite! Make sure you didn't break any tests.
4. If all tests pass, upload the database to arax-databases.rtx.ai under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
   - Before uploading, ensure there is enough free disk space on arax-databases.rtx.ai (e.g., using df -h).
5. Copy the database from arax-databases.rtx.ai to arax.ncats.io:
   - First ensure there is enough free disk space on arax.ncats.io.
   - Then run these commands:
       ssh [email protected]
       cd ../../data/orangeboard/databases/KG2.X.Y
       scp [email protected]:/home/rtxconfig/KG2.X.Y/my_database_v1.1_KG2.X.Y.sqlite .
6. Follow the steps in the section Uploading databases to ITRB's SFTP server to upload the new database and its md5sum to the ITRB SFTP server.
7. Update config_dbs.json in master to point to the new database.
   - If you're working in a branch, merge your branch into master at this point; this should carry your previous change to config_dbs.json into master.
   - This should trigger an auto-deployment to ITRB's Staging instances, which should already have access to the new database thanks to Step 6.
8. At this point it's safe for master to be rolled out to arax.ncats.io.
   - It's generally a good idea to run the DatabaseManager when doing so, but it shouldn't be required.
9. Download the new database to cicd.rtx.ai (automatic downloads to that instance don't currently work quite right), via these steps:
       ssh [email protected]
       cd RTX/
       git pull origin master
       python3 code/ARAX/ARAXQuery/ARAX_database_manager.py --mnt --skip-if-exists --remove_unused
   - Note: You can do this step either before or after Step 8. Prior to completing this step, commits may show as 'Failing' in GitHub.
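The free-disk-space checks in the steps above are done by hand with df -h. If you would rather script the check, a minimal sketch using only the Python standard library might look like this. The helper name has_room_for and the 10% safety margin are arbitrary choices for illustration, and the function only works for paths visible from the machine it runs on, so in practice you would run it on the destination host against a known file size.

```python
import os
import shutil


def has_room_for(db_file: str, dest_dir: str, margin: float = 1.1) -> bool:
    """True if dest_dir's filesystem can hold db_file plus a safety margin."""
    needed = os.path.getsize(db_file) * margin
    # disk_usage reports total/used/free bytes for the filesystem at dest_dir.
    free = shutil.disk_usage(dest_dir).free
    return free >= needed
```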
In addition to arax-databases.rtx.ai, all databases must be uploaded to ITRB's SFTP server, which is the instance ITRB's system downloads databases from.
ITRB manages users for the SFTP server (contact them if you need to gain access).
When uploading databases to the SFTP server, you need to upload not only the database file itself, but also its md5 checksum.
Below is a complete example showing how to upload a single database (in this case, curie_to_pmids_v1.0_KG2.7.6.sqlite) and its md5 checksum to ITRB's SFTP server:
ssh [email protected]
cd /data/orangeboard/databases/KG2.7.6
sudo bash
md5sum curie_to_pmids_v1.0_KG2.7.6.sqlite > curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
sftp [email protected]
cd databases/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite
cd ../../md5_sums/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
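For reference, the checksum file produced by md5sum in the example above is a single line holding the hex digest, two spaces, and the file name. The sketch below writes an equivalent file with Python's hashlib; it is an optional alternative rather than part of the established procedure, and write_md5_file is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def write_md5_file(db_path: str) -> Path:
    """Write <db_path>.md5 in md5sum's "<digest>  <name>" format."""
    db = Path(db_path)
    digest = hashlib.md5()
    with db.open("rb") as handle:
        # Read in 1 MiB chunks so large databases never sit fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    md5_path = db.with_name(db.name + ".md5")
    md5_path.write_text(f"{digest.hexdigest()}  {db.name}\n")
    return md5_path
```

The resulting .md5 file can be uploaded with put exactly as in the sftp session above.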
Generally it's easier to upload all the new databases for a new KG2 version to the SFTP server in one batch. Below is an example of doing so for the KG2.8.0 databases:
# First upload all database files to the SFTP server
ssh arax.ncats.io
cd /data/orangeboard/databases/KG2.8.0
sftp team-expander-[myuser]@sftp.transltr.io
cd databases
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit
# Then create their md5 checksums and upload those as well
sudo bash
mkdir md5_sums
chmod 777 md5_sums
exit
for file in *2.8.0*; do md5sum "${file}" > "md5_sums/${file}.md5"; done
cd md5_sums
sftp team-expander-[myuser]@sftp.transltr.io
cd md5_sums
mkdir KG2.8.0
# IMPORTANT: cd into the new directory before uploading (this step was missed for 2.9.0)
cd KG2.8.0
put *2.8.0*
exit
You do not need to warn ITRB when deploying a new database; simply ensure that you have uploaded it and its md5 checksum to the SFTP server as shown above, and then push the change to config_dbs.json that points to the new database. If your commit was to master, it will trigger a rebuild of the ITRB ARAX CI instance (arax.ci.transltr.io); it would be wise to test this instance to ensure it is working properly. Note that if your commit pointed config_dbs.json to a new database, you may need to wait up to about an hour before testing the instance, since the system takes a while to download the new database(s) while it rebuilds. If the system isn't working after that time, post a message in the devops-teamexpanderagent channel in the Translator Slack workspace and mention @Pouyan Ahmadi.