
Open Genes infrastructure

Constantine Rafikov edited this page Dec 10, 2022 · 3 revisions

The code for the client side of the application (frontend) and the server side (backend) is stored in separate repositories. Development follows the GitFlow branching model. The frontend and backend are built separately and deployed independently of each other.

Open Genes consists of several applications. Read more in "Open Genes Infrastructure".

Open Genes Infrastructure

Environment

Open Genes runs its applications on the DigitalOcean cloud platform.

Applications

Open Genes consists of several applications. All of these applications form the Open Genes infrastructure.

Frontend

open-genes/open-genes-frontend

The Open Genes website is a frontend written in TypeScript on the Angular 13 framework. The web application lets you search, filter, sort, and group data from the Open Genes API through a graphical interface. The frontend is a client application categorized as an SPA (single-page application). We are planning to move towards a PWA (Progressive Web App). When built, the application is compiled to JavaScript and stored in the dist/frontend folder. Build parameters are configured in angular.json, and dependencies are recorded in package.json. Lazy-loaded modules are used to speed up the web application.

Backend

API

The Open Genes API is a REST API with GET endpoints that return data from the database on genes, research, and entities related to genes: diseases, orthologs, GO terms, protein categories, aging mechanisms, the origin of the gene and its family, and other categories. Descriptions of the endpoints and response formats are available here: https://open-genes.com/api/docs

The legacy API is written in PHP 7 (Yii2). During development, the team was guided by the 12factor.net methodology. Dependencies are managed with Composer, and the runtime configuration is set using .env. Project configuration (DSN for database connections, etc.) is stored in environment variables; the phpdotenv package is installed to work with them. The project's /app directory contains a file called .env.sample with values for the dev environment, which is copied to app/.env. Server variables are stored in the GitHub settings and exported during upload. Development of the API (app/genes folder) followed the PSR standards. The project was developed close to the principles of layered architecture: logic was isolated from the framework and infrastructure.
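The environment-based configuration described above can be illustrated with a minimal sketch. It is written in Python rather than PHP and is not the project's code: the function names are hypothetical, and the choice that real environment variables take precedence over values from the .env file is an assumption of this sketch.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as a DSN may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env_file=".env", environ=None):
    """Merge .env values with the process environment.

    Assumption of this sketch: variables already set in the environment
    override values read from the file.
    """
    environ = os.environ if environ is None else environ
    config = parse_env_file(env_file) if Path(env_file).exists() else {}
    config.update(environ)
    return config
```

With this layout, a developer copies the sample file to .env locally, while a server can supply the same keys as real environment variables without any file on disk.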

The application is no longer supported. Most endpoints have been moved to the new API, but some are still implemented in this old project. The fallback to the PHP API is configured at the nginx level: if the Python API returns 404, the request is passed to the PHP API.
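The fallback rule itself lives in the nginx configuration, which is not reproduced here. The following Python sketch only models the behavior (try the Python API first, pass the request to the PHP API on a 404); the handler functions and paths are hypothetical stand-ins.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)
Handler = Callable[[str], Response]


def with_fallback(primary: Handler, legacy: Handler) -> Handler:
    """Return a handler that tries `primary` and falls back to `legacy` on 404."""
    def handle(path: str) -> Response:
        status, body = primary(path)
        if status == 404:
            return legacy(path)
        return status, body
    return handle


# Hypothetical stand-ins for the two backends; the paths are made up.
def python_api(path: str) -> Response:
    routes = {"/new-endpoint": "served by Python API"}
    return (200, routes[path]) if path in routes else (404, "not found")


def php_api(path: str) -> Response:
    routes = {"/old-endpoint": "served by PHP API"}
    return (200, routes[path]) if path in routes else (404, "not found")


route = with_fallback(python_api, php_api)
```

One consequence of this scheme is that a path unknown to both backends still ends in a 404, but only after both have been asked.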

The new REST API is written in Python 3 on the FastAPI framework.

Console scripts

Console scripts for various purposes are written in PHP and Python. Most of them parse data from other databases and run on a schedule via cron.

  • Getting data from UniProt
  • Automatic translation of entities using the Google Translate API (names can be edited later in the CMS)
  • Obtaining protein classes from the Human Protein Atlas
  • Obtaining gene data from the Human Protein Atlas
  • Getting diseases from eDGAR
  • Obtaining expression level data from NCBI
  • Obtaining gene orthologs from NCBI
  • Obtaining the IDs of gene orthologs in flies from the FlyBase database
  • Getting text descriptions of genes (NCBI summary) from MyGene
  • Obtaining ICD codes parsed from the eDGAR database
  • Obtaining categories of diseases from the WHO database
  • Getting Gene Ontology (GO terms) annotation categories from QuickGO API

  • Assignment of aging mechanisms to genes by paired GO terms

CMS

open-genes/open-genes-cms

The CMS is a CRUD app developed for the needs of biologists. It is written in PHP 7 (Yii2) and designed for manually adding data from studies to the database. It also lets moderators check the data added by contributors and edit data collected automatically.

The CMS is directly connected to the database, and database migrations are also managed here. The dev environment is deployed in Docker. Along with the legacy API, this is one of the oldest parts of the project and the bottleneck of the entire infrastructure.

Publications Search API

open-genes/og-publications-search-api

This service is implemented in Node.js. It searches PubMed for scientific publications by the list of genes from the Open Genes database and obtains information about the articles by DOI. To do so, it sends requests to the Open Genes API, to the ESearch API (to search for articles by gene), and to the Plu.mx API (to get data about the articles by DOI), then sends the collected data to the frontend.
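The PubMed search step can be sketched as follows. The service itself is written in Node.js, so this Python version is an illustration only: the function names are hypothetical, and using a bare gene symbol as the search term is an assumption, since the actual query the service builds is not described here. The ESearch endpoint and its `db`, `term`, `retmode`, and `retmax` parameters are part of NCBI E-utilities.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(gene_symbol, max_results=20):
    """Build an ESearch URL that looks up PubMed articles for one gene symbol."""
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": gene_symbol,     # assumption: the bare symbol is the query
        "retmode": "json",       # ask for a JSON response
        "retmax": max_results,   # cap the number of returned IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pubmed_ids(payload):
    """Pull the list of PubMed IDs out of a parsed ESearch JSON response."""
    return list(payload.get("esearchresult", {}).get("idlist", []))
```

A caller would fetch the URL, parse the body as JSON, and pass the result to `extract_pubmed_ids`; the network call is left out so the sketch stays free of side effects.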

DB

The database runs on MySQL. Changes to the database schema are applied using migrations. Data is written to the database by console scripts during automatic parsing and through the CMS app, where biologists edit the data manually.
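The idea behind migrations can be shown with a minimal sketch: each schema change is applied once, in order, and recorded so that it is skipped on the next run. This is not the project's migration tooling; it uses Python's built-in sqlite3 as a stand-in for MySQL, and the function name, the tracking table, and the table names in the usage example are all hypothetical.

```python
import sqlite3


def apply_migrations(conn, migrations):
    """Apply pending (version, sql) pairs in version order.

    Returns the list of versions applied during this call.
    """
    # Tracking table that remembers which versions have already run.
    conn.execute("CREATE TABLE IF NOT EXISTS migration (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM migration")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already applied on an earlier run
        conn.executescript(sql)
        conn.execute("INSERT INTO migration (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied
```

For example, calling it twice with the same list applies each change only once:

```python
conn = sqlite3.connect(":memory:")
migrations = [
    ("m001_create_gene", "CREATE TABLE gene (id INTEGER PRIMARY KEY);"),
    ("m002_add_symbol", "ALTER TABLE gene ADD COLUMN symbol TEXT;"),
]
apply_migrations(conn, migrations)  # applies both versions
apply_migrations(conn, migrations)  # applies nothing the second time
```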

DevOps

The infrastructure solutions for delivering applications and deploying them on a server differ slightly between the frontend and the backend.

For the frontend, CI/CD is configured via GitHub Actions with the ability to build and deploy for different environments. After checkout, the entire application is wrapped in a Docker container, then npm modules are installed and the application is built. The assembled frontend bundle is then transferred to a folder on the server with rsync.

For the backend, CI/CD is also configured via GitHub Actions with the ability to build and deploy for different environments. Unlike the frontend, backend applications are already wrapped in Docker containers. For the CMS project, assembly and deployment are done in separate pipelines; for the API, in a single pipeline.

The CMS and legacy API applications run inside Docker only on a local machine. When deployed on a server, they use the PHP environment. The new API runs inside Docker both on a local machine and on the server. Both the API application and the CMS application use the container registry when deployed. We use both the DigitalOcean container registry and Docker Hub. We use Watchtower to start and stop Docker containers automatically after a deployment.
