
[data dashboard backend] Add Data Dashboard Backend service #267

Merged 23 commits on Jun 25, 2024

Commits
735a81d
Add data-dashboard-backend service
pvannierop May 29, 2024
537b1df
Move KSQLDB doc to /docs directory
pvannierop May 31, 2024
df71d19
Add reference to KSQLDB doc in README.md
pvannierop May 31, 2024
0cf64a9
Add extra timeout property to data-dashboard-backend helmfile
pvannierop May 31, 2024
4d02dd3
Use different database for data-dashboard-backend (for backwards comp…
pvannierop May 31, 2024
859d04f
Remove Ingress section from values.yaml of data-dashboard-backend
pvannierop May 31, 2024
c5c925e
Add <<: *logFailedRelease
pvannierop May 31, 2024
8590a80
Place data-dashboard-backend service on /dashboard-data sub-path
pvannierop May 31, 2024
b179559
Add extra timeout property to data-dashboard-backend helmfile
pvannierop May 31, 2024
7018dca
Fix rename of service
pvannierop May 31, 2024
0250885
Update README.md
pvannierop Jun 10, 2024
2a4d3b1
Merge realtime-dashboard and data-dashboard-backend support services …
pvannierop Jun 11, 2024
abb6927
Add migration instructions
pvannierop Jun 12, 2024
51fdead
Fix path to ksql-server transformation file
pvannierop Jun 12, 2024
0df8dd4
Revert default database name to 'grafana-metrics'
pvannierop Jun 12, 2024
9c29557
Add comments explaining when to install ksql-server or a jdbc-connector
pvannierop Jun 12, 2024
2bd32be
Remove data-dashboard-backend value.yaml file
pvannierop Jun 12, 2024
09cfd69
Touch up docs
pvannierop Jun 13, 2024
264a341
Fix config of ksql.headless property
pvannierop Jun 13, 2024
02ea382
Rename TOPIC to TOPIC_NAME
pvannierop Jun 17, 2024
9762fbe
Rename DATE column to OBSERVATION_TIME
pvannierop Jun 17, 2024
c09f960
Add QUERY_ID to ksql insert statements
pvannierop Jun 17, 2024
26f80d7
Update KSQL server config for new column names
pvannierop Jun 18, 2024
15 changes: 15 additions & 0 deletions README.md
@@ -27,6 +27,7 @@ The Kubernetes stack of RADAR-base platform.
* [Configure](#configure)
* [Install](#install)
- [Usage and accessing the applications](#usage-and-accessing-the-applications)
- [Service-specific documentation](#service-specific-documentation)
- [Troubleshooting](#troubleshooting)
- [Upgrade instructions](#upgrade-instructions)
* [Upgrade to RADAR-Kubernetes version 1.1.x](#upgrade-to-radar-kubernetes-version-11x)
@@ -328,6 +329,11 @@ https://k8s.radar-base.org
Now you can head over to the [Management Portal](https://radar-base.atlassian.net/wiki/spaces/RAD/pages/49512484/Management+Portal) guide for next steps.


## Service-specific documentation

- [Data Dashboard Backend data transformation](docs/ksql-server_for_data-dashboard-backend.md)


## Troubleshooting

If an application doesn't become fully ready, the installation will not be successful. In this case, find the root cause by investigating the relevant component. It's suggested to run the following command while the `helmfile sync` command is running, so you can keep an eye on the installation:
@@ -386,6 +392,15 @@ Run the following instructions to upgrade an existing RADAR-Kubernetes cluster.
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Upgrading the major version of a PostgreSQL image is not supported. If necessary, we propose to use a `pg_dump` to dump the current data and a `pg_restore` to restore that data on a newer version. Please find instructions for this elsewhere. |

### Upgrade to RADAR-Kubernetes version >=1.1.4

In `production.yaml`, rename the following sections:

| Old Name                 | New Name                                |
|--------------------------|-----------------------------------------|
| radar_jdbc_connector     | radar_jdbc_connector_grafana            |
| radar_jdbc_connector_agg | radar_jdbc_connector_realtime_dashboard |
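
As a sketch, a `production.yaml` fragment before and after the rename might look as follows (the `_install` values and the presence of these sections are illustrative; only the section names change):

```yaml
# Before the upgrade (illustrative values)
radar_jdbc_connector:
  _install: true
radar_jdbc_connector_agg:
  _install: true

# After the upgrade: same contents, renamed sections
radar_jdbc_connector_grafana:
  _install: true
radar_jdbc_connector_realtime_dashboard:
  _install: true
```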

### Upgrade to RADAR-Kubernetes version 1.1.x
Before running the upgrade, make sure to copy `environments.yaml.tmpl` to `environments.yaml`; if you've previously changed `environments.yaml`, apply the changes again. This is necessary due to the addition of the `helmDefaults` and `repositories` configurations to this file.

46 changes: 46 additions & 0 deletions docs/ksql-server_for_data-dashboard-backend.md
@@ -0,0 +1,46 @@
# Kafka-data transformer (ksql-server) for data-dashboard-backend service

Reference: https://docs.ksqldb.io/

The data-dashboard-backend service uses data derived from Kafka topics, imported into the _observation_ table of the
data-dashboard-backend service database. The data in the Kafka topics is transformed by the ksql-server data transformer
before it is imported into the _observation_ table.

The ksql-server data transformer registers consumers/producers with Kafka that transform data in a topic and
publish the results to another topic.

The provided ksql-server _questionnaire_response_observations.sql_ and _questionnaire_app_event_observations.sql_ SQL files
transform, respectively, the _questionnaire_response_ and _questionnaire_app_event_ topics and publish the data to the
_ksql_observations_ topic. The _ksql_observations_ topic is consumed by the radar-jdbc-connector service deployed for the
data-dashboard-backend service (see: [20-dashboard.yaml](../helmfile.d/20-dashboard.yaml)).

When transformation of other topics is required, new SQL files can be added to this directory. These new files should be
referenced in the _cp-ksql-server -> ksql -> queries_ section of the `etc/base.yaml.gotmpl` file. New ksql-server SQL
files should transform data into the following format of the _ksql_observations_ topic:

```
TOPIC KEY:
  PROJECT: the project identifier
  SOURCE: the source identifier
  SUBJECT: the subject/study participant identifier
TOPIC VALUE:
  TOPIC_NAME: the topic identifier
  CATEGORY: the category identifier (optional)
  VARIABLE: the variable identifier
  OBSERVATION_TIME: the date/time of the observation
  OBSERVATION_TIME_END: the end date/time of the observation (optional)
  TYPE: the type of the observation (STRING, STRING_JSON, INTEGER, DOUBLE)
  VALUE_TEXTUAL: the textual value of the observation (optional, must be set when VALUE_NUMERIC is NULL)
  VALUE_NUMERIC: the numeric value of the observation (optional, must be set when VALUE_TEXTUAL is NULL)
```

New messages are added to the _ksql_observations_ topic by inserting into the _observations_ stream
(see [_base_observations_stream.sql](../etc/cp-ksql-server/_base_observations_stream.sql)):

```
INSERT INTO observations
SELECT
...
PARTITION BY q.projectId, q.userId, q.sourceId
EMIT CHANGES;
```
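
As an illustration, a complete transformation file for a hypothetical _sensor_battery_level_ topic could look like the sketch below, following the same pattern as the provided questionnaire transformations. The topic name and its fields are assumptions for the example, not part of the platform; replace them with a topic and schema that exist on your cluster.

```sql
-- Sketch only: 'sensor_battery_level' and its fields are hypothetical.
CREATE STREAM sensor_battery_level (
    projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
    userId VARCHAR KEY,
    sourceId VARCHAR KEY,
    time DOUBLE,
    batteryLevel DOUBLE
) WITH (
    kafka_topic = 'sensor_battery_level',
    partitions = 3,
    format = 'avro'
);

INSERT INTO observations
WITH (QUERY_ID='sensor_battery_level_observations')
SELECT
    q.projectId AS PROJECT,
    q.userId AS SUBJECT,
    q.sourceId AS SOURCE,
    'sensor_battery_level' as TOPIC_NAME,
    CAST(NULL as VARCHAR) as CATEGORY,
    'battery_level' as VARIABLE,
    FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
    CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
    'DOUBLE' as TYPE,
    q.batteryLevel as VALUE_NUMERIC,
    CAST(NULL as VARCHAR) as VALUE_TEXTUAL
FROM sensor_battery_level q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;
```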
5 changes: 4 additions & 1 deletion etc/base-secrets.yaml
@@ -90,6 +90,8 @@ management_portal:
client_secret: secret
radar_push_endpoint:
client_secret: secret
radar_data_dashboard_backend:
client_secret: secret
smtp:
password: change_me

@@ -124,8 +126,9 @@ oura_api_secret: change_me
radar_rest_sources_backend:
postgres:
password: secret
# --------------------------------------------------------- 20-grafana.yaml ---------------------------------------------------------
# --------------------------------------------------------- 20-dashboard.yaml ---------------------------------------------------------
timescaledb_password: secret
data_dashboard_backend_db_password: secret
grafana_password: secret
grafana_metrics_password: secret

74 changes: 46 additions & 28 deletions etc/base.yaml
@@ -290,10 +290,12 @@ radar_rest_sources_backend:
garmin:
enable: "false"

# --------------------------------------------------------- 20-grafana.yaml ---------------------------------------------------------
# --------------------------------------------------------- 20-dashboard.yaml ---------------------------------------------------------

timescaledb_username: postgres
timescaledb_db_name: grafana-metrics
data_dashboard_backend_db_name: data-dashboard
data_dashboard_backend_db_username: postgres
grafana_metrics_username: postgres

timescaledb:
@@ -315,23 +317,63 @@ timescaledb:
# Uncomment when upgrading
#existingClaim: "data-timescaledb-postgresql-0"

# Make sure to set:
#- radar_jdbc_connector_grafana._install to true
#- ksql_server._install to true
radar_grafana:
_install: true
_install: false
_chart_version: 6.26.8
_extra_timeout: 0
replicaCount: 1
env:
GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH: /var/lib/grafana/dashboards/allprojects/home.json

radar_jdbc_connector:
_install: true
# Make sure to set:
#- radar_jdbc_connector_data_dashboard_backend._install to true
#- ksql_server._install to true
data_dashboard_backend:
_install: false
_chart_version: 0.1.0
_extra_timeout: 0
replicaCount: 1

# Install when radar_grafana._install is 'true'
radar_jdbc_connector_grafana:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1
sink:
# Change the list of topics if you have dashboards that read other data or if you don't have certain topics available on your cluster.
topics: android_phone_relative_location, android_phone_battery_level, connect_fitbit_intraday_heart_rate, connect_fitbit_intraday_steps

# Install when data_dashboard_backend._install is 'true'
radar_jdbc_connector_data_dashboard_backend:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

# Install when using realtime analysis
radar_jdbc_connector_realtime_dashboard:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

# Install when:
[Review comment, Member]: again it can be enabled automatically in the helmfiles using templating with if statements

[Reply, Contributor Author]: Indeed. We decided to postpone this to future changes.

#- radar_grafana._install is 'true'
#- data_dashboard_backend._install is 'true'
#- using realtime analysis
ksql_server:
_install: false
_chart_version: 0.3.1
_extra_timeout: 0
# -- Uncomment when using real-time analysis
# ksql:
# headless: false
# --

# --------------------------------------------------------- 20-ingestion.yaml ---------------------------------------------------------

radar_gateway:
@@ -488,30 +530,6 @@ radar_push_endpoint:
garmin:
enabled: true

# --------------------------------------------------------- 40-realtime-analyses.yaml ---------------------------------------------------------

radar_jdbc_connector_agg:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

ksql_server:
_install: false
_chart_version: 0.3.1
_extra_timeout: 0
replicaCount: 1
servicePort: 8088
kafka:
bootstrapServers: PLAINTEXT://cp-kafka:9092
cp-schema-registry:
url: http://cp-schema-registry:8081
ksql:
headless: false
configurationOverrides:
"ksql.server.url": "http://0.0.0.0:8088"
"ksql.advertised.listener": "http://ksql-server:8088"

# --------------------------------------------------------- 99-velero.yaml ---------------------------------------------------------

velero:
14 changes: 10 additions & 4 deletions etc/base.yaml.gotmpl
@@ -27,9 +27,15 @@ radar_grafana:
# google_application_credentials: {{ readFile "../etc/radar-appserver/firebase-adminsdk.json" | quote }}
#*/}}

# Remove below Go comment to read the queries.sql and set the queries
# in the ksql_server
#ksql_server:
# If data transformation of Kafka topic data is needed, please remove the Go template comments and yaml comments.
# Make sure to reference a ksql transformation file that contains the required transformation logic.
# The files below transform the data from the questionnaire_response and questionnaire_app_event topics to the
# ksql_observations topic, used by the data-dashboard-backend. If using the data-dashboard-backend, please make sure
# to uncomment the relevant ksql transformer files.
# Note: never remove the _base_observations_stream.sql file.
# ksql_server:
# ksql:
# queries: |
# {{/*- readFile "cp-ksql-server/queries.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/_base_observations_stream.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/questionnaire_response_observations.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/questionnaire_app_event_observations.sql" | nindent 8 */}}
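
For example, with the yaml comments and Go-template comment markers removed for all three data-dashboard-backend transformation files, the block would read:

```yaml
ksql_server:
  ksql:
    queries: |
      {{- readFile "../etc/cp-ksql-server/_base_observations_stream.sql" | nindent 8 }}
      {{- readFile "../etc/cp-ksql-server/questionnaire_response_observations.sql" | nindent 8 }}
      {{- readFile "../etc/cp-ksql-server/questionnaire_app_event_observations.sql" | nindent 8 }}
```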
20 changes: 20 additions & 0 deletions etc/cp-ksql-server/_base_observations_stream.sql
@@ -0,0 +1,20 @@
SET 'auto.offset.reset' = 'earliest';

-- Register the 'ksql_observations' topic (created if it does not exist).
CREATE STREAM observations (
PROJECT VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
SUBJECT VARCHAR KEY,
SOURCE VARCHAR KEY,
TOPIC_NAME VARCHAR,
CATEGORY VARCHAR,
VARIABLE VARCHAR,
OBSERVATION_TIME TIMESTAMP,
OBSERVATION_TIME_END TIMESTAMP,
TYPE VARCHAR,
VALUE_NUMERIC DOUBLE,
VALUE_TEXTUAL VARCHAR
) WITH (
kafka_topic = 'ksql_observations',
partitions = 3,
format = 'avro'
);
Empty file removed etc/cp-ksql-server/queries.sql
Empty file.
31 changes: 31 additions & 0 deletions etc/cp-ksql-server/questionnaire_app_event_observations.sql
@@ -0,0 +1,31 @@
CREATE STREAM questionnaire_app_event (
projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
userId VARCHAR KEY,
sourceId VARCHAR KEY,
questionnaireName VARCHAR,
eventType VARCHAR,
time DOUBLE,
metadata MAP<VARCHAR, VARCHAR>
) WITH (
kafka_topic = 'questionnaire_app_event',
partitions = 3,
format = 'avro'
);

INSERT INTO observations
[Review comment, Member]: Maybe add a Query ID to the insert so we can identify them.

[Reply, Contributor Author (pvannierop, Jun 13, 2024)]: Do you mean an autoincremented id field for each inserted row?

[Reply, Contributor Author]: I have implemented an auto-incrementing 'id' field in data-dashboard-backend. The config in this PR has been updated accordingly.

[Reply, Member]: Oh sorry for the confusion, I just meant the QUERY_ID -- https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/insert-into/#query_id (it is just a name for the INSERT query).

For the database tables, I think the primary keys you had previously were good. This is not that important in the case of questionnaires, but when you have passive data such as Fitbit there is usually a high chance of duplicate data, so this would reduce that in the database. It would also help with querying if using primary keys (as they will be indexed by default).

[Reply, Contributor Author (pvannierop, Jun 17, 2024)]: I added it like so:

    INSERT INTO observations
    WITH (QUERY_ID='questionnaire_app_event_observations')
    SELECT
        ...

and

    INSERT INTO observations
    WITH (QUERY_ID='questionnaire_response_observations')
    SELECT
        ...

WITH (QUERY_ID='questionnaire_app_event_observations')
SELECT
q.projectId AS PROJECT,
q.userId AS SUBJECT,
q.sourceId AS SOURCE,
'questionnaire_app_event' as TOPIC_NAME,
CAST(NULL as VARCHAR) as CATEGORY,
q.questionnaireName as VARIABLE,
FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
'STRING_JSON' as TYPE,
CAST(NULL as DOUBLE) as VALUE_NUMERIC,
TO_JSON_STRING(q.metadata) as VALUE_TEXTUAL
FROM questionnaire_app_event q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;
83 changes: 83 additions & 0 deletions etc/cp-ksql-server/questionnaire_response_observations.sql
@@ -0,0 +1,83 @@
CREATE STREAM questionnaire_response (
projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
userId VARCHAR KEY,
sourceId VARCHAR KEY,
time DOUBLE,
timeCompleted DOUBLE,
timeNotification DOUBLE,
name VARCHAR,
version VARCHAR,
answers ARRAY<STRUCT<questionId VARCHAR, value STRUCT<int INT, string VARCHAR, double DOUBLE>, startTime DOUBLE, endTime DOUBLE>>
) WITH (
kafka_topic = 'questionnaire_response',
partitions = 3,
format = 'avro'
);

CREATE STREAM questionnaire_response_exploded
AS SELECT
EXPLODE(TRANSFORM(q.answers, a => a->questionId)) as VARIABLE,
FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
q.projectId,
q.userId,
q.sourceId,
'questionnaire_response' as TOPIC_NAME,
q.name as CATEGORY,
CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
-- WARNING!!! The cast from VARCHAR (string) to DOUBLE will throw a Java exception if the string is not a number.
-- This does not mean that the message will be lost. The value will be present in the VALUE_TEXTUAL_OPTIONAL field.
EXPLODE(TRANSFORM(q.answers, a => COALESCE(a->value->double, CAST(a->value->int as DOUBLE), CAST(a->value->string as DOUBLE)))) as VALUE_NUMERIC,
EXPLODE(TRANSFORM(q.answers, a => CASE
WHEN a->value->int IS NOT NULL THEN 'INTEGER'
WHEN a->value->double IS NOT NULL THEN 'DOUBLE'
ELSE NULL
END)) as TYPE,
-- Note: When cast to double works for the string value, the VALUE_TEXTUAL_OPTIONAL will also be set.
EXPLODE(TRANSFORM(q.answers, a => a->value->string)) as VALUE_TEXTUAL_OPTIONAL
FROM questionnaire_response q
EMIT CHANGES;

INSERT INTO observations
WITH (QUERY_ID='questionnaire_response_observations')
SELECT
q.projectId as PROJECT,
q.sourceId as SOURCE,
q.userId as SUBJECT,
TOPIC_NAME, CATEGORY, VARIABLE, OBSERVATION_TIME, OBSERVATION_TIME_END,
CASE
WHEN TYPE IS NULL AND VALUE_NUMERIC IS NOT NULL THEN 'DOUBLE' -- must have been derived from a string cast
WHEN TYPE IS NULL AND VALUE_NUMERIC IS NULL THEN 'STRING'
ELSE TYPE -- keep the original type when TYPE is not NULL
END as TYPE,
VALUE_NUMERIC,
CASE
WHEN VALUE_NUMERIC IS NOT NULL THEN NULL -- When cast to double has worked for the string value, set VALUE_TEXTUAL to NULL.
ELSE VALUE_TEXTUAL_OPTIONAL
END as VALUE_TEXTUAL
FROM questionnaire_response_exploded q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;

-- TODO: exploding the 'select:' questions is not yet fully designed.
-- I keep the code here for future reference.
-- Multi-select questionnaire questions are stored as a single 'value' string with the
-- names of the selected options separated by comma's. Multiselect questions are prefixed
-- by 'select:' in the questionId.
-- When 'questionId' is like 'select:%' create a new stream with the select options.
-- The options in the value field split commas and added as separate VARIABLE records.
-- The VALUE_NUMERIC is set to 1 and VALUE_TEXTUAL is set to NULL.
-- INSERT INTO observations
-- SELECT
-- EXPLODE(SPLIT(VALUE_TEXTUAL, ',')) as VARIABLE,
-- PROJECT, SOURCE, SUBJECT, TOPIC_NAME, CATEGORY, OBSERVATION_TIME, OBSERVATION_TIME_END,
-- 'INTEGER' as TYPE,
-- CAST(1 as DOUBLE) VALUE_NUMERIC,
-- CAST(NULL as VARCHAR) as VALUE_TEXTUAL
-- FROM questionnaire_response_observations
-- WHERE
-- VARIABLE IS NOT NULL
-- AND VARIABLE LIKE 'select:%'
-- AND VALUE_TEXTUAL IS NOT NULL
-- AND VALUE_TEXTUAL != ''
-- PARTITION BY SUBJECT, PROJECT, SOURCE
-- EMIT CHANGES;
6 changes: 6 additions & 0 deletions etc/cp-ksql-server/values.yaml
@@ -1,6 +1,12 @@
# KSQL configuration options
## ref: https://docs.confluent.io/current/ksql/docs/installation/server-config/config-reference.html
kafka:
bootstrapServers: PLAINTEXT://cp-kafka:9092
cp-schema-registry:
url: http://cp-schema-registry:8081
configurationOverrides:
"ksql.server.url": "http://0.0.0.0:8088"
"ksql.advertised.listener": "http://ksql-server:8088"
"ksql.logging.processing.topic.auto.create": true
"ksql.logging.processing.stream.auto.create": true
"ksql.service.id": radar_default_