
[data dashboard backend] Add Data Dashboard Backend service #267

Merged 23 commits on Jun 25, 2024

Commits
735a81d
Add data-dashboard-backend service
pvannierop May 29, 2024
537b1df
Move KSQLDB doc to /docs directory
pvannierop May 31, 2024
df71d19
Add reference to KSQLDB doc in README.md
pvannierop May 31, 2024
0cf64a9
Add extra timeout property to data-dashboard-backend helmfile
pvannierop May 31, 2024
4d02dd3
Use different database for data-dashboard-backend (for backwards comp…
pvannierop May 31, 2024
859d04f
Remove Ingress section from values.yaml of data-dashboard-backend
pvannierop May 31, 2024
c5c925e
Add <<: *logFailedRelease
pvannierop May 31, 2024
8590a80
Place data-dashboard-backend service on /dashboard-data sub-path
pvannierop May 31, 2024
b179559
Add extra timeout property to data-dashboard-backend helmfile
pvannierop May 31, 2024
7018dca
Fix rename of service
pvannierop May 31, 2024
0250885
Update README.md
pvannierop Jun 10, 2024
2a4d3b1
Merge realtime-dashboard and data-dashboard-backend support services …
pvannierop Jun 11, 2024
abb6927
Add migration instructions
pvannierop Jun 12, 2024
51fdead
Fix path to ksql-server transformation file
pvannierop Jun 12, 2024
0df8dd4
Revert default database name to 'grafana-metrics'
pvannierop Jun 12, 2024
9c29557
Add comments explaining when to install ksql-server or a jdbc-connector
pvannierop Jun 12, 2024
2bd32be
Remove data-dashboard-backend value.yaml file
pvannierop Jun 12, 2024
09cfd69
Touch up docs
pvannierop Jun 13, 2024
264a341
Fix config of ksql.headless property
pvannierop Jun 13, 2024
02ea382
Rename TOPIC to TOPIC_NAME
pvannierop Jun 17, 2024
9762fbe
Rename DATE column to OBSERVATION_TIME
pvannierop Jun 17, 2024
c09f960
Add QUERY_ID to ksql insert statements
pvannierop Jun 17, 2024
26f80d7
Update KSQL server config for new column names
pvannierop Jun 18, 2024
15 changes: 15 additions & 0 deletions README.md
@@ -27,6 +27,7 @@ The Kubernetes stack of RADAR-base platform.
* [Configure](#configure)
* [Install](#install)
- [Usage and accessing the applications](#usage-and-accessing-the-applications)
- [Service-specific documentation](#service-specific-documentation)
- [Troubleshooting](#troubleshooting)
- [Upgrade instructions](#upgrade-instructions)
* [Upgrade to RADAR-Kubernetes version 1.1.x](#upgrade-to-radar-kubernetes-version-11x)
@@ -328,6 +329,11 @@ https://k8s.radar-base.org
Now you can head over to the [Management Portal](https://radar-base.atlassian.net/wiki/spaces/RAD/pages/49512484/Management+Portal) guide for next steps.


## Service-specific documentation

- [Data Dashboard Backend data transformation](docs/ksql-server_for_data-dashboard-backend.md)


## Troubleshooting

If an application doesn't become fully ready, the installation will not be successful. In this case, find the root cause by investigating the relevant component. It's suggested to run the following command while the `helmfile sync` command is running, so you can keep an eye on the installation:
@@ -386,6 +392,15 @@ Run the following instructions to upgrade an existing RADAR-Kubernetes cluster.
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Upgrading the major version of a PostgreSQL image is not supported. If necessary, we propose to use a `pg_dump` to dump the current data and a `pg_restore` to restore that data on a newer version. Please find instructions for this elsewhere. |

### Upgrade to RADAR-Kubernetes version >=1.1.4

In `production.yaml`, rename the following sections:

| Old Name                 | New Name                                |
|--------------------------|-----------------------------------------|
| radar_jdbc_connector     | radar_jdbc_connector_grafana            |
| radar_jdbc_connector_agg | radar_jdbc_connector_realtime_dashboard |
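
As a sketch, a `production.yaml` fragment before and after the rename might look as follows (the `_install` values and the presence of these sections are illustrative; only the section names change):

```yaml
# Before the upgrade (illustrative values)
radar_jdbc_connector:
  _install: true
radar_jdbc_connector_agg:
  _install: true

# After the upgrade: same contents, renamed sections
radar_jdbc_connector_grafana:
  _install: true
radar_jdbc_connector_realtime_dashboard:
  _install: true
```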

### Upgrade to RADAR-Kubernetes version 1.1.x
Before running the upgrade, make sure to copy `environments.yaml.tmpl` to `environments.yaml`; if you've previously changed `environments.yaml`, apply the changes again. This is necessary due to the addition of the `helmDefaults` and `repositories` configurations to this file.

46 changes: 46 additions & 0 deletions docs/ksql-server_for_data-dashboard-backend.md
@@ -0,0 +1,46 @@
# Kafka-data transformer (ksql-server) for data-dashboard-backend service

Reference: https://docs.ksqldb.io/

The data-dashboard-backend service uses data derived from Kafka topics, imported into the _observation_ table of the
data-dashboard-backend service database. The data in the Kafka topics is transformed by the ksql-server data transformer
before it is imported into the _observation_ table.

The ksql-server data transformer registers consumers/producers with Kafka that transform data in a topic and
publish the results to another topic.

The provided ksql-server _questionnaire_response_observations.sql_ and _questionnaire_app_event_observations.sql_ SQL files
transform, respectively, the _questionnaire_response_ and _questionnaire_app_event_ topics and publish the data to the
_ksql_observations_ topic. The _ksql_observations_ topic is consumed by the radar-jdbc-connector service deployed for the
data-dashboard-backend service (see: [20-dashboard.yaml](../helmfile.d/20-dashboard.yaml)).

When transformation of other topics is required, new SQL files can be added to this directory. These new files should be
referenced in the _cp-ksql-server -> ksql -> queries_ section of the `etc/base.yaml.gotmpl` file. New ksql-server SQL
files should transform data into the following format of the _ksql_observations_ topic:

```
TOPIC KEY:
  PROJECT: the project identifier
  SOURCE: the source identifier
  SUBJECT: the subject/study participant identifier
TOPIC VALUE:
  TOPIC_NAME: the topic identifier
  CATEGORY: the category identifier (optional)
  VARIABLE: the variable identifier
  OBSERVATION_TIME: the date/time of the observation
  OBSERVATION_TIME_END: the end date/time of the observation (optional)
  TYPE: the type of the observation (STRING, STRING_JSON, INTEGER, DOUBLE)
  VALUE_TEXTUAL: the textual value of the observation (optional, must be set when VALUE_NUMERIC is NULL)
  VALUE_NUMERIC: the numeric value of the observation (optional, must be set when VALUE_TEXTUAL is NULL)
```

New messages are added to the _ksql_observations_ topic by inserting into the _observations_ stream
(see [_base_observations_stream.sql](../etc/cp-ksql-server/_base_observations_stream.sql)):

```
INSERT INTO observations
SELECT
...
PARTITION BY q.projectId, q.userId, q.sourceId
EMIT CHANGES;
```
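
As an illustration, a complete transformation file for a hypothetical _sensor_battery_level_ topic could look like the sketch below, following the same pattern as the provided questionnaire transformations. The topic name and its fields are assumptions for the example, not part of the platform; replace them with a topic and schema that exist on your cluster.

```sql
-- Sketch only: 'sensor_battery_level' and its fields are hypothetical.
CREATE STREAM sensor_battery_level (
    projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
    userId VARCHAR KEY,
    sourceId VARCHAR KEY,
    time DOUBLE,
    batteryLevel DOUBLE
) WITH (
    kafka_topic = 'sensor_battery_level',
    partitions = 3,
    format = 'avro'
);

INSERT INTO observations
WITH (QUERY_ID='sensor_battery_level_observations')
SELECT
    q.projectId AS PROJECT,
    q.userId AS SUBJECT,
    q.sourceId AS SOURCE,
    'sensor_battery_level' as TOPIC_NAME,
    CAST(NULL as VARCHAR) as CATEGORY,
    'battery_level' as VARIABLE,
    FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
    CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
    'DOUBLE' as TYPE,
    q.batteryLevel as VALUE_NUMERIC,
    CAST(NULL as VARCHAR) as VALUE_TEXTUAL
FROM sensor_battery_level q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;
```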
5 changes: 4 additions & 1 deletion etc/base-secrets.yaml
@@ -90,6 +90,8 @@ management_portal:
client_secret: secret
radar_push_endpoint:
client_secret: secret
radar_data_dashboard_backend:
client_secret: secret
smtp:
password: change_me

@@ -124,8 +126,9 @@ oura_api_secret: change_me
radar_rest_sources_backend:
postgres:
password: secret
# --------------------------------------------------------- 20-grafana.yaml ---------------------------------------------------------
# --------------------------------------------------------- 20-dashboard.yaml ---------------------------------------------------------
timescaledb_password: secret
data_dashboard_backend_db_password: secret
grafana_password: secret
grafana_metrics_password: secret

74 changes: 46 additions & 28 deletions etc/base.yaml
@@ -290,10 +290,12 @@ radar_rest_sources_backend:
garmin:
enable: "false"

# --------------------------------------------------------- 20-grafana.yaml ---------------------------------------------------------
# --------------------------------------------------------- 20-dashboard.yaml ---------------------------------------------------------

timescaledb_username: postgres
timescaledb_db_name: grafana-metrics
data_dashboard_backend_db_name: data-dashboard
data_dashboard_backend_db_username: postgres
grafana_metrics_username: postgres

timescaledb:
@@ -315,23 +317,63 @@ timescaledb:
# Uncomment when upgrading
#existingClaim: "data-timescaledb-postgresql-0"

# Make sure to set:
#- radar_jdbc_connector_grafana._install to true
#- ksql_server._install to true
radar_grafana:
_install: true
_install: false
_chart_version: 6.26.8
_extra_timeout: 0
replicaCount: 1
env:
GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH: /var/lib/grafana/dashboards/allprojects/home.json

radar_jdbc_connector:
_install: true
# Make sure to set:
#- radar_jdbc_connector_data_dashboard_backend._install to true
#- ksql_server._install to true
data_dashboard_backend:
_install: false
_chart_version: 0.1.0
_extra_timeout: 0
replicaCount: 1

# Install when radar_grafana._install is 'true'
radar_jdbc_connector_grafana:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1
sink:
# Change the list of topics if you have dashboards that read other data or if you don't have certain topics available on your cluster.
topics: android_phone_relative_location, android_phone_battery_level, connect_fitbit_intraday_heart_rate, connect_fitbit_intraday_steps

# Install when data_dashboard_backend._install is 'true'
radar_jdbc_connector_data_dashboard_backend:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

# Install when using realtime analysis
radar_jdbc_connector_realtime_dashboard:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

# Install when:
[Review comment, Member]: again it can be enabled automatically in the helmfiles using templating with if statements

[Reply, Contributor Author]: Indeed. We decided to postpone this to future changes.

#- radar_grafana._install is 'true'
#- data_dashboard_backend._install is 'true'
#- using realtime analysis
ksql_server:
_install: false
_chart_version: 0.3.1
_extra_timeout: 0
# -- Uncomment when using real-time analysis
# ksql:
# headless: false
# --

# --------------------------------------------------------- 20-ingestion.yaml ---------------------------------------------------------

radar_gateway:
@@ -488,30 +530,6 @@ radar_push_endpoint:
garmin:
enabled: true

# --------------------------------------------------------- 40-realtime-analyses.yaml ---------------------------------------------------------

radar_jdbc_connector_agg:
_install: false
_chart_version: 0.5.1
_extra_timeout: 0
replicaCount: 1

ksql_server:
_install: false
_chart_version: 0.3.1
_extra_timeout: 0
replicaCount: 1
servicePort: 8088
kafka:
bootstrapServers: PLAINTEXT://cp-kafka:9092
cp-schema-registry:
url: http://cp-schema-registry:8081
ksql:
headless: false
configurationOverrides:
"ksql.server.url": "http://0.0.0.0:8088"
"ksql.advertised.listener": "http://ksql-server:8088"

# --------------------------------------------------------- 99-velero.yaml ---------------------------------------------------------

velero:
14 changes: 10 additions & 4 deletions etc/base.yaml.gotmpl
@@ -27,9 +27,15 @@ radar_grafana:
# google_application_credentials: {{ readFile "../etc/radar-appserver/firebase-adminsdk.json" | quote }}
#*/}}

# Remove below Go comment to read the queries.sql and set the queries
# in the ksql_server
#ksql_server:
# If data transformation of Kafka topic data is needed, please remove the Go template comments and yaml comments.
# Make sure to reference a ksql transformation file that contains the required transformation logic.
# The files below transform the data from the questionnaire_response and questionnaire_app_event topics to the
# ksql_observations topic, used by the data-dashboard-backend. If using the data-dashboard-backend, please make sure
# to uncomment the relevant ksql transformer files.
# Note: never remove the _base_observations_stream.sql file.
# ksql_server:
# ksql:
# queries: |
# {{/*- readFile "cp-ksql-server/queries.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/_base_observations_stream.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/questionnaire_response_observations.sql" | nindent 8 */}}
# {{/*- readFile "../etc/cp-ksql-server/questionnaire_app_event_observations.sql" | nindent 8 */}}
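
For example, with the yaml comments and Go-template comment markers removed for all three data-dashboard-backend transformation files, the block would read:

```yaml
ksql_server:
  ksql:
    queries: |
      {{- readFile "../etc/cp-ksql-server/_base_observations_stream.sql" | nindent 8 }}
      {{- readFile "../etc/cp-ksql-server/questionnaire_response_observations.sql" | nindent 8 }}
      {{- readFile "../etc/cp-ksql-server/questionnaire_app_event_observations.sql" | nindent 8 }}
```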
20 changes: 20 additions & 0 deletions etc/cp-ksql-server/_base_observations_stream.sql
@@ -0,0 +1,20 @@
SET 'auto.offset.reset' = 'earliest';

-- Register the 'ksql_observations' topic (created if it does not exist).
CREATE STREAM observations (
PROJECT VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
SUBJECT VARCHAR KEY,
SOURCE VARCHAR KEY,
TOPIC_NAME VARCHAR,
CATEGORY VARCHAR,
VARIABLE VARCHAR,
OBSERVATION_TIME TIMESTAMP,
OBSERVATION_TIME_END TIMESTAMP,
TYPE VARCHAR,
VALUE_NUMERIC DOUBLE,
VALUE_TEXTUAL VARCHAR
) WITH (
kafka_topic = 'ksql_observations',
partitions = 3,
format = 'avro'
);
Empty file removed etc/cp-ksql-server/queries.sql
Empty file.
31 changes: 31 additions & 0 deletions etc/cp-ksql-server/questionnaire_app_event_observations.sql
@@ -0,0 +1,31 @@
CREATE STREAM questionnaire_app_event (
projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
userId VARCHAR KEY,
sourceId VARCHAR KEY,
questionnaireName VARCHAR,
eventType VARCHAR,
time DOUBLE,
metadata MAP<VARCHAR, VARCHAR>
) WITH (
kafka_topic = 'questionnaire_app_event',
partitions = 3,
format = 'avro'
);

INSERT INTO observations
[Review comment, Member]: Maybe add a Query ID to the insert so we can identify them.

[Reply, Contributor Author (pvannierop, Jun 13, 2024)]: Do you mean an autoincremented id field for each inserted row?

[Reply, Contributor Author]: I have implemented an auto-incrementing 'id' field in data-dashboard-backend. The config in this PR has been updated accordingly.

[Reply, Member]: Oh sorry for the confusion, I just meant the QUERY_ID -- https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/insert-into/#query_id (it is just a name for the INSERT query).

For the database tables, I think the primary keys you had previously were good. This is not that important in the case of questionnaires, but when you have passive data such as Fitbit there is usually a high chance of duplicate data, so this would reduce that in the database. It would also help with querying if using primary keys (as they will be indexed by default).

[Reply, Contributor Author (pvannierop, Jun 17, 2024)]: I added it like so:

    INSERT INTO observations
    WITH (QUERY_ID='questionnaire_app_event_observations')
    SELECT
        ...

and

    INSERT INTO observations
    WITH (QUERY_ID='questionnaire_response_observations')
    SELECT
        ...

WITH (QUERY_ID='questionnaire_app_event_observations')
SELECT
q.projectId AS PROJECT,
q.userId AS SUBJECT,
q.sourceId AS SOURCE,
'questionnaire_app_event' as TOPIC_NAME,
CAST(NULL as VARCHAR) as CATEGORY,
q.questionnaireName as VARIABLE,
FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
'STRING_JSON' as TYPE,
CAST(NULL as DOUBLE) as VALUE_NUMERIC,
TO_JSON_STRING(q.metadata) as VALUE_TEXTUAL
FROM questionnaire_app_event q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;
83 changes: 83 additions & 0 deletions etc/cp-ksql-server/questionnaire_response_observations.sql
@@ -0,0 +1,83 @@
CREATE STREAM questionnaire_response (
projectId VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
userId VARCHAR KEY,
sourceId VARCHAR KEY,
time DOUBLE,
timeCompleted DOUBLE,
timeNotification DOUBLE,
name VARCHAR,
version VARCHAR,
answers ARRAY<STRUCT<questionId VARCHAR, value STRUCT<int INT, string VARCHAR, double DOUBLE>, startTime DOUBLE, endTime DOUBLE>>
) WITH (
kafka_topic = 'questionnaire_response',
partitions = 3,
format = 'avro'
);

CREATE STREAM questionnaire_response_exploded
AS SELECT
EXPLODE(TRANSFORM(q.answers, a => a->questionId)) as VARIABLE,
FROM_UNIXTIME(CAST(q.time * 1000 AS BIGINT)) as OBSERVATION_TIME,
q.projectId,
q.userId,
q.sourceId,
'questionnaire_response' as TOPIC_NAME,
q.name as CATEGORY,
CAST(NULL as TIMESTAMP) as OBSERVATION_TIME_END,
-- WARNING!!! The cast from VARCHAR (string) to DOUBLE will throw a Java exception if the string is not a number.
-- This does not mean that the message will be lost. The value will be present in the VALUE_TEXTUAL_OPTIONAL field.
EXPLODE(TRANSFORM(q.answers, a => COALESCE(a->value->double, CAST(a->value->int as DOUBLE), CAST(a->value->string as DOUBLE)))) as VALUE_NUMERIC,
EXPLODE(TRANSFORM(q.answers, a => CASE
WHEN a->value->int IS NOT NULL THEN 'INTEGER'
WHEN a->value->double IS NOT NULL THEN 'DOUBLE'
ELSE NULL
END)) as TYPE,
-- Note: When cast to double works for the string value, the VALUE_TEXTUAL_OPTIONAL will also be set.
EXPLODE(TRANSFORM(q.answers, a => a->value->string)) as VALUE_TEXTUAL_OPTIONAL
FROM questionnaire_response q
EMIT CHANGES;

INSERT INTO observations
WITH (QUERY_ID='questionnaire_response_observations')
SELECT
q.projectId as PROJECT,
q.sourceId as SOURCE,
q.userId as SUBJECT,
TOPIC_NAME, CATEGORY, VARIABLE, OBSERVATION_TIME, OBSERVATION_TIME_END,
CASE
WHEN TYPE IS NULL AND VALUE_NUMERIC IS NOT NULL THEN 'DOUBLE' -- must have been derived from a string cast
WHEN TYPE IS NULL AND VALUE_NUMERIC IS NULL THEN 'STRING'
ELSE TYPE -- keep the original type when TYPE is not NULL
END as TYPE,
VALUE_NUMERIC,
CASE
WHEN VALUE_NUMERIC IS NOT NULL THEN NULL -- When cast to double has worked for the string value, set VALUE_TEXTUAL to NULL.
ELSE VALUE_TEXTUAL_OPTIONAL
END as VALUE_TEXTUAL
FROM questionnaire_response_exploded q
PARTITION BY q.projectId, q.userId, q.sourceId -- this sets the fields in the kafka message key
EMIT CHANGES;

-- TODO: exploding the 'select:' questions is not yet fully designed.
-- I keep the code here for future reference.
-- Multi-select questionnaire questions are stored as a single 'value' string with the
-- names of the selected options separated by comma's. Multiselect questions are prefixed
-- by 'select:' in the questionId.
-- When 'questionId' is like 'select:%' create a new stream with the select options.
-- The options in the value field split commas and added as separate VARIABLE records.
-- The VALUE_NUMERIC is set to 1 and VALUE_TEXTUAL is set to NULL.
-- INSERT INTO observations
-- SELECT
-- EXPLODE(SPLIT(VALUE_TEXTUAL, ',')) as VARIABLE,
-- PROJECT, SOURCE, SUBJECT, TOPIC_NAME, CATEGORY, OBSERVATION_TIME, OBSERVATION_TIME_END,
-- 'INTEGER' as TYPE,
-- CAST(1 as DOUBLE) VALUE_NUMERIC,
-- CAST(NULL as VARCHAR) as VALUE_TEXTUAL
-- FROM questionnaire_response_observations
-- WHERE
-- VARIABLE IS NOT NULL
-- AND VARIABLE LIKE 'select:%'
-- AND VALUE_TEXTUAL IS NOT NULL
-- AND VALUE_TEXTUAL != ''
-- PARTITION BY SUBJECT, PROJECT, SOURCE
-- EMIT CHANGES;
6 changes: 6 additions & 0 deletions etc/cp-ksql-server/values.yaml
@@ -1,6 +1,12 @@
# KSQL configuration options
## ref: https://docs.confluent.io/current/ksql/docs/installation/server-config/config-reference.html
kafka:
bootstrapServers: PLAINTEXT://cp-kafka:9092
cp-schema-registry:
url: http://cp-schema-registry:8081
configurationOverrides:
"ksql.server.url": "http://0.0.0.0:8088"
"ksql.advertised.listener": "http://ksql-server:8088"
"ksql.logging.processing.topic.auto.create": true
"ksql.logging.processing.stream.auto.create": true
"ksql.service.id": radar_default_