Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data dashboard backend] Add Data Dashboard Backend service #267

Merged
merged 23 commits into from
Jun 25, 2024

Conversation

pvannierop
Copy link
Contributor

@pvannierop pvannierop commented May 29, 2024

Description of the change

Data dashboard backend is a new service that exposes data from the observation table in timescaledb database (provisioned by KSQLDB service) to frontend/mobile applications.

See: https://github.com/RADAR-base/radar-data-dashboard-backend

This PR will implement the Data dashboard backend service as well as include the KSQLDB transformer logic to populate the observation table in timescaledb.

Additional information

  • Transformer logic transforms messages in any kafka topic to a single observation table. For now, only questionnaire_response and questionnaire_app_events are handled. Transformation logic for other topics can be added in the future.
  • Ingested topics are configured by selective uncommenting/addition in base.yaml.gotmpl file:
kafka_data_transformer:
 ksql:
   queries: |
     {{/* - readFile "../etc/cp-ksql-server/_base_observations_stream.sql"             | nindent 8 */}}
     {{/* - readFile "../etc/cp-ksql-server/questionnaire_response_observations.sql"   | nindent 8 */}}
#      {{/* - readFile "../etc/cp-ksql-server/questionnaire_app_events_observations.sql" | nindent 8 */}}
  • The existing 20-grafana.yaml file name was changed to 20-dashboard.yaml to fit the expanded scope. In addition, the database name was changed from grafana-metrics to data-dashboard.

@pvannierop pvannierop requested a review from keyvaann May 29, 2024 10:53
@pvannierop pvannierop self-assigned this May 29, 2024
@pvannierop pvannierop changed the base branch from main to dev May 29, 2024 10:54
@pvannierop pvannierop force-pushed the data-dashboard-backend branch from e28f72b to 48afbd4 Compare May 29, 2024 10:57
@pvannierop pvannierop requested a review from yatharthranjan May 29, 2024 10:58
@pvannierop pvannierop force-pushed the data-dashboard-backend branch from 48afbd4 to 188975a Compare May 31, 2024 08:27
@pvannierop pvannierop requested a review from keyvaann May 31, 2024 10:02
@pvannierop pvannierop requested a review from keyvaann June 10, 2024 14:16
@pvannierop pvannierop force-pushed the data-dashboard-backend branch from f43433d to 07d9d82 Compare June 11, 2024 16:47
@pvannierop pvannierop force-pushed the data-dashboard-backend branch 2 times, most recently from 67a86ac to 62bd1fb Compare June 12, 2024 09:28
@keyvaann
Copy link
Collaborator

Did you change permissions of generate-secrets script? Seems like it can't run anymore
https://github.com/RADAR-base/RADAR-Kubernetes/actions/runs/9481223536/job/26123532105#step:5:39

Copy link
Member

@yatharthranjan yatharthranjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please find some comments.

_extra_timeout: 0
replicaCount: 1

# Install when:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again it can be enabled automatically in the helmfiles using templating with if statements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. We decided to postpone this to future changes.

PROJECT VARCHAR KEY, -- 'KEY' means that this field is part of the kafka message key
SUBJECT VARCHAR KEY,
SOURCE VARCHAR KEY,
`TOPIC` VARCHAR,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DATA_TYPE instead of topic ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that is a API question. Would we rather use radar-centric terminology (subject, topic), or expose a different API to consumers (e.g., participant, data_type). It depends on the consumers I guess. I personally think that the RADAR-base teams of The Hyve and KCL are basically the only users of this API. To me, it looks like an extra burden to juggle these translations all the time:

  • participant means subject
  • data_type means topic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure, just mentioned to avoid conflicts and confusion with the KSQL topic keyword. You can use TOPIC_NAME maybe to avoid conflicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the column to TOPIC_NAME as suggested.

`TOPIC` VARCHAR,
CATEGORY VARCHAR,
VARIABLE VARCHAR,
DATE TIMESTAMP,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there is a time column created by jdbc-connector. Renaming DATE to time would cause a clash here. What would work is for instance time_of_observation or observation_time. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use TIME in our queries and works with jdbc-connector. But yes observation_time works too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed DATE to OBSERVATION_TIME and END_DATE to OBSERVATION_TIME_END.

format = 'avro'
);

INSERT INTO observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a Query ID to the insert so we can identify them

Copy link
Contributor Author

@pvannierop pvannierop Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean an autoincremented id field for each inserted row?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented a auto-incrementing 'id' field in data-dashboard-backend. The config in this PR has been updated accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry for the confusion i just meant the QUERY_ID -- https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/insert-into/#query_id (it is just a name for the INSERT query)

For the database tables, i think the primary keys you had previously were good. This is not that important in case of questionnaires, but when you have passive data such as fitbit there is usually high chance of having duplicate data, so this would reduce that in the database. Also would help with querying if using primary keys (as will be indexed by default).

Copy link
Contributor Author

@pvannierop pvannierop Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it like so:

INSERT INTO observations
WITH (QUERY_ID='questionnaire_app_event_observations')
SELECT
    ...

and

INSERT INTO observations
WITH (QUERY_ID='questionnaire_response_observations')
SELECT 
   ...

cp-schema-registry:
url: http://cp-schema-registry:8081
ksql:
headless: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be enabled to use the sql files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah well spotted. Thanks for that. For, 40-realtime-analysis.yaml this setting was set to false. In order to reuse the ksql-server, I needed to add this field to base.yaml and add a comment like so:

ksql_server:
  _install: false
  _chart_version: 0.3.1
  _extra_timeout: 0
  # -- Uncomment when using real-time analysis
  # ksql:
  #   headless: false
  # --

It is ugly, I know...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to keep the comment-based method.

values: ["dashboard.{{ .Values.server_name }}"]
- name: "grafana\\.ini.server.root_url"
value: "https://dashboard.{{ .Values.server_name }}/"
- name: ingress.tls[0].secretName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the section about TLS can be removed since it's not used in the actual chart
https://github.com/RADAR-base/RADAR-Kubernetes/actions/runs/9515630882/job/26230258308?pr=266#step:11:38

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is out of scope for this PR. I created an issue out of it.

Copy link
Collaborator

@keyvaann keyvaann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pvannierop pvannierop force-pushed the data-dashboard-backend branch from d155e8b to 71ef416 Compare June 18, 2024 09:35
Includes KSQLDB transformer logic.
PR feedback Keyvan.
PR feedback Keyvan.
@pvannierop pvannierop force-pushed the data-dashboard-backend branch from 71ef416 to 26f80d7 Compare June 24, 2024 10:23
@keyvaann
Copy link
Collaborator

Force pushing makes it difficult to review changes since the last time I have reviewed the code since all of the commits are pushed again.

@keyvaann keyvaann merged commit 6792ded into dev Jun 25, 2024
4 of 7 checks passed
@keyvaann keyvaann deleted the data-dashboard-backend branch June 25, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants