This version includes a set of manual activities that must be completed in order to upgrade the ML Analytics Service code to 6.0.0. Please work through the following list of tasks.
We have added new environment keys to the DevOps repository (PR link) as required for the new features and functionality. These keys are used for configuration and for access to external services and resources.
Please note that you do not need to deploy the DevOps repo. Once the PR is merged, deploy this service; the environment variables will be picked up automatically from the DevOps branch.
In this release, we have introduced the following new environment variables.
The values of these keys can be overridden or defined as needed in the private DevOps repo, e.g. ansible/inventory/staging/managed-learn/common.yml:
ml_analytics_authorization_access_token : "{{ ml_analytics_authorization_access_token }}"
ml_analytics_client_id : "{{ ml_analytics_client_id }}"
ml_analytics_client_secret : "{{ ml_analytics_client_secret }}"
ml_analytics_username : "{{ ml_analytics_username }}"
ml_analytics_password : "{{ ml_analytics_password }}"
ml_analytics_createdBy : "{{ ml_analytics_createdBy }}"
ml_analytics_api_base_url : "https://{{ domain_name }}/"
ml_analytics_reports_store : "{{ cloud_service_provider }}"
ml_analytics_reports_container : "{{ cloud_storage_privatereports_bucketname }}"
ml_analytics_driver_memory: "{{ ml_analytics_driver_memory | default('50g') }}"
ml_analytics_executor_memory: "{{ ml_analytics_executor_memory | default('50g') }}"
URL : "{{ ml_analytics_core_service }}"
container_name : "{{ cloud_storage_telemetry_bucketname }}"
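The "| default('50g')" entries above use the Jinja2 default filter that Ansible applies while rendering these variables: an inventory override wins when one is defined, otherwise the literal fallback is used. A minimal Python sketch of that behaviour (only the variable name comes from the list above; the snippet is purely illustrative):

from jinja2 import Template

tmpl = Template("{{ ml_analytics_driver_memory | default('50g') }}")

# No override defined in the inventory -> the literal fallback applies
print(tmpl.render())                                  # 50g

# Override defined in common.yml -> the overridden value wins
print(tmpl.render(ml_analytics_driver_memory="80g"))  # 80g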
You can use existing user credentials, or create a new user using the curl command below:
curl --location 'https://staging.sunbirded.org/api/user/v1/create' \
--header 'Authorization: Bearer {{Bearer_token}}' \
--header 'x-authenticated-user-token: {{access_token}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "request": {
        "firstName": "report",
        "password": "{{password}}",
        "email": "[email protected]",
        "userName": "ReportCreator",
        "lastName": "creator",
        "emailVerified": true,
        "channel": "sunbird",
        "roles": ["PROGRAM_MANAGER", "PROGRAM_DESIGNER", "REPORT_ADMIN", "REPORT_VIEWER"]
    }
}'
Note:
1. The request body can be modified and used as needed.
2. The "ml_analytics_authorization_access_token" access token should have access to the API endpoints below:
refresh_token = "auth/realms/sunbird/protocol/openid-connect/token"
access_token = "auth/v1/refresh/token"
backend_create = "api/data/v1/report/jobs/submit"
frontend_create = "/api/data/v1/report-service/report/create"
frontend_get = "/api/data/v1/report-service/report/get/"
backend_update = "/api/data/v1/report/jobs/"
frontend_update = "/api/data/v1/report-service/report/update/"
frontend_retire = "/api/data/v1/report-service/report/delete/"
backend_retire = "/report/jobs/deactivate/"
reports_list = "/api/data/v1/report-service/report/list"
3. The value of ml_analytics_createdBy should be the unique UUID of the user that was created or is being used for ml_analytics_username, e.g. fb85a044-d9eb-479b-a55a-fer1bfaea14d
4. The value of ml_analytics_client_secret is used for generating the Keycloak access token, e.g. fd241dce-46b9-47e1-97cf-1c7de7a44216 (see the sketch below).
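For reference, here is a minimal Python sketch of generating a Keycloak access token with these credentials against the openid-connect token endpoint from Note 2. The placeholder values must be replaced with your environment's credentials, and this assumes the password grant is enabled for the client:

import requests

base_url = "https://staging.sunbirded.org/"  # ml_analytics_api_base_url
payload = {
    "grant_type": "password",
    "client_id": "<ml_analytics_client_id>",
    "client_secret": "<ml_analytics_client_secret>",
    "username": "<ml_analytics_username>",
    "password": "<ml_analytics_password>",
}

# The refresh_token endpoint from Note 2, i.e. the Keycloak token endpoint
resp = requests.post(base_url + "auth/realms/sunbird/protocol/openid-connect/token", data=payload)
resp.raise_for_status()
tokens = resp.json()
print(tokens["access_token"])   # short-lived access token
print(tokens["refresh_token"])  # refresh token for subsequent renewals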
To retrieve the latest release tag for version 6.0.0, please visit the following URL: https://github.com/Sunbird-Ed/ml-analytics-service/releases/tag (e.g. release-6.0.0_RC14).
To proceed with the deployment process, follow the steps below:
1. Log in to Jenkins.
2. Go to Dashboard -> Deploy -> staging -> managed-learn -> ml-analytics-service.
3. Click on "Build with parameters" and provide the latest release tag of ml-analytics-service in the field labeled "ml_analytics_version" and also provide the latest release branch of devops in the field labeled "private_branch". Initiate the build process.
4. Once the job is completed, the services will be deployed on the staging environment
In this release, we have added an automation script that creates the reports from the report config JSON files. It uses the report backend and frontend APIs to create the reports; a sketch of a comparable API call follows the steps below.
Log in to the ml-analytics-service server.
Switch to the user:
sudo su data-pipeline
Activate the virtual environment:
. /opt/sparkjobs/spark_venv/bin/activate
Navigate to the path:
cd /opt/sparkjobs/ml-analytics-service/migrations/releases/6.0.0/
Run the script, which will create the backend and frontend report configs:
python index.py
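For context, here is a minimal sketch of the kind of call the script issues for each report config, using the backend_create endpoint from Note 2. The file name, token placeholders, and flow are illustrative assumptions, not the actual index.py implementation:

import json
import requests

base_url = "https://staging.sunbirded.org/"  # ml_analytics_api_base_url
headers = {
    "Authorization": "Bearer <ml_analytics_authorization_access_token>",
    "x-authenticated-user-token": "<keycloak_access_token>",
    "Content-Type": "application/json",
}

# One of the backend report configs shipped with this release (illustrative pick)
with open("config/backend/create/ml_total_submissions_sl.json") as f:
    report_config = json.load(f)

# backend_create endpoint from Note 2
resp = requests.post(base_url + "api/data/v1/report/jobs/submit", headers=headers, json=report_config)
print(resp.status_code, resp.json())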
Running the Jenkins Job to Generate CSV and JSON in the Cloud
To execute the Jenkins Job and generate the CSV and JSON files in the cloud, please follow the steps below:
1. Log in to the VPN environment.
2. Access the Jenkins UI by navigating to the appropriate URL.
3. Once logged in, locate the "Deploy" section and select the desired environment (e.g., Staging, Production, etc.).
4. In the selected environment, find the "DataPipeline" folder and click on it.
5. Inside the "DataPipeline" folder, locate the "RunReportjob" job and click on it.
6. On the job page, look for the option to "Build with Parameters" and click on it.
7. You will be prompted to provide the required parameters for the job.
1. Specify the "report_id" parameter with the appropriate value.
(Retrieve the "report_id" values from point 12 below and pass them here one by one; for example, you can refer to the "ml_report_name" value, or find them in this path:
https://github.com/Sunbird-Ed/ml-analytics-service/tree/release-6.0.0/migrations/releases/6.0.0/config/backend/create )
Once you have obtained the required "report_id" value, provide it as the parameter during the job execution.
2. The "private_branch" parameter should default to the latest branch, so no action is needed unless specified otherwise.
3. If there is a "branch_or_tag" parameter, it should be set to the same value as the "private_branch" parameter.
8. Once you have entered the necessary parameters, click on the "Build" button to start the job.
9. The Jenkins job will now run, executing the required tasks to generate the CSV and JSON files in the cloud.
10. Monitor the job's progress through the Jenkins UI to ensure it completes successfully.
11. Once the job finishes, the CSV and JSON files should be generated in the designated cloud storage location.
12. report_ids to run in the RunReportjob in Jenkins:
ml_no_of_users_joining_program_sl_pdr
ml_no_of_improvements_in_progress_status_currently_sl
ml_district_wise_observation_status_and_entities_observed_table_block_with_rubric_sl_pdr
ml_no_of_improvements_certificate_issued_sl_pdr
ml_total_submissions_sl
ml_no_of_users_district_wise_sl
ml_no_of_improvements_certificate_issued_sl
ml_improvement_project_big_numbers_sl_pdr
ml_district_wise_unique_users_who_submitted_form_with_rubric_sl
ml_total_entities_observed_sl
ml_district_wise_no_of_submissions_vs_observation_status_with_rubric_sl_pdr
ml_district_wise_unique_entities_observed_sl_pdr
ml_no_of_users_district_wise_sl_pdr
ml_district_wise_unique_entities_observed_with_rubric_sl_pdr
ml_total_entities_observed_sl_pdr
ml_total_submissions_sl_pdr
ml_status_of_improvement_projects_district_wise_sl_pdr
ml_improvement_projects_status_by_district_table_block_sl
ml_district_wise_unique_entities_observed_with_rubric_sl
ml_no_of_observations_submitted_till_date_sl
ml_district_wise_observation_status_and_entities_observed_table_a_with_rubric_sl_pdr
ml_district_wise_observation_status_and_entities_observed_sl
ml_no_of_improvements_submitted_till_date_sl
ml_no_of_improvements_by_state_sl
ml_total_unique_users_with_rubric_sl_pdr
ml_total_submissions_with_rubric_sl_pdr
ml_no_of_improvements_in_started_status_currently_sl
ml_unique_users_who_started_form_with_rubric_sl_pdr
ml_district_wise_no_of_submissions_vs_observation_status_with_rubric_sl
ml_district_wise_unique_entities_observed_sl
ml_district_wise_observation_status_and_entities_observed_table_a_sl_pdr
ml_no_of_improvements_submitted_with_evidence_till_date_sl
ml_unique_users_who_started_form_with_rubric_sl
ml_improvement_projects_status_by_block_table_sl_pdr
ml_total_entities_observed_with_rubric_sl
ml_district_wise_no_of_submissions_vs_observation_status_sl_pdr
ml_district_wise_unique_users_who_submitted_form_sl_pdr
ml_total_submissions_with_rubric_sl
ml_improvement_projects_status_by_district_table_sl_pdr
ml_total_entities_observed_with_rubric_sl_pdr
ml_total_unique_users_sl_pdr
ml_improvement_project_big_numbers_sl
ml_total_unique_users_sl
ml_no_of_improvements_by_district_sl
ml_criteria_wise_unique_entities_at_each_level_with_rubric_sl_pdr
ml_district_wise_observation_status_and_entities_observed_table_with_rubric_sl
ml_district_wise_observation_status_and_entities_observed_table_block_sl_pdr
ml_total_unique_users_with_rubric_sl
ml_unique_users_who_started_form_sl
ml_unique_users_who_started_form_sl_pdr
ml_block_wise_observation_status_and_entities_observed_sl
ml_improvement_projects_status_by_district_table_sl
ml_status_of_improvement_projects_district_wise_sl
ml_domain_wise_unique_entities_at_each_level_with_rubric_sl_pdr
ml_district_wise_unique_users_who_submitted_form_sl
ml_no_of_observations_in_started_status_currently_sl
ml_domain_wise_unique_entities_at_each_level_with_rubric_sl
ml_district_wise_unique_users_who_submitted_form_with_rubric_sl_pdr
ml_district_wise_no_of_submissions_vs_observation_status_sl
ml_criteria_wise_unique_entities_at_each_level_with_rubric_sl
ml_no_of_surveys_in_started_status_currently_sl
ml_no_of_surveys_submitted_till_date_sl
Please note that the specific steps may vary depending on your Jenkins configuration and environment. Ensure you have the necessary permissions and access rights to perform these actions.
In this release, we need to create a new data source in Druid for the "No of users joining program" report. Please follow the steps below:
1. Log in to the VPN environment.
2. Access the Druid UI "unified console" by navigating to the appropriate URL.
3. Select the "Ingestion" tab.
4. Locate and select the "..." (three dots) menu in the top-right corner of the screen, beside "Supervisors" and "Refresh".
5. A popup with multiple options will appear; select "Submit JSON supervisor".
6. Add the JSON ingestion spec below to the supervisor dialog and click "Submit".
JSON ingestion spec for the new data source (name: ml-user-program):
In this JSON, inside ioConfig, please replace {{Env}} and {{kafka_IP}} with the respective environment values.
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "ml-user-program",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [
            {
              "type": "jq",
              "name": "user_id",
              "expr": ".userId"
            },
            {
              "type": "jq",
              "name": "program_id",
              "expr": ".programId"
            },
            {
              "type": "jq",
              "name": "program_externalId",
              "expr": ".programExternalId"
            },
            {
              "type": "jq",
              "name": "program_name",
              "expr": ".programName"
            },
            {
              "type": "jq",
              "name": "state_externalId",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"state\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"state\")) | .id else null end"
            },
            {
              "type": "jq",
              "name": "state_name",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"state\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"state\")) | .name else null end"
            },
            {
              "type": "jq",
              "name": "district_externalId",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"district\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"district\")) | .id else null end"
            },
            {
              "type": "jq",
              "name": "district_name",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"district\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"district\")) | .name else null end"
            },
            {
              "type": "jq",
              "name": "block_externalId",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"block\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"block\")) | .id else null end"
            },
            {
              "type": "jq",
              "name": "block_name",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"block\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"block\")) | .name else null end"
            },
            {
              "type": "jq",
              "name": "cluster_externalId",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\")) | .id else null end"
            },
            {
              "type": "jq",
              "name": "cluster_name",
              "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\")) | .name else null end"
            },
            {
              "type": "jq",
              "name": "organisation_id",
              "expr": "if [.userProfile? | .organisations[]? | select (.isSchool == false)] | length > 0 then .userProfile? | .organisations[]? | select(.isSchool == false) | .organisationId else null end"
            },
            {
              "type": "jq",
              "name": "organisation_name",
              "expr": "if [.userProfile? | .organisations[]? | select (.isSchool == false)] | length > 0 then .userProfile? | .organisations[]? | select(.isSchool == false) | .orgName else null end"
            }
          ]
        },
        "dimensionsSpec": {
          "dimensions": [
            {
              "type": "string",
              "name": "program_id"
            },
            {
              "type": "string",
              "name": "program_externalId"
            },
            {
              "type": "string",
              "name": "program_name"
            },
            {
              "type": "string",
              "name": "state_externalId"
            },
            {
              "type": "string",
              "name": "state_name"
            },
            {
              "type": "string",
              "name": "district_externalId"
            },
            {
              "type": "string",
              "name": "district_name"
            },
            {
              "type": "string",
              "name": "block_externalId"
            },
            {
              "type": "string",
              "name": "block_name"
            },
            {
              "type": "string",
              "name": "cluster_externalId"
            },
            {
              "type": "string",
              "name": "cluster_name"
            },
            {
              "type": "string",
              "name": "organisation_id"
            },
            {
              "type": "string",
              "name": "organisation_name"
            }
          ],
          "dimensionExclusions": []
        },
        "timestampSpec": {
          "column": "createdAt",
          "format": "iso"
        }
      }
    },
    "metricsSpec": [
      {
        "type": "longSum",
        "name": "sum_user",
        "fieldName": "user_id"
      },
      {
        "type": "HLLSketchBuild",
        "name": "unique_users",
        "fieldName": "user_id"
      },
      {
        "type": "count",
        "name": "count_user"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "day",
      "queryGranularity": "day",
      "rollup": false
    }
  },
  "ioConfig": {
    "topic": "{{Env}}.programuser.info",
    "consumerProperties": {
      "bootstrap.servers": "{{kafka_IP}}:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT14400S",
    "useEarliestOffset": false,
    "completionTimeout": "PT1800S"
  },
  "tuningConfig": {
    "type": "kafka",
    "reportParseExceptions": false,
    "maxRowsPerSegment": 5000000
  }
}
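As an alternative to the console, the same spec can be submitted through Druid's supervisor API. A minimal sketch, assuming the spec above is saved to a local file and the Overlord is reachable at the placeholder {{druid_overlord}} (8090 is the default Overlord port; adjust for your deployment):

import json
import requests

# The ingestion spec above, saved to a local file
with open("ml-user-program-supervisor.json") as f:
    spec = json.load(f)

resp = requests.post(
    "http://{{druid_overlord}}:8090/druid/indexer/v1/supervisor",
    headers={"Content-Type": "application/json"},
    json=spec,
)
resp.raise_for_status()
print(resp.json())  # a successful submit returns {"id": "ml-user-program"}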