Skip to content

Latest commit

 

History

History
411 lines (358 loc) · 21.5 KB

File metadata and controls

411 lines (358 loc) · 21.5 KB

Release Note 6.0.0 ML Analytics Service

This version contains set of manual activites tasks that must be completed in order to improve to upgrade the ML Analytics Service code to 6.0.0. Please consider the following list of tasks to be completed.

Devops Changes:

New Environment Keys Added

We added new environment keys to the DevOps repository (PR link) to as required for new features and functionality. For configuration and access to outside services or resources, these keys will be utilised.

Please note you don't need to deploy the DevOps repo. Once the PR is merged, deploy this service, env variable will automatically add from the DevOps branch.

In this release, we have introduced new environment variables as follows.

The Below value of these keys can be overridden or have values defined as needed using the private devops Repo path E.g. : ansible/inventory/staging/managed-learn/common.yml

        ml_analytics_authorization_access_token : "{{ ml_analytics_authorization_access_token }}"
        ml_analytics_client_id : "{{ ml_analytics_client_id }}"
        ml_analytics_client_secret : "{{ ml_analytics_client_secret }}"
        ml_analytics_username : "{{ ml_analytics_username }}"
        ml_analytics_password : "{{ ml_analytics_password }}"
        ml_analytics_createdBy : "{{ ml_analytics_createdBy }}"
        ml_analytics_api_base_url : "https://{{ domain_name }}/"
        ml_analytics_reports_store : "{{ cloud_service_provider }}"
        ml_analytics_reports_container : "{{ cloud_storage_privatereports_bucketname }}"
        ml_analytics_driver_memory: "{{ ml_analytics_driver_memory | default('50g') }}"
        ml_analytics_executor_memory: "{{ ml_analytics_executor_memory | default('50g') }}"
        URL : "{{ ml_analytics_core_service }}"
        container_name : "{{ cloud_storage_telemetry_bucketname }}"


We can use the existing user credentials or we can create new user by using below curl

        curl --location 'https://staging.sunbirded.org/api/user/v1/create' \
        --header 'Authorization: Bearer {{Bearer_token}}' \
        --header 'x-authenticated-user-token: {{access_token}}' \
        --header 'Content-Type: application/json' \
        --data-raw '{
            "request": {
                "firstName": "report",
                "password": "{{password}}",
                "email": "[email protected]",
                "userName": "ReportCreator",
                "lastName": "creator",
                "emailVerified":true,
                "channel": "sunbird",
                "roles":["PROGRAM_MANAGER", "PROGRAM_DESIGNER", "REPORT_ADMIN", "REPORT_VIEWER"]
            }
        }'

        Note: 
        1. The request body can be modified and used accordingly  

        2. The "ml_analytics_authorization_access_token" access token should have access to below api endpoints
                
                refresh_token = "auth/realms/sunbird/protocol/openid-connect/token"
                access_token = "auth/v1/refresh/token"
                backend_create = "api/data/v1/report/jobs/submit"
                frontend_create = "/api/data/v1/report-service/report/create"
                frontend_get = "/api/data/v1/report-service/report/get/"
                backend_update = "/api/data/v1/report/jobs/"
                frontend_update = "/api/data/v1/report-service/report/update/"
                frontend_retire = "/api/data/v1/report-service/report/delete/"
                backend_retire = "/report/jobs/deactivate/"
                reports_list = "/api/data/v1/report-service/report/list"

        3. The value for ml_analytics_createdBy should be unique UUID of the user generated or being used and supplied to ml_analytics_username e.g. fb85a044-d9eb-479b-a55a-fer1bfaea14d

        4. The value for ml_analytics_client_secret is used for generating the keyclock access token. 
        e.g. fd241dce-46b9-47e1-97cf-1c7de7a44216

Deploy ml-analytics-service

To retrieve the latest release tag for version 6.0.0, please visit the following URL: https://github.com/Sunbird-Ed/ml-analytics-service/releases/tag e.g. release-6.0.0_RC14

To proceed with the deployment process, follow the steps below:

1. Log in to Jenkins.
2. Go to Dashboard -> Deploy -> staging -> managed-learn -> ml-analytics-service.
3. Click on "Build with parameters" and provide the latest release tag of ml-analytics-service in the field labeled "ml_analytics_version" and also provide the latest release branch of devops in the field labeled "private_branch". Initiate the build process.
4. Once the job is completed, the services will be deployed on the staging environment

Migrations

In this release, we have added automation script to create the reports using report config json file. it uses reports backend and frontend API's to create the report.

Step 1:

Login to the ml-analytics-service server

Step 2:

Switch to the user

sudo su data-pipeline

Step 3:

Activate virtual environment

. /opt/sparkjobs/spark_venv/bin/activate

Step 4:

Navigate to path

cd /opt/sparkjobs/ml-analytics-service/migrations/releases/6.0.0/

Step 5:

Run the script which will create backend and frontend report configs 

python index.py

Step 6:

Running the Jenkins Job to Generate CSV and JSON in the Cloud
To execute the Jenkins Job and generate the CSV and JSON files in the cloud, please follow the steps below:

1. Log in to the VPN environment.
2. Access the Jenkins UI by navigating to the appropriate URL.
3. Once logged in, locate the "Deploy" section and select the desired environment (e.g., Staging, Production, etc.).
4. In the selected environment, find the "DataPipeline" folder and click on it.
5. Inside the "DataPipeline" folder, locate the "RunReportjob" job and click on it.
6. On the job page, look for the option to "Build with Parameters" and click on it.
7. You will be prompted to provide the required parameters for the job.    
        1. Specify the "report_id" parameter with the appropriate value. 
            (Retrieve all the "report_id" value from the 12th point and pass it here one by one -For example, you can refer to the "ml_report_name" value) or (in this path -           
         https://github.com/Sunbird-Ed/ml-analytics-service/tree/release-6.0.0/migrations/releases/6.0.0/config/backend/create )
         Once you have obtained the required "report_id" value, provide it as the parameter during the job execution.

        2. The "private_branch" parameter should default to the latest branch, so no action is needed unless specified otherwise.

        3. If there is a "branch_or_tag" parameter, it should be set to the same value as the "private_branch" parameter.

8. Once you have entered the necessary parameters, click on the "Build" button to start the job.
9. The Jenkins job will now run, executing the required tasks to generate the CSV and JSON files in the cloud.
10. Monitor the job's progress through the Jenkins UI to ensure it completes successfully.
11. Once the job finishes, the CSV and JSON files should be generated in the designated cloud storage location.

12. report_id's to run in the RunReportjob in Jenkins :
    ml_no_of_users_joining_program_sl_pdr
    ml_no_of_improvements_in_progress_status_currently_sl
    ml_district_wise_observation_status_and_entities_observed_table_block_with_rubric_sl_pdr
    ml_no_of_improvements_certificate_issued_sl_pdr
    ml_total_submissions_sl
    ml_no_of_users_district_wise_sl
    ml_no_of_improvements_certificate_issued_sl
    ml_improvement_project_big_numbers_sl_pdr
    ml_district_wise_unique_users_who_submitted_form_with_rubric_sl
    ml_total_entities_observed_sl
    ml_district_wise_no_of_submissions_vs_observation_status_with_rubric_sl_pdr
    ml_district_wise_unique_entities_observed_sl_pdr
    ml_no_of_users_district_wise_sl_pdr
    ml_district_wise_unique_entities_observed_with_rubric_sl_pdr
    ml_total_entities_observed_sl_pdr
    ml_total_submissions_sl_pdr
    ml_status_of_improvement_projects_district_wise_sl_pdr
    ml_improvement_projects_status_by_district_table_block_sl
    ml_district_wise_unique_entities_observed_with_rubric_sl
    ml_no_of_observations_submitted_till_date_sl
    ml_district_wise_observation_status_and_entities_observed_table_a_with_rubric_sl_pdr
    ml_district_wise_observation_status_and_entities_observed_sl
    ml_no_of_improvements_submitted_till_date_sl
    ml_no_of_improvements_by_state_sl
    ml_total_unique_users_with_rubric_sl_pdr
    ml_total_submissions_with_rubric_sl_pdr
    ml_no_of_improvements_in_started_status_currently_sl
    ml_unique_users_who_started_form_with_rubric_sl_pdr
    ml_district_wise_no_of_submissions_vs_observation_status_with_rubric_sl
    ml_district_wise_unique_entities_observed_sl
    ml_district_wise_observation_status_and_entities_observed_table_a_sl_pdr
    ml_no_of_improvements_submitted_with_evidence_till_date_sl
    ml_unique_users_who_started_form_with_rubric_sl
    ml_improvement_projects_status_by_block_table_sl_pdr
    ml_total_entities_observed_with_rubric_sl
    ml_district_wise_no_of_submissions_vs_observation_status_sl_pdr
    ml_district_wise_unique_users_who_submitted_form_sl_pdr
    ml_total_submissions_with_rubric_sl
    ml_improvement_projects_status_by_district_table_sl_pdr
    ml_total_entities_observed_with_rubric_sl_pdr
    ml_total_unique_users_sl_pdr
    ml_improvement_project_big_numbers_sl
    ml_total_unique_users_sl
    ml_no_of_improvements_by_district_sl
    ml_criteria_wise_unique_entities_at_each_level_with_rubric_sl_pdr
    ml_district_wise_observation_status_and_entities_observed_table_with_rubric_sl
    ml_district_wise_observation_status_and_entities_observed_table_block_sl_pdr
    ml_total_unique_users_with_rubric_sl
    ml_unique_users_who_started_form_sl
    ml_unique_users_who_started_form_sl_pdr
    ml_block_wise_observation_status_and_entities_observed_sl
    ml_improvement_projects_status_by_district_table_sl
    ml_status_of_improvement_projects_district_wise_sl
    ml_domain_wise_unique_entities_at_each_level_with_rubric_sl_pdr
    ml_district_wise_unique_users_who_submitted_form_sl
    ml_no_of_observations_in_started_status_currently_sl
    ml_domain_wise_unique_entities_at_each_level_with_rubric_sl
    ml_district_wise_unique_users_who_submitted_form_with_rubric_sl_pdr
    ml_district_wise_no_of_submissions_vs_observation_status_sl
    ml_criteria_wise_unique_entities_at_each_level_with_rubric_sl 
    ml_no_of_surveys_in_started_status_currently_sl
    ml_no_of_surveys_submitted_till_date_sl


Please note that the specific steps may vary depending on your Jenkins configuration and environment. Ensure you have the necessary permissions and access rights to perform these actions.

Create a new data source in Druid:

In this release, we have to create a new data source in the Druid for The "No of users joining program" report, please follow the steps below:

1. Log in to the VPN environment.
2. Access the Druid UI  "unified console" by navigating to the appropriate URL.
3. please select the ingestion in Tab.
4. locate the "..." (3 Dots) and select, it will be in the right corner of screen beside "Supervisors" and "Refresh"
5. After selecting you will see a popup with multiple selections in that select - Submit JSON supervisor.
6. and add below Json (Ingestion spec) to Submit supervisor and Submit. 

Json (Ingestion spec) for new Data source ( Name - ml-user-program ): 
Here in this json inside ioConfig please replace {{Env}} and {{kafka_IP}} with respective environment value


    {
        "type": "kafka",
        "dataSchema": {
            "dataSource": "ml-user-program",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "flattenSpec": {
                        "useFieldDiscovery": true,
                        "fields": [
                            {
                                "type": "jq",
                                "name": "user_id",
                                "expr": ".userId"
                            },
                            {
                                "type": "jq",
                                "name": "program_id",
                                "expr": ".programId"
                            },
                            {
                                "type": "jq",
                                "name": "program_externalId",
                                "expr": ".programExternalId"
                            },
                            {
                                "type": "jq",
                                "name": "program_name",
                                "expr": ".programName"
                            },
                            {
                                "type": "jq",
                                "name": "state_externalId",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"state\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"state\")) | .id else null end"
                            },
                            {
                                "type": "jq",
                                "name": "state_name",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"state\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"state\")) | .name else null end"
                            },
                            {
                                "type": "jq",
                                "name": "district_externalId",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"district\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"district\")) | .id else null end"
                            },
                            {
                                "type": "jq",
                                "name": "district_name",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"district\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"district\")) | .name else null end"
                            },
                            {
                                "type": "jq",
                                "name": "block_externalId",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"block\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"block\")) | .id else null end"
                            },
                            {
                                "type": "jq",
                                "name": "block_name",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"block\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"block\")) | .name else null end"
                            },
                            {
                                "type": "jq",
                                "name": "cluster_externalId",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\")) | .id else null end"
                            },
                            {
                                "type": "jq",
                                "name": "cluster_name",
                                "expr": "if [.userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\"))] | length > 0 then .userProfile ? |.userLocations [] ? | select(.type | contains(\"cluster\")) | .name else null end"
                            },
                            {
                                "type": "jq",
                                "name": "organisation_id",
                                "expr": "if [.userProfile? | .organisations[]? | select (.isSchool == false)] | length > 0 then .userProfile? | .organisations[]? | select(.isSchool == false) | .organisationId else null end"
                            },
                            {
                                "type": "jq",
                                "name": "organisation_name",
                                "expr": "if [.userProfile? | .organisations[]? | select (.isSchool == false)] | length > 0 then .userProfile? | .organisations[]? | select(.isSchool == false) | .orgName else null end"
                            }
                        ]
                    },
                    "dimensionsSpec": {
                        "dimensions": [
                            {
                                "type": "string",
                                "name": "program_id"
                            },
                            {
                                "type": "string",
                                "name": "program_externalId"
                            },
                            {
                                "type": "string",
                                "name": "program_name"
                            },
                            {
                                "type": "string",
                                "name": "state_externalId"
                            },
                            {
                                "type": "string",
                                "name": "state_name"
                            },
                            {
                                "type": "string",
                                "name": "district_externalId"
                            },
                            {
                                "type": "string",
                                "name": "district_name"
                            },
                            {
                                "type": "string",
                                "name": "block_externalId"
                            },
                            {
                                "type": "string",
                                "name": "block_name"
                            },
                            {
                                "type": "string",
                                "name": "cluster_externalId"
                            },
                            {
                                "type": "string",
                                "name": "cluster_name"
                            },
                            {
                                "type": "string",
                                "name": "organisation_id"
                            },
                            {
                                "type": "string",
                                "name": "organisation_name"
                            }
                        ],
                        "dimensionsExclusions": []
                    },
                    "timestampSpec": {
                        "column": "createdAt",
                        "format": "iso"
                    }
                }
            },
            "metricsSpec": [
                {
                    "type": "longSum",
                    "name": "sum_user",
                    "fieldName": "user_id"
                },
                {
                    "type": "HLLSketchBuild",
                    "name": "unique_users",
                    "fieldName": "user_id"
                },
                {
                    "type": "count",
                    "name": "count_user"
                }
            ],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "day",
                "queryGranularity": "day",
                "rollup": false
            }
        },
        "ioConfig": {
            "topic": "{{Env}}.programuser.info",
            "consumerProperties": {
                "bootstrap.servers": "{{kafka_IP}}:9092"
            },
            "taskCount": 1,
            "replicas": 1,
            "taskDuration": "PT14400S",
            "useEarliestOffset": false,
            "completionTimeout": "PT1800S"
        },
        "tuningConfig": {
            "type": "kafka",
            "reportParseExceptions": false,
            "maxRowsPerSegment": 5000000
        }
    }