- The data reconciliation app verifies that data sets held in the different data stores used by Companies House, including Oracle, MongoDB and Elasticsearch, are consistent with each other.
- Groups of comparators are responsible for comparing data sets, aggregating results and then publishing a message to a Kafka topic.
- Comparators are configured using environment variables.
- Git
- Java
- Maven
- MongoDB
- Apache Kafka
- Elasticsearch
- Oracle database
- From the command line, in the same folder as the Makefile, run:
make clean build
- Configure project environment variables where necessary (see below).
- Ensure MongoDB and Elasticsearch are running within the Companies House developer environment
- Start the service in the CHS developer environment
- The data reconciliation app is implemented using Apache Camel.
- A comparison is triggered when a timer elapses.
- The route triggers the desired function, which fetches the required data sets.
- Retrieved data sets are marshalled into a suitable model and compared with each other.
- Supported comparisons include:
- Count number of resources in a particular data set.
- Calculate symmetric difference between two data sets.
- Identify discrepancies between resources in two data sets.
- The result is transformed into a CSV file and uploaded to S3.
- A message is sent to Kafka after all required comparisons in the group have run.
- Queries used to retrieve data from Oracle and retrieve search hits from Elasticsearch are located on the classpath.
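The three supported comparisons can be sketched in plain Java as shown here. This is only an illustration under stated assumptions: the class and method names are hypothetical, resources are reduced to string identifiers and string values, and the application itself performs these steps inside Apache Camel routes rather than with this helper.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not a class from the application.
final class ComparisonSketch {

    // Count comparison: difference in the number of resources in each data set.
    static int countDifference(Set<String> source, Set<String> target) {
        return source.size() - target.size();
    }

    // Symmetric difference: identifiers present in exactly one of the two data sets.
    static Set<String> symmetricDifference(Set<String> source, Set<String> target) {
        Set<String> onlyInSource = new HashSet<>(source);
        onlyInSource.removeAll(target);
        Set<String> onlyInTarget = new HashSet<>(target);
        onlyInTarget.removeAll(source);
        Set<String> result = new TreeSet<>(onlyInSource); // sorted for stable output
        result.addAll(onlyInTarget);
        return result;
    }

    // Discrepancies: identifiers present in both data sets whose values differ.
    static Map<String, String> discrepancies(Map<String, String> source,
                                             Map<String, String> target) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String other = target.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                result.put(entry.getKey(), entry.getValue() + " != " + other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up company numbers standing in for the two data sets.
        Set<String> mongo = Set.of("00000001", "00000002", "00000003");
        Set<String> oracle = Set.of("00000002", "00000003", "00000004");
        System.out.println(symmetricDifference(mongo, oracle)); // [00000001, 00000004]
    }
}
```

Sorted collections are used so that repeated runs over the same data produce rows in the same order, which makes generated result files easier to compare.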
Variable | Description | Example |
SPRING_DATASOURCE_URL | The URL of the Oracle instance where CHIPS application data is stored | jdbc:oracle:thin:@oraclehost:1521:db |
SPRING_DATASOURCE_USERNAME | The username that will be used to connect to Oracle | username |
SPRING_DATASOURCE_PASSWORD | The password that will be used to connect to Oracle | password |
SPRING_DATASOURCE_DRIVER_CLASS_NAME | The fully qualified class name of the driver that will be used to connect to Oracle | oracle.jdbc.OracleDriver |
Variable | Description | Example |
SPRING_DATA_MONGODB_URI | The URL of the MongoDB instance where CHS application data is stored | mongodb://mongohost:27017 |
ENDPOINT_MONGODB_COMPANY_PROFILE_DB_NAME | The name of the MongoDB database used to store company profiles | db_name |
ENDPOINT_MONGODB_COMPANY_PROFILE_COLLECTION_NAME | The name of the MongoDB collection used to store company profiles | collection_name |
ENDPOINT_MONGODB_READ_PREFERENCE | Determines how the MongoDB client routes read operations to members of a replica set | PRIMARY |
ENDPOINT_MONGODB_DSQ_OFFICER_DB_NAME | The name of the MongoDB database used to store disqualified officers | db_name |
ENDPOINT_MONGODB_DSQ_OFFICER_COLLECTION_NAME | The name of the MongoDB collection used to store disqualified officers | collection_name |
ENDPOINT_MONGODB_INSOLVENCY_DB_NAME | The name of the MongoDB database used to store company insolvency data | db_name |
ENDPOINT_MONGODB_INSOLVENCY_COLLECTION_NAME | The name of the MongoDB collection used to store company insolvency data | collection_name |
Variable | Description | Example |
ELASTICSEARCH_ALPHA_HOST | The hostname that will be used to connect to the Elasticsearch alphabetical search cluster | example.com |
ELASTICSEARCH_ALPHA_INDEX | The name of the index that alphabetical search hits will be retrieved from | index_name |
ELASTICSEARCH_ALPHA_PORT | The port number that will be used to connect to the Elasticsearch alphabetical search cluster | 9200 |
ELASTICSEARCH_ALPHA_PROTOCOL | The protocol that will be used to connect to the Elasticsearch alphabetical search cluster | https |
ELASTICSEARCH_ALPHA_SEGMENTS | The number of slices that the scrolling search will be split into | 3 |
ELASTICSEARCH_ALPHA_SLICE_SIZE | The number of hits that the scrolling search will return in each response | 10000 |
ELASTICSEARCH_ALPHA_SLICE_FIELD | The field that will be used to split results of a scrolling search | _uid |
ELASTICSEARCH_PRIMARY_HOST | The hostname that will be used to connect to the Elasticsearch primary search cluster | example.com |
ELASTICSEARCH_PRIMARY_INDEX | The name of the index that primary search hits will be retrieved from | index_name |
ELASTICSEARCH_PRIMARY_PORT | The port number that will be used to connect to the Elasticsearch primary search cluster | 9200 |
ELASTICSEARCH_PRIMARY_PROTOCOL | The protocol that will be used to connect to the Elasticsearch primary search cluster | https |
ELASTICSEARCH_PRIMARY_SEGMENTS | The number of slices that the scrolling search will be split into | 3 |
ELASTICSEARCH_PRIMARY_SLICE_SIZE | The number of hits that the scrolling search will return in each response | 10000 |
ELASTICSEARCH_PRIMARY_SLICE_FIELD | The field that will be used to split results of a scrolling search | _uid |
ENDPOINT_ELASTICSEARCH_LOG_INDICES | Used to log a tally of the number of Elasticsearch search hits that have been processed | 10000 |
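To show how the SEGMENTS, SLICE_SIZE and SLICE_FIELD settings relate to each other, the following sketch builds one request body per slice using Elasticsearch's sliced scroll syntax. It is an assumption that the application maps the variables this way; the class name is hypothetical and the query itself (which the application loads from the classpath) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a class from the application.
final class SlicedScrollSketch {

    // Builds one scroll request body per slice.
    // segments   -> "max" (total number of slices), e.g. ELASTICSEARCH_ALPHA_SEGMENTS
    // sliceSize  -> "size" (hits per response), e.g. ELASTICSEARCH_ALPHA_SLICE_SIZE
    // sliceField -> "field" used to split results, e.g. ELASTICSEARCH_ALPHA_SLICE_FIELD
    static List<String> sliceBodies(int segments, int sliceSize, String sliceField) {
        List<String> bodies = new ArrayList<>();
        if (segments < 2) {
            // Elasticsearch requires more than one slice; fall back to a plain scroll.
            bodies.add("{\"size\":" + sliceSize + "}");
            return bodies;
        }
        for (int id = 0; id < segments; id++) {
            bodies.add("{\"slice\":{\"field\":\"" + sliceField + "\",\"id\":" + id
                    + ",\"max\":" + segments + "},\"size\":" + sliceSize + "}");
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Values taken from the example column of the table above.
        sliceBodies(3, 10000, "_uid").forEach(System.out::println);
    }
}
```

Each body would be sent as a separate scrolling search, so the slices can be consumed independently.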
Variable | Description | Example |
RESULTS_BUCKET | The S3 bucket to which results will be uploaded | bucket_name |
AWS_ACCESS_KEY_ID | The access key that will be used to connect to AWS | access_key |
AWS_SECRET_ACCESS_KEY | The secret access key that will be used to connect to AWS | secret_access_key |
AWS_REGION | The AWS region that the S3 client will connect to | eu-west-2 |
RESULTS_EXPIRY_TIME_IN_MILLIS | The duration in milliseconds for which comparison results can be accessed | 600000 |
Variable | Description | Example |
SCHEMA_REGISTRY_URL | The URL of the Kafka schema registry | example.com |
KAFKA_BROKER_ADDR | The URL of the Kafka broker | example.com |
Variable | Description | Example |
CACHE_EXPIRY_IN_SECONDS | The duration in seconds after which cached results will be evicted | 300 |
Each comparator belongs to a comparison group. After all comparators in the comparison group have run, results produced by each comparator will be published to S3 and a message will be sent to a Kafka topic.
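The group behaviour described above can be pictured with a small plain-Java sketch: results are collected per comparator, and the group is only released for publishing once every expected comparator has reported. The class name, method name and the use of result locations as strings are assumptions for illustration; the application's own aggregation is implemented with Apache Camel and may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration; not a class from the application.
final class ComparisonGroupSketch {
    private final Set<String> expected;
    private final Map<String, String> results = new LinkedHashMap<>();

    ComparisonGroupSketch(Set<String> expectedComparators) {
        this.expected = Set.copyOf(expectedComparators);
    }

    // Records one comparator's result. Returns all results once every expected
    // comparator has reported, otherwise an empty Optional.
    synchronized Optional<Map<String, String>> complete(String comparator, String resultLocation) {
        if (!expected.contains(comparator)) {
            throw new IllegalArgumentException("Unknown comparator: " + comparator);
        }
        results.put(comparator, resultLocation);
        if (!results.keySet().containsAll(expected)) {
            return Optional.empty();
        }
        Map<String, String> snapshot = new LinkedHashMap<>(results);
        results.clear(); // ready for the next round of comparisons
        return Optional.of(snapshot);
    }

    public static void main(String[] args) {
        ComparisonGroupSketch group = new ComparisonGroupSketch(Set.of("count", "number"));
        System.out.println(group.complete("count", "count.csv").isPresent());   // false
        System.out.println(group.complete("number", "number.csv").isPresent()); // true
    }
}
```

A caller would publish the results and send the Kafka message only when the returned Optional is present.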
The following tables list, for each comparator, a toggle that enables or disables it and a timer delay that sets how long after application startup it runs.
Note: the application starts only when at least one comparison group toggle is enabled; if none are enabled, the following error message is logged:
No aggregation group models enabled; must be at least one
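A startup check of this kind could look like the sketch below. It assumes, purely for illustration, that every variable whose name ends in _ENABLED counts as a toggle and that the check fails with an exception; the class and method names are hypothetical and the application's real check may be structured differently.

```java
import java.util.Map;

// Hypothetical helper for illustration; not a class from the application.
final class ToggleCheckSketch {

    // Counts variables ending in _ENABLED whose value is "true" (case-insensitive).
    static long enabledCount(Map<String, String> env) {
        return env.entrySet().stream()
                .filter(e -> e.getKey().endsWith("_ENABLED"))
                .filter(e -> Boolean.parseBoolean(e.getValue()))
                .count();
    }

    // Fails with the documented message when no toggle is enabled.
    static void requireAtLeastOneEnabled(Map<String, String> env) {
        if (enabledCount(env) == 0) {
            throw new IllegalStateException(
                    "No aggregation group models enabled; must be at least one");
        }
    }

    public static void main(String[] args) {
        try {
            requireAtLeastOneEnabled(System.getenv());
            System.out.println("At least one toggle is enabled");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```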
Variable | Description | Example |
COMPANY_COUNT_MONGO_ORACLE_ENABLED | Company count comparator toggle | "true" / "false" |
COMPANY_COUNT_MONGO_ORACLE_DELAY | Company count comparator delay | "30s" |
COMPANY_NUMBER_MONGO_ORACLE_ENABLED | Company number comparator toggle | "true" / "false" |
COMPANY_NUMBER_MONGO_ORACLE_DELAY | Company number comparator delay | "1m30s" |
COMPANY_STATUS_MONGO_ORACLE_ENABLED | Company status comparator toggle | "true" / "false" |
COMPANY_STATUS_MONGO_ORACLE_DELAY | Company status comparator delay | "9m30s" |
Variable | Description | Example |
DSQ_OFFICER_ID_MONGO_ORACLE_ENABLED | Disqualified officer comparator toggle | "true" / "false" |
DSQ_OFFICER_ID_MONGO_ORACLE_DELAY | Disqualified officer comparator delay | "4m30s" |
Variable | Description | Example |
COMPANY_NUMBER_MONGO_PRIMARY_ENABLED | Primary index company number comparator toggle | "true" / "false" |
COMPANY_NUMBER_MONGO_PRIMARY_DELAY | Primary index company number comparator delay | "2m30s" |
COMPANY_NUMBER_MONGO_ALPHA_ENABLED | Alpha index company number comparator toggle | "true" / "false" |
COMPANY_NUMBER_MONGO_ALPHA_DELAY | Alpha index company number comparator delay | "3m30s" |
COMPANY_NAME_MONGO_PRIMARY_ENABLED | Primary index company name comparator toggle | "true" / "false" |
COMPANY_NAME_MONGO_PRIMARY_DELAY | Primary index company name comparator delay | "5m30s" |
COMPANY_NAME_MONGO_ALPHA_ENABLED | Alpha index company name comparator toggle | "true" / "false" |
COMPANY_NAME_MONGO_ALPHA_DELAY | Alpha index company name comparator delay | "6m30s" |
COMPANY_STATUS_MONGO_PRIMARY_ENABLED | Primary index company status comparator toggle | "true" / "false" |
COMPANY_STATUS_MONGO_PRIMARY_DELAY | Primary index company status comparator delay | "7m30s" |
COMPANY_STATUS_MONGO_ALPHA_ENABLED | Alpha index company status comparator toggle | "true" / "false" |
COMPANY_STATUS_MONGO_ALPHA_DELAY | Alpha index company status comparator delay | "8m30s" |
Variable | Description | Example |
INSOLVENCY_COMPANY_NUMBER_MONGO_ORACLE_ENABLED | Insolvency company number comparator toggle | "true" / "false" |
INSOLVENCY_COMPANY_NUMBER_MONGO_ORACLE_DELAY | Insolvency company number comparator delay | "10m30s" |
INSOLVENCY_CASE_COUNT_MONGO_ORACLE_ENABLED | Insolvency case count comparator toggle | "true" / "false" |
INSOLVENCY_CASE_COUNT_MONGO_ORACLE_DELAY | Insolvency case count comparator delay | "11m30s" |
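The delay examples in the tables above, such as "30s" and "1m30s", combine hour, minute and second components. As a reading aid, this sketch converts such a string into a java.time.Duration. It assumes only the h, m and s units appear, and the class name is hypothetical; in the application the values are presumably interpreted by Apache Camel rather than by code like this.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a class from the application.
final class DelaySketch {
    // Optional hours, minutes and seconds, in that order, e.g. "1h", "1m30s", "30s".
    private static final Pattern DELAY =
            Pattern.compile("(?:(\\d+)h)?(?:(\\d+)m)?(?:(\\d+)s)?");

    static Duration parse(String value) {
        Matcher matcher = DELAY.matcher(value);
        if (value.isEmpty() || !matcher.matches()) {
            throw new IllegalArgumentException("Unsupported delay: " + value);
        }
        return Duration.ofHours(part(matcher, 1))
                .plusMinutes(part(matcher, 2))
                .plusSeconds(part(matcher, 3));
    }

    // Returns the numeric value of a capture group, or 0 when the unit is absent.
    private static long part(Matcher matcher, int group) {
        String text = matcher.group(group);
        return text == null ? 0L : Long.parseLong(text);
    }

    public static void main(String[] args) {
        System.out.println(parse("1m30s")); // PT1M30S
    }
}
```

With the example values, the comparators start between 30 seconds and 11 minutes 30 seconds after startup, so their timers do not all fire at once.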
Variable | Description | Example |
EMAIL_RECIPIENT_LIST | The email accounts that will be notified when results from a comparison are available | [email protected] |
EMAIL_APPLICATION_ID | Template configuration for the email sender | application_id |
EMAIL_MESSAGE_ID | Template configuration for the email sender | message_id |
EMAIL_MESSAGE_TYPE | Template configuration for the email sender | message_type |
EMAIL_SENDER | The value of the email's To field | [email protected] |
Variable | Description | Example |
RESULTS_INITIAL_CAPACITY | Used to optimise collections for the number of expected results | 1000000 |
mvn compile jib:dockerBuild -Dimage=169942020521.dkr.ecr.eu-west-1.amazonaws.com/local/data-reconciliation
Clone Docker CHS Development and follow the steps in the README.
Enable the module, then run
tilt up
and wait for all services to start.
Development mode is available for this service in Docker CHS Development.
./bin/chs-dev development enable data-reconciliation
This will clone the data reconciliation app into the repositories folder. Any changes to the code or resources will automatically trigger a rebuild and relaunch.
The code in this repository is used to define and deploy a Dockerised container in AWS ECS. This is done by calling a module from terraform-modules. Application-specific attributes are injected and the service is then deployed using Terraform via the CI/CD platform Concourse.
Application-specific attributes | Value | Description |
ECS Cluster | data-reconciliation-service | ECS cluster (stack) the service belongs to |
Load balancer | not required | The load balancer that sits in front of the service |
Concourse pipeline | Pipeline link, Pipeline code | Concourse pipeline link in shared services |
- Please refer to the ECS Development and Infrastructure Documentation for detailed information on the infrastructure being deployed.
- Ensure the Terraform runner local plan executes without issues. For information on Terraform runners, please see the Terraform Runner Quickstart guide.
- If you encounter any issues or have questions, reach out to the team on the #platform Slack channel.
- Any secrets required for this service will be stored in Vault. For any updates to the Vault configuration, please consult with the #platform team and submit a workflow request.