In order to ensure observability into our Harvesting 2.0 pipeline, data.gov wants a Controller module that orchestrates the harvesting sub-processes.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN a Python controller
WHEN a harvesting API event is triggered by Data providers or Datagov admins
THEN a new harvesting job is initiated by the controller
Background
Data.gov wants observability and resilience built into the Harvesting 2.0 pipeline.
Controller should ensure that a job is tracked throughout the harvesting lifecycle of:
Extraction
Validation
Transformation
This process should be transparent, observable and idempotent.
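The lifecycle tracking described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `Stage`, `Job`, `Controller`, and the `handlers` mapping are all hypothetical, and job state is held in memory only. Each stage result is appended to an event trail (observability), and stages that have already completed are skipped when the same job is run again (idempotency).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """Lifecycle stages named in this issue, in execution order."""
    EXTRACTION = "extraction"
    VALIDATION = "validation"
    TRANSFORMATION = "transformation"


@dataclass
class Job:
    """Hypothetical job record; a real one would likely be persisted."""
    job_id: str
    completed: List[Stage] = field(default_factory=list)
    events: List[str] = field(default_factory=list)  # audit trail
    error: str = ""


class Controller:
    """Hypothetical controller that tracks a job through every stage."""

    def __init__(self, handlers: Dict[Stage, Callable[[Job], None]]):
        self.handlers = handlers
        self.jobs: Dict[str, Job] = {}

    def run(self, job_id: str) -> Job:
        # Reusing the stored job means a repeated call does not redo work.
        job = self.jobs.setdefault(job_id, Job(job_id))
        job.error = ""
        for stage in Stage:
            if stage in job.completed:
                continue  # already done on an earlier run
            try:
                self.handlers[stage](job)
            except Exception as exc:
                # Record which stage failed and why, then stop.
                job.error = f"{stage.value}: {exc}"
                job.events.append(f"{stage.value}:failed")
                break
            job.completed.append(stage)
            job.events.append(f"{stage.value}:ok")
        return job
```

Calling `run` a second time with the same `job_id` resumes from the first stage that has not completed, so a retry after a failure does not repeat earlier stages.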
Security Considerations (required)
All data should be encrypted in transit and at rest.
Sketch
Create a Python controller module
Create pytest tests that ensure mock output from the modules is traceable and that errors thrown at any step in the process are reported accurately
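The tests described in the sketch might look something like the following. This is only an illustration: `run_pipeline` is a hypothetical stand-in for the controller, and the step names are assumptions. The test functions use plain `assert` statements and `unittest.mock.Mock` from the standard library, so pytest would collect them without extra setup.

```python
from unittest.mock import Mock


def run_pipeline(job_id, steps):
    """Hypothetical stand-in for the controller.

    Runs (name, callable) steps in order and returns a report that
    records each step's output, or the step at which an error occurred.
    """
    report = {"job_id": job_id, "trace": [], "error": None}
    for name, step in steps:
        try:
            output = step(job_id)
        except Exception as exc:
            report["error"] = {
                "step": name,
                "type": type(exc).__name__,
                "message": str(exc),
            }
            break
        report["trace"].append((name, output))
    return report


def test_mock_output_is_traceable():
    extract = Mock(return_value="extracted")
    validate = Mock(return_value="validated")
    report = run_pipeline("job-1", [("extract", extract), ("validate", validate)])
    # Every step's output appears in the trace, in order.
    assert report["trace"] == [("extract", "extracted"), ("validate", "validated")]
    assert report["error"] is None
    extract.assert_called_once_with("job-1")


def test_error_is_reported_at_failing_step():
    extract = Mock(return_value="extracted")
    validate = Mock(side_effect=ValueError("bad schema"))
    transform = Mock()
    steps = [("extract", extract), ("validate", validate), ("transform", transform)]
    report = run_pipeline("job-1", steps)
    # The report names the failing step and the original error.
    assert report["error"] == {
        "step": "validate",
        "type": "ValueError",
        "message": "bad schema",
    }
    # Steps after the failure are never invoked.
    transform.assert_not_called()
```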
Upon review of this issue, we have decided to fold the core code of this module into the existing harvesting module as another feature. We are creating this capability specifically to interface with the harvesting extract, transform, validate, compare and load submodules. The logical algorithm (the code to run) is kept separate from the implementation of the application/service (running the code): we will create a separate repo for the deployment of the application or service, which will call this core logic from the harvesting module.
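One way to picture that separation, as a rough sketch only: the core function knows nothing about how it is deployed, and a thin wrapper in the service repo calls it. The names `run_harvest` and `handle_request`, and the shape of the payload, are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Step = Callable[[Any], Any]


def run_harvest(source: Any, steps: Iterable[Step]) -> Any:
    """Core logic (harvesting module): pass data through each step in order.

    Contains no deployment or service concerns.
    """
    data = source
    for step in steps:
        data = step(data)
    return data


def handle_request(payload: Dict[str, Any], steps: List[Step]) -> Dict[str, Any]:
    """Service-side wrapper (hypothetically in the separate deployment repo)."""
    result = run_harvest(payload["source"], steps)
    return {"status": "ok", "result": result}
```

Here `steps` would be the extract, transform, validate, compare and load callables supplied by the harvesting submodules.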
As a result, the current layout of the harvesting module should look something like:
The features of the controller will be continuously updated in our Wiki doc; they deal with the infrastructure that supports the management of the 'job' and 'record' queues. The next ticket in this sequence is: