User Story
In order to manage large numbers of harvest jobs, data.gov wants to define a series of queue systems using Redis.
Acceptance Criteria
First we will define the queues themselves:
| queue | purpose |
| --- | --- |
| job | jobs waiting to be picked up by the harvester pipeline |
| extract | a harvest source waiting to have its catalog parsed |
| compare | an incoming record with a unique UUID waiting to be compared with the current record of the same identifier |
| validate | a record in need of validation against the expected schema |
| transform | a record in need of transformation from one schema to another |
| load | a record ready to be uploaded into the current catalog UI (currently created, updated, or deleted in CKAN) |
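As a rough illustration of this half of the story, the sketch below models the six queues as Redis lists using redis-py. The `harvest:` key prefix, the JSON payloads, and the helper names are assumptions for illustration, not a decided API.

```python
# queues.py -- a minimal sketch of the six queues as Redis lists (assumed design).
import json
import redis

# The six queues from the table above; the "harvest:" prefix is an assumption.
QUEUES = ("job", "extract", "compare", "validate", "transform", "load")

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def enqueue(queue: str, message: dict) -> None:
    """Add a message to the named queue (LPUSH adds at the head of the list)."""
    if queue not in QUEUES:
        raise ValueError(f"unknown queue: {queue}")
    r.lpush(f"harvest:{queue}", json.dumps(message))


def dequeue(queue: str, timeout: int = 0) -> dict | None:
    """Remove the oldest message. BRPOP blocks, then pops from the tail,
    so LPUSH + BRPOP together behave as a FIFO queue."""
    item = r.brpop(f"harvest:{queue}", timeout=timeout)
    if item is None:
        return None
    _key, raw = item
    return json.loads(raw)
```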
Then we will define their lifecycles:
| queue | lifecycle state | definition |
| --- | --- | --- |
| job | create | a job is waiting to be picked up by the harvester |
| job | extract | a harvest source is being extracted to a catalog of records |
| job | compare | a catalog of records is waiting to be compared with its companion in CKAN |
| job | processing | record-level processing of add, update, delete |
| job | completion | the harvest job has finished successfully or in error |
| extract | create | a harvest source in queue |
| extract | processing | a harvest source is being extracted |
| extract | completion | all records extracted from the harvest source and saved to S3 under the appropriate prefix |
| compare | create | a catalog of records in queue |
| compare | processing | a catalog of records is being compared with the harvest source found in the UI |
| compare | completion | individual records have been sent to the next step, determined by whether they need to be added, updated, or deleted |
| validate | create | an individual record in queue |
| validate | processing | validation against the given schema |
| validate | completion | pass/fail parsed against the schema |
| transform | create | an individual record in queue |
| transform | processing | a record in one schema is transformed to another schema |
| transform | completion | the record is sent to the validate queue for final validation of the successful transformation |
| load | create | an individual record in queue |
| load | processing | a RESTful operation against the CKAN catalog based on whether the record should be created, updated, or deleted |
| load | completion | success or failure of that operation |
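One way to track those per-queue lifecycles is sketched below: each queue keeps a Redis hash mapping an item id to its current state, and transitions only move forward through the ordered states in the table above. The key naming, the forward-only rule, and the helper names are assumptions, not part of the acceptance criteria.

```python
# lifecycle.py -- a sketch of lifecycle tracking (assumed design).
import redis

# Ordered lifecycle states per queue, taken from the table above.
LIFECYCLES = {
    "job": ("create", "extract", "compare", "processing", "completion"),
    "extract": ("create", "processing", "completion"),
    "compare": ("create", "processing", "completion"),
    "validate": ("create", "processing", "completion"),
    "transform": ("create", "processing", "completion"),
    "load": ("create", "processing", "completion"),
}

r = redis.Redis(decode_responses=True)


def set_state(queue: str, item_id: str, state: str) -> None:
    """Record an item's lifecycle state in a per-queue hash, moving forward only."""
    states = LIFECYCLES[queue]
    if state not in states:
        raise ValueError(f"{state!r} is not a lifecycle state of the {queue} queue")
    key = f"harvest:{queue}:lifecycle"
    current = r.hget(key, item_id)
    if current is not None and states.index(state) < states.index(current):
        raise ValueError(f"cannot move {item_id} backwards from {current} to {state}")
    r.hset(key, item_id, state)


def get_state(queue: str, item_id: str) -> str | None:
    """Return the last recorded state for an item, or None if never seen."""
    return r.hget(f"harvest:{queue}:lifecycle", item_id)
```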
Background
Multiple harvest jobs running concurrently will consume excessive system resources. Regardless of pipeline speed, we would like to define a definitive FIFO (first in, first out) system to guarantee linear processing of harvest sources.
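To make the FIFO guarantee concrete, the sketch below runs a single blocking consumer against the job queue: because one worker pops jobs in arrival order and finishes each before taking the next, harvest sources are processed linearly. This is an assumed shape, and the `process_job` stub is a placeholder for the real pipeline hand-off.

```python
# worker.py -- a sketch of linear (FIFO) processing under the assumptions above.
import json
import redis

r = redis.Redis(decode_responses=True)


def process_job(job: dict) -> None:
    # Placeholder: hand the job off to the harvester pipeline (extract step).
    print(f"processing harvest job {job.get('id')}")


def run_worker(queue: str = "job") -> None:
    """Consume the queue one job at a time, in arrival order."""
    while True:
        # BRPOP with no timeout blocks until a job is available.
        _key, raw = r.brpop(f"harvest:{queue}")
        process_job(json.loads(raw))
```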
Security Considerations (required)
None
Sketch