FP-Akka is a biodiversity data cleaning tool produced by the FilteredPush and Kurator projects with support from the US National Science Foundation. It can run one of a small set of predefined data quality control workflows on flat DarwinCore data held in MongoDB, on DarwinCore archive files with a dwc:Occurrence core, or on CSV files containing DarwinCore data.
See: http://wiki.datakurator.org/wiki/FP-Akka_User_Documentation
To build FP-Akka from source, use Maven:
mvn install
The business logic internals of the workflow actors can be found in the FP-KurationServices package. You will need to build FP-KurationServices in order to build FP-Akka.
https://github.com/FilteredPush/FP-KurationServices
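For example, a build from source might look like the following (a sketch, assuming Maven and Git are on the path; the FP-Akka checkout directory name is illustrative):

git clone https://github.com/FilteredPush/FP-KurationServices.git
cd FP-KurationServices
mvn install
cd ../FP-Akka
mvn install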
To run FP-Akka, use java.
java -jar FP-Akka-{version}-workflowstarter.jar -w dwca -i dwca-file.zip -o output.json
JSON output can be converted to human readable spreadsheets with the FP-postprocessor.jar tool.
Available workflows are DwCa, CSV, and Mongo (specify the workflow to run with -w).
The DwCa (-w DwCa) workflow checks an occurrence file and reports (in JSON) on quality aspects of the scientific name, event date, georeference, and basis of record elements of the data.
Options for the DwCa workflow (-w DwCa) are:
-a VAL : Authority to check scientific names against (IPNI, IF, WoRMS, COL, GBIF, GlobalNames), default GBIF.
-i VAL : Input occurrence.txt (tab delimited occurrence core from a DwC archive) file.
-l N : Limit on the number of records to read before stopping.
-o VAL : Output JSON file.
-t : Run scientific name validator in taxonomic mode (look up name in current use).
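For example, a DwCa run over the tab-delimited occurrence core of an unpacked archive, checking names against GBIF in taxonomic mode and stopping after 1000 records, might look like this (the input, output, and limit values are illustrative; only the options listed above are used):

java -jar FP-Akka-{version}-workflowstarter.jar -w DwCa -a GBIF -i occurrence.txt -l 1000 -t -o dwca_report.json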
Options for the CSV workflow (-w CSV) are:
-a VAL : Authority to check scientific names against (IPNI, IF, WoRMS, COL, GBIF, GlobalNames), default GBIF.
-i VAL : Input CSV file.
-l N : Limit on the number of records to read before stopping.
-o VAL : Output file (.json unless -s is specified, in which case .csv).
-s : Only check scientific names with SciNameValidator (outputs will be .csv, not .json).
-t : Run SciNameValidator in taxonomic mode (look up name in current use).
The CSV workflow performs one of two different operations.
Like the DwCa workflow, it can operate on flat DarwinCore and check
quality aspects of the scientific name, event date, and georeference,
reporting the results in JSON.
Alternatively (with -s), the CSV workflow can take a flat CSV file of
dwc:scientificName and dwc:scientificNameAuthorship values and report,
in CSV output, on matches of those names in the specified authority.
This operation is intended for quality control of taxonomic authority
files against external authorities.
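As a sketch of the two modes (input and output file names are illustrative; only the options listed above are used), the first command checks name, date, and georeference aspects of flat DarwinCore records, and the second checks only scientific names against WoRMS:

java -jar FP-Akka-{version}-workflowstarter.jar -w CSV -a GBIF -i occurrences.csv -o csv_report.json
java -jar FP-Akka-{version}-workflowstarter.jar -w CSV -s -t -a WoRMS -i taxon_names.csv -o name_matches.csv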
Options for the MongoDB workflow (-w Mongo) are:
-a VAL : Authority to check scientific names against (IPNI, IF, WoRMS, COL, GBIF, GlobalNames), default GBIF.
-ci VAL : Input Collection in mongo to query for records to process.
-co VAL : Output Collection in mongo into which to write results.
-d VAL : MongoDB database to use.
-h VAL : MongoDB host.
-l N : Limit on the number of records to read before stopping.
-q VAL : Query on Mongo collection to select records to process, e.g. {institutionCode:\"NMSU\"}
-t : Run scientific name validator in taxonomic mode (look up name in current use).
The Mongo workflow performs the same checks as the DwCa workflow, except that it queries a collection in a MongoDB datastore for the input DarwinCore (in JSON) and writes the output into a MongoDB collection. It is intended for use as an analytical capability in a FilteredPush node.
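For example (host, database, and collection names are illustrative; the query mirrors the -q example above, and only the options listed above are used):

java -jar FP-Akka-{version}-workflowstarter.jar -w Mongo -h localhost -d fpnode -ci occurrences -co qc_results -q {institutionCode:\"NMSU\"} -a GBIF -l 5000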
There is also an experimental jettyStarter workflow that can be launched (e.g. from Eclipse) to run as a service able to load data from the iDigBio API. See the org.filteredpush.akka.workflows.jettyStarter class.