Bulkrax is a batteries-included importer for Samvera applications. It currently supports OAI-PMH (DC and Qualified DC) and CSV out of the box. It is also designed to be extensible, so you can add new importers to your application or package them in other gems. Bulkrax provides a full admin interface for creating, editing, scheduling and reviewing imports.
Add this line to your application's Gemfile:
gem 'bulkrax'
# or if using from github
gem 'bulkrax', git: 'https://github.com/samvera/bulkrax.git', branch: 'main'
And then execute:
$ bundle install
$ rails generate bulkrax:install
$ rails db:migrate
If using Sidekiq, set up queues for import and export.
If posix-spawn fails to bundle on an ARM-based processor, try the following:
bundle config build.posix-spawn --with-cflags="-Wno-incompatible-function-pointer-types"
Then rebundle. See rtomayko/posix-spawn#92
To install manually instead of using the generator, add this line to your application's Gemfile:
gem 'bulkrax'
And then execute:
$ bundle install
$ rails db:migrate
Mount the engine in your routes file:
mount Bulkrax::Engine, at: '/'
If using Sidekiq, set up queues for import and export.
# in config/sidekiq.yml
:queues:
- default
- import # added
- export # added
# your other queues ...
# in app/assets/javascripts/application.js - before //= require_tree .
//= require bulkrax/application
# in app/assets/stylesheets/application.css - before *= require_self
*= require 'bulkrax/application'
You'll want to add an initializer to configure the importer to your needs:
# config/initializers/bulkrax.rb
Bulkrax.setup do |config|
  # some configuration
end
The configuration guide provides detailed instructions on the various available configurations.
Example:
Bulkrax.setup do |config|
  # If the work type isn't provided during import, use Image
  config.default_work_type = 'Image'
  # Setup a field mapping for the OaiDcParser
  # Your application metadata fields are the key
  # from: fields in the incoming source data
  config.field_mappings = {
    "Bulkrax::OaiDcParser" => {
      "contributor" => { from: ["contributor"] },
      "creator" => { from: ["creator"] },
      "date_created" => { from: ["date"] },
      "description" => { from: ["description"] },
      "identifier" => { from: ["identifier"] },
      "language" => { from: ["language"], parsed: true },
      "publisher" => { from: ["publisher"] },
      "related_url" => { from: ["relation"] },
      "rights_statement" => { from: ["rights"] },
      "source" => { from: ["source"], source_identifier: true },
      "subject" => { from: ["subject"], parsed: true },
      "title" => { from: ["title"] },
      "resource_type" => { from: ["type"], parsed: true },
      "remote_files" => { from: ["thumbnail_url"], parsed: true }
    }
  }
end
An Import needs to know what Work Type to create. The importer looks for:
- An incoming metadata field mapped to 'model'
- An incoming metadata field mapped to 'work_type'
If it does not find either of these, or the data they contain is not a valid Work Type in the repository, the default_work_type will be used.
The install generator sets default_work_type to the first Work Type returned by Hyrax.config.curation_concerns (stringified), but this can be overridden by setting default_work_type in config/initializers/bulkrax.rb as shown above.
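The lookup described above can be sketched in plain Ruby. This is a minimal illustration, not Bulkrax's implementation: resolve_work_type is a hypothetical helper, and checking 'model' before 'work_type' is an assumption based only on the order of the list above.

```ruby
# Hypothetical helper, not part of Bulkrax. It mirrors the lookup described
# above: try 'model', then 'work_type', and fall back to the default when
# neither holds a valid Work Type. The precedence order is an assumption.
def resolve_work_type(metadata, valid_work_types, default_work_type)
  candidate = %w[model work_type]
              .map { |key| Array(metadata[key]).first }            # accept a string or an array
              .find { |value| valid_work_types.include?(value) }   # ignore missing or invalid values
  candidate || default_work_type
end

valid = %w[Image GenericWork]
resolve_work_type({ 'model' => 'GenericWork' }, valid, 'Image')       # returns "GenericWork"
resolve_work_type({ 'work_type' => ['Image'] }, valid, 'GenericWork') # returns "Image"
resolve_work_type({ 'model' => 'Unknown' }, valid, 'Image')           # returns "Image" (the default)
```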
It's unlikely that the incoming import data has fields that exactly match those in your repository. Field mappings allow you to tell Bulkrax how to map a field in the incoming data to a field in your application.
By default, a mapping for the OAI parser has been added to map standard oai_dc fields to Hyrax basic_metadata. The other parsers have no default mapping, and will map any incoming fields to Hyrax properties with the same name. Configurations can be added in config/initializers/bulkrax.rb.
Configuring field mappings is documented in the Bulkrax Configuration Guide.
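To make the idea of a field mapping concrete, here is a minimal plain-Ruby sketch of how a mapping with from: keys could be applied to one incoming record. apply_field_mappings is a hypothetical function written for illustration only. It ignores options such as parsed: and source_identifier:, and dropping unmapped fields once a mapping exists is an assumption, not documented Bulkrax behavior.

```ruby
# Hypothetical illustration only; Bulkrax's own mapping code is more involved.
def apply_field_mappings(record, mappings)
  # No mapping configured: pass fields through under the same name.
  return record.dup if mappings.nil? || mappings.empty?

  mappings.each_with_object({}) do |(target, options), result|
    # Collect the values of every source field listed under from:
    values = options[:from].flat_map { |source| Array(record[source]) }
    result[target] = values unless values.empty?
  end
end

mappings = {
  'date_created' => { from: ['date'] },
  'related_url'  => { from: ['relation'] }
}
record = { 'date' => '1999', 'relation' => ['http://example.com/a'], 'ignored' => 'x' }

mapped = apply_field_mappings(record, mappings)
mapped['date_created'] # returns ["1999"]
mapped['related_url']  # returns ["http://example.com/a"]
mapped.key?('ignored') # returns false (assumption: unmapped fields are dropped)
```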
- The BagIt Parser will import files in the data folder of the bag.
- The CSV parser will import files in columns named file (located local to the import csv file in a folder called files) or remote_files (where urls are supplied).
- The OAI parser will import a thumbnail_url specified during import. Pattern matching is supported.
- The XML Parser is not configured to import files by default. To configure URL import, map an incoming element to the remote_files Hyrax property. To map local files for import, we suggest using the HasLocalProcessing class injected by the generator.
For example:
module Bulkrax::HasLocalProcessing
  def add_local
    parsed_metadata['file'] = image_paths
  end

  # Files are in a folder called files, relative to the import file
  # with a sub-folder that matches the system_identifier_field
  def image_paths
    import_path = importerexporter.parser_fields['import_file_path']
    import_path = File.dirname(import_path) if File.file?(import_path)
    real_path = File.join(import_path, 'files', "#{parsed_metadata[Bulkrax.system_identifier_field].first}")
    Dir.glob(real_path)
  end
end
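The path lookup can be tried outside of Bulkrax with a standalone sketch. local_file_paths is a hypothetical function, and the import.xml and rec-1 names are made up for the demonstration. Note one deliberate difference: the example above passes the folder path itself to Dir.glob, while this sketch appends a wildcard so that the files inside the folder are listed. Whether your import expects the folder or its contents is something to confirm against your own setup.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical standalone version of the lookup; names are illustrative.
def local_file_paths(import_file_path, identifier)
  # Accept either the import file itself or the folder that contains it
  base = File.file?(import_file_path) ? File.dirname(import_file_path) : import_file_path
  # List regular files inside files/<identifier>/, sorted for a stable order
  Dir.glob(File.join(base, 'files', identifier.to_s, '*')).select { |path| File.file?(path) }.sort
end

# Build a throwaway folder layout and run the lookup against it
Dir.mktmpdir do |dir|
  import_file = File.join(dir, 'import.xml')
  File.write(import_file, '<records/>')
  FileUtils.mkdir_p(File.join(dir, 'files', 'rec-1'))
  File.write(File.join(dir, 'files', 'rec-1', 'page1.tif'), '')
  File.write(File.join(dir, 'files', 'rec-1', 'page2.tif'), '')

  paths = local_file_paths(import_file, 'rec-1')
  puts paths.map { |path| File.basename(path) }.inspect # prints ["page1.tif", "page2.tif"]
end
```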
For further information on how to extend and customize Bulkrax, please see the Bulkrax Customization Guide.
Once Bulkrax is installed, you have access to an easy-to-use interface for creating, editing, deleting, running, and re-running imports and exports.
Imports can be scheduled to run once or on a daily, monthly or yearly interval.
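As a rough illustration of what these intervals mean for the next run date, the sketch below uses Ruby's standard Date arithmetic. next_run and the interval symbols are hypothetical names chosen for this example; they are not how Bulkrax stores or computes its schedule.

```ruby
require 'date'

# Hypothetical helper: interval names are illustrative, not Bulkrax's stored values.
def next_run(last_run, interval)
  case interval
  when :once    then nil            # nothing further to schedule
  when :daily   then last_run + 1   # Date#+ adds days
  when :monthly then last_run >> 1  # Date#>> adds months, clamping to the month end
  when :yearly  then last_run >> 12
  else raise ArgumentError, "unknown interval: #{interval.inspect}"
  end
end

next_run(Date.new(2024, 1, 31), :daily)   # returns Date 2024-02-01
next_run(Date.new(2024, 1, 31), :monthly) # returns Date 2024-02-29 (clamped, leap year)
next_run(Date.new(2024, 2, 29), :yearly)  # returns Date 2025-02-28
```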
Import and export are available to admins via the Importers and Exporters tabs on the dashboard. Export currently supports CSV only.
From the admin dashboard, select the "Importers" tab. You will see a list of previously created importers with details of last run, next run, number of records enqueued and processed, failures, deleted upstream records, and total records. From this page you can create a new importer, edit an importer or delete an importer.
From the admin dashboard, select the "Exporters" tab. You will see a list of previously created exporters with details of last run, number of records enqueued and processed, failures, deleted upstream records, and total records. From this page you can create a new exporter, edit an exporter or delete an exporter.
To create a new importer or exporter, select the "New" button on the Importers or Exporters page and complete the form. Name and, for importers, Administrative Set are required. When you select a parser, you will see a set of parser-specific fields to complete.
To edit an importer or exporter, select the edit icon (pencil) and complete the form.
To delete an importer or exporter, select the delete (x) icon.
Once the exporter has run, a download icon will appear on the exporters menu page.
- Ruby 2.7 or newer is required
- Hyrax 2.3 or newer is required
If you're working on a PR for this project, create a feature branch off of main.
This repository follows the Samvera Community Code of Conduct and language recommendations. Please do not create a branch called master for this repository or as part of your pull request; the branch will either need to be removed or renamed before it can be considered for inclusion in the code base and history of this repository.
See CONTRIBUTING.md for contributing guidelines.
We encourage everyone to help improve this project. Bug reports and pull requests are welcome on GitHub at https://github.com/samvera/bulkrax.
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
Questions can be sent to [email protected]. Please make sure to include "Bulkrax" in the subject line of your email.
The gem is available as open source under the terms of the Apache 2.0 License.