CSV Importer

Bulkrax can import from a CSV file that meets the following guidelines.

The CSV must have a column headed source_identifier. Every row must contain a unique identifier for the item in this column; the identifier is stored against the imported Work in the Hyrax field named by the Bulkrax system_identifier_field setting.

The CSV must also contain a title column, and every work must have a title.

Any column whose header matches an existing metadata property in Hyrax (e.g. title, creator) will be imported.

In addition, the following columns will be imported:

file
collection
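
The requirements above can be checked before an import is attempted. The following is a minimal sketch using Ruby's standard CSV library; csv_import_problems is a hypothetical helper written for illustration and is not part of Bulkrax.

```ruby
require 'csv'

# Hypothetical pre-flight check (not a Bulkrax API).
# Returns a list of problems; an empty list means the CSV meets the guidelines.
def csv_import_problems(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  problems = []

  # Both required columns must be present in the header row.
  %w[source_identifier title].each do |column|
    problems << "missing column: #{column}" unless rows.headers.include?(column)
  end
  return problems unless problems.empty?

  # Every row needs a unique, non-blank identifier and a non-blank title.
  ids = rows.map { |row| row['source_identifier'].to_s.strip }
  problems << 'blank source_identifier' if ids.any?(&:empty?)
  problems << 'duplicate source_identifier' if ids.uniq.length != ids.length
  problems << 'blank title' if rows.any? { |row| row['title'].to_s.strip.empty? }
  problems
end
```

Calling csv_import_problems(File.read(path)) on a candidate file gives a quick indication of whether it is ready to import.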

Collections

A column headed collection will be used to define which collection imported works should be added to.

Multiple collections can be supplied, separated by semi-colons.

If the value provided matches a value found in the system_identifier_field of an existing collection, then works will be added to that collection. If not, a new collection will be created and both title and system_identifier_field will be set to the value supplied in the collection column.

For example

source_identifier	title	collection
imported_work_1	Work One	Collection One
imported_work_2	Work Two	Collection One; Collection Two

In the first row (after the header), the Work being imported will be added to Collection One, and in the second, to both Collection One and Collection Two.

If either of those collections already exists, then the existing collection is used. If not, a new one is created.
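
The matching behaviour described in this section can be sketched as follows. This is an illustration only: resolve_collections is a hypothetical helper, and a plain Hash keyed by identifier stands in for the collections already present in the repository.

```ruby
# Hypothetical illustration (not a Bulkrax API).
# existing: Hash mapping a system identifier to a collection's attributes.
def resolve_collections(cell, existing)
  # Split the cell on semi-colons and ignore surrounding whitespace.
  values = cell.to_s.split(';').map(&:strip).reject(&:empty?)

  values.map do |value|
    # Reuse the collection if the identifier is known; otherwise create one
    # whose title and identifier are both set to the supplied value.
    existing[value] ||= { title: value, system_identifier: value }
  end
end
```

For the second data row in the example, resolve_collections('Collection One; Collection Two', existing) would return two collections, creating only those that are not already in existing.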

Files

Files will be imported from a column called file, if it is present.

This column will contain filenames.

Files must be placed in a directory called files, relative to the location of the CSV file.

Multiple files can be imported, if separated by a semi-colon (filenames themselves MUST NOT contain semi-colons).

For example

source_identifier	title	creator	publisher	file
first_work	First work title	Smith, John	Faber and Faber	document.pdf
second_work	Second work title	Jones, David	Macmillan	firstdocument.docx; seconddocument.pdf
third_work	Third work title	Other, A.N.	Penguin

If the CSV to be imported is located at:

/tmp/imports/1/csv-to-be-imported.csv

then the files would be at:

/tmp/imports/1/files/document.pdf
/tmp/imports/1/files/firstdocument.docx
/tmp/imports/1/files/seconddocument.pdf

The third_work does not have any associated files.
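
The path convention above can be expressed in a few lines of Ruby. This is a sketch for illustration; file_paths_for is a hypothetical helper, not a Bulkrax method.

```ruby
# Hypothetical illustration (not a Bulkrax API).
# Expands the value of the file column into full paths inside the
# "files" directory that sits next to the CSV.
def file_paths_for(csv_path, file_cell)
  files_dir = File.join(File.dirname(csv_path), 'files')

  # A blank or missing cell yields no files, as with third_work above.
  names = file_cell.to_s.split(';').map(&:strip).reject(&:empty?)
  names.map { |name| File.join(files_dir, name) }
end
```

With the CSV path from the example, the cell for second_work expands to the two paths under /tmp/imports/1/files/ listed above.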

System Identifier Field

In Bulkrax, the system_identifier_field is set within the configuration to an existing Hyrax field. This field is used to store the import identifier for the Work or Collection. When the import runs, it checks whether a Work or Collection already exists with that identifier and, if so, updates the existing item. If not, a new item is created.

The default system_identifier_field is set to 'source'. This can be changed in the local application as follows:

# config/initializers/bulkrax.rb

Bulkrax.setup do |config|
  config.system_identifier_field = 'identifier'
end
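
The update-or-create behaviour described in this section can be sketched as follows. This is an illustration under stated assumptions, not Bulkrax's implementation: import_item is a hypothetical helper, and an in-memory Array of Hashes stands in for the repository.

```ruby
# Hypothetical illustration (not Bulkrax's implementation).
# repository:       Array of Hashes standing in for stored Works or Collections.
# identifier_field: the configured system_identifier_field, e.g. 'source'.
def import_item(repository, identifier_field, identifier, attributes)
  item = repository.find { |candidate| candidate[identifier_field] == identifier }

  if item
    # An item with this identifier already exists, so update it in place.
    item.merge!(attributes)
  else
    # No match: create a new item and record the identifier on it.
    item = attributes.merge(identifier_field => identifier)
    repository << item
  end
  item
end
```

Importing the same source_identifier twice therefore leaves a single item carrying the most recent metadata, rather than creating a duplicate.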
