
CSV Importer

Julie Allinson edited this page Jul 29, 2019 · 32 revisions

Bulkrax can import from a CSV file that meets the following guidelines.

The CSV must have a column headed source_identifier, which must contain a unique identifier for each item. This identifier is stored against the imported Work in the Bulkrax system_identifier_field.

The CSV must also contain a title column, and must have titles for all works.

All columns will be imported if the header matches an existing metadata property in Hyrax, e.g. title, creator, etc.

In addition, the following columns will be imported:

  • file
  • collection
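The guidelines above can be checked before running an import. The following is a minimal sketch using Ruby's standard CSV library; it is not part of Bulkrax, and the method name csv_problems and the constant REQUIRED_HEADERS are hypothetical names chosen for illustration.

```ruby
require "csv"

# Columns the guidelines above describe as mandatory.
REQUIRED_HEADERS = %w[source_identifier title].freeze

# Hypothetical helper (not a Bulkrax API): returns a list of problems
# found in the CSV text; the list is empty when the CSV passes.
def csv_problems(csv_text)
  table = CSV.parse(csv_text, headers: true)

  # Stop early if a required column is missing altogether.
  missing = REQUIRED_HEADERS - table.headers.compact
  return missing.map { |h| "missing column: #{h}" } unless missing.empty?

  problems = []
  ids = table.map { |row| row["source_identifier"].to_s.strip }
  problems << "blank source_identifier" if ids.any?(&:empty?)

  # Every source_identifier must be unique within the file.
  duplicates = ids.reject(&:empty?).tally.select { |_, n| n > 1 }.keys
  duplicates.each { |id| problems << "duplicate source_identifier: #{id}" }

  # Every work must have a title.
  table.each_with_index do |row, i|
    problems << "data row #{i + 1} has no title" if row["title"].to_s.strip.empty?
  end
  problems
end
```

Calling csv_problems(File.read(path)) on a candidate file returns an empty array when the required columns are present, identifiers are unique, and every row has a title.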

Collections

A column headed collection will be used to define which collection imported works should be added to.

Multiple collections can be supplied, if separated with a semi-colon.

If the value provided matches a value found in the system_identifier_field of an existing collection, then works will be added to that collection. If not, a new collection will be created and both title and system_identifier_field will be set to the value supplied in the collection column.

For example

source_identifier | title    | collection
imported_work_1   | Work One | Collection One
imported_work_2   | Work Two | Collection One; Collection Two

In the first row (after the header), the Work being imported will be added to Collection One, and in the second, to both Collection One and Collection Two.

If either of those collections already exists, then the existing collection is used. If not, a new one is created.
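The collection behaviour described above can be sketched in plain Ruby. This is an illustration only, not Bulkrax's implementation: the existing hash is a hypothetical in-memory stand-in for looking up collections by their system_identifier_field value, and both method names are assumptions.

```ruby
# Split a collection cell on semi-colons and trim surrounding whitespace.
def collection_identifiers(cell)
  cell.to_s.split(";").map(&:strip).reject(&:empty?)
end

# Hypothetical stand-in for the lookup: `existing` maps an identifier
# to a collection's attributes. A match is reused; otherwise a new entry
# is created with both title and identifier set to the supplied value.
def find_or_create_collections(cell, existing)
  collection_identifiers(cell).map do |identifier|
    existing[identifier] ||= { title: identifier, identifier: identifier }
  end
end
```

With the second row of the example, find_or_create_collections("Collection One; Collection Two", existing) reuses Collection One if it is already in existing and adds Collection Two if it is not.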

Files

  • Files will be imported from a column called file, if it is present.
  • This column will contain filenames.
  • Files must be placed in a directory called files, relative to the location of the CSV file.
  • Multiple files can be imported, if separated by a semi-colon (filenames themselves MUST NOT contain semi-colons).

For example

source_identifier | title             | creator      | publisher       | file
first_work        | First work title  | Smith, John  | Faber and Faber | document.pdf
second_work       | Second work title | Jones, David | Macmillan       | firstdocument.docx; seconddocument.pdf
third_work        | Third work title  | Other, A.N.  | Penguin         |

If the CSV to be imported is located at:

/tmp/imports/1/csv-to-be-imported.csv

then the files would be at:

/tmp/imports/1/files/document.pdf
/tmp/imports/1/files/firstdocument.docx
/tmp/imports/1/files/seconddocument.pdf

The third_work does not have any associated files.
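The path resolution described above can be sketched as follows. This is a minimal illustration rather than Bulkrax's own code; the method name import_file_paths is hypothetical.

```ruby
# Hypothetical helper (not a Bulkrax API): given the path to the CSV and
# the contents of a row's file cell, return the expected location of each
# file inside the "files" directory that sits next to the CSV.
def import_file_paths(csv_path, file_cell)
  files_dir = File.join(File.dirname(csv_path), "files")

  # Filenames are separated by semi-colons; an empty cell yields no files.
  file_cell.to_s.split(";").map(&:strip).reject(&:empty?).map do |name|
    File.join(files_dir, name)
  end
end
```

For second_work in the example, this yields /tmp/imports/1/files/firstdocument.docx and /tmp/imports/1/files/seconddocument.pdf; for third_work it yields an empty list.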

System Identifier Field

In Bulkrax, the system_identifier_field is set within the configuration to an existing Hyrax field. This field will be used to store the import identifier of the Work or Collection. When the import runs, it checks whether a Work or Collection already exists with that identifier, and if so, it updates that existing item. If not, a new item is created.
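The update-or-create check can be illustrated with a small sketch. It assumes a hypothetical in-memory store (a hash keyed by identifier) in place of the real lookup against the system_identifier_field, and the method name import_record is an assumption, not a Bulkrax API.

```ruby
# Hypothetical sketch: `store` maps an import identifier to an item's
# attributes. An existing item is updated in place; otherwise a new
# item is created. The return value reports which path was taken.
def import_record(store, identifier, attributes)
  if store.key?(identifier)
    store[identifier].merge!(attributes)
    :updated
  else
    store[identifier] = attributes.dup
    :created
  end
end
```

Importing the same identifier twice therefore creates the item on the first run and updates it on the second, rather than producing a duplicate.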

The default system_identifier_field is set to 'source'. This can be changed in the local application as follows:

# config/initializers/bulkrax.rb

Bulkrax.setup do |config|
  config.system_identifier_field = 'identifier'
end