CSV Importer

Benjamin Kiah Stroud edited this page Aug 10, 2020 · 32 revisions

Bulkrax can import from a CSV file that follows the guidelines below.

Required fields

  • The CSV MUST have a header row.
  • The header row MUST include a column that uniquely identifies each record. This column MUST be named either source_identifier or the field name configured in config/initializers/bulkrax.rb as source_identifier_field_mapping.
  • The source_identifier column MUST contain a unique identifier for each item.
  • The CSV MUST have a title column.
  • Every work MUST have both a source_identifier and a title.

Note: The source identifier is added to the imported Work in the system_identifier_field (see below for an explanation of the system identifier field). The default is source.
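
The requirements above can be checked before an import is started. The following is a minimal sketch in Python using only the standard library. It is not part of Bulkrax, the function name check_required_fields is hypothetical, and it assumes the default source_identifier column name unless another is passed in.

```python
import csv
import io


def check_required_fields(csv_text, id_column="source_identifier"):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for illustration only, not a Bulkrax API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []

    # Both the identifier column and the title column must be in the header row.
    problems = [
        f"missing column: {name}"
        for name in (id_column, "title")
        if name not in headers
    ]
    if problems:
        return problems

    seen = set()
    for row_no, row in enumerate(reader, start=1):  # data rows, header excluded
        identifier = (row[id_column] or "").strip()
        title = (row["title"] or "").strip()
        if not identifier:
            problems.append(f"row {row_no}: empty {id_column}")
        elif identifier in seen:
            problems.append(f"row {row_no}: duplicate {id_column} {identifier!r}")
        seen.add(identifier)
        if not title:
            problems.append(f"row {row_no}: empty title")
    return problems
```

Calling check_required_fields on the contents of a CSV returns an empty list when every row has a unique identifier and a title.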

Supported fields

All columns will be imported if the column name matches an existing metadata property in Hyrax, e.g. title, creator, etc.

In addition, the following columns will be imported:

  • collection
  • file
  • remote_files
  • model

Collections

A column headed collection will be used to define which collection imported works should be added to.

Multiple collections can be supplied, if separated with a semi-colon (;) or pipe (|).

If the value provided matches a value found in the system_identifier_field of an existing collection, then works will be added to that collection. If not, a new collection will be created and both title and system_identifier_field will be set to the value supplied in the collection column.

For example

source_identifier  title     collection
imported_work_1    Work One  Collection One
imported_work_2    Work Two  Collection One; Collection Two

In the first row (after the header), the Work being imported will be added to Collection One, and in the second, to both Collection One and Collection Two.

If either collection already exists, the existing collection is used. If not, a new one is created.
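
To illustrate how a multi-valued cell breaks down into individual collections, here is a minimal Python sketch. It is not Bulkrax's own parsing code: the function name split_multi_value is hypothetical, and trimming the whitespace around each value is an assumption made for this example.

```python
import re


def split_multi_value(cell):
    """Split a cell on ';' or '|' and trim whitespace around each value.

    Hypothetical helper for illustration only; empty values are dropped.
    """
    if not cell:
        return []
    return [part.strip() for part in re.split(r"[;|]", cell) if part.strip()]
```

Applied to the second row of the example, the collection cell yields two values, Collection One and Collection Two.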

Model

The model column is used to determine the work type. It is not required. In its absence, either the field mapping or default_work_type will be used. Read more about these in the Configuration guide.

Files

Files are imported from a column called file or remote_files, if either is present.

The remote_files column will contain URLs to files which will be downloaded and imported. Multiple files can be imported, if separated by a semi-colon (;) or pipe (|) (URLs themselves MUST NOT contain semi-colons or pipes).

The file column will contain filenames (these must be unique). Multiple files can be imported, if separated by a semi-colon (;) or pipe (|) (filenames themselves MUST NOT contain semi-colons or pipes).

Files Location

If imported from a pre-existing server location, files MUST be placed in a directory called files relative to the location of the CSV file.

If uploading using Browse Everything, the location of the files will be handled by the system.

For example:

source_identifier  title              creator       publisher        file
first_work         First work title   Smith, John   Faber and Faber  document.pdf
second_work        Second work title  Jones, David  Macmillan        firstdocument.docx; seconddocument.pdf
third_work         Third work title   Other, A.N.   Penguin

If the CSV to be imported is located at

/tmp/imports/1/csv-to-be-imported.csv

The files would be at:

/tmp/imports/1/files/document.pdf
/tmp/imports/1/files/firstdocument.docx
/tmp/imports/1/files/seconddocument.pdf

The third_work does not have any associated files.
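
The path rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Bulkrax's implementation: the function name expected_file_paths is hypothetical, POSIX-style paths are assumed, and whitespace around each filename is trimmed.

```python
import re
from pathlib import PurePosixPath


def expected_file_paths(csv_path, file_cell):
    """List where each file named in a `file` cell is expected to live.

    Hypothetical helper: files sit in a directory called `files`
    next to the CSV, and multiple names are separated by ';' or '|'.
    """
    files_dir = PurePosixPath(csv_path).parent / "files"
    names = [n.strip() for n in re.split(r"[;|]", file_cell or "") if n.strip()]
    return [str(files_dir / name) for name in names]
```

For the second_work row and the CSV location given above, this returns the two paths under /tmp/imports/1/files/. For third_work, whose file cell is empty, it returns an empty list.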

Importing Metadata and Files from a Zip file

A Zip file containing a single CSV and a folder named files/ can be imported by the CSV Importer. The structure of the Zip is very important and is as follows:

metadata.csv
files/
  file_1.png
  file_2.jpg

See the Files Location section above for how to reference the files within the CSV.

On macOS, in Finder, select the CSV and the files/ folder (cmd + click to select multiple items), right click, and select Compress. This creates the Zip file that will be imported.

NOTE: The names of the files themselves don't matter, as long as they match what's in the file column in the CSV. Likewise, the name of the CSV does not matter. However, the name of the folder containing the files does matter and MUST be written exactly as "files" (lowercase and plural). The structure of the Zip is also important: the CSV and the files/ folder must sit at the top level of the archive. For example, if you compress a directory containing the CSV and the files/ folder, it will not import properly.
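
On systems without Finder, or when preparing many imports, the same layout can be produced by a script. The following Python sketch uses the standard zipfile module. The function name build_import_zip is hypothetical, and it assumes all files sit directly inside the given files directory (subdirectories are skipped).

```python
import zipfile
from pathlib import Path


def build_import_zip(csv_path, files_dir, zip_path):
    """Write a Zip with the CSV at the top level and files under `files/`.

    Hypothetical helper for illustration only, not a Bulkrax API.
    """
    csv_path, files_dir = Path(csv_path), Path(files_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The CSV goes at the root of the archive, not inside a parent folder.
        archive.write(csv_path, arcname=csv_path.name)
        for path in sorted(files_dir.iterdir()):
            if path.is_file():
                # The folder name inside the Zip is always "files",
                # whatever the source directory is called on disk.
                archive.write(path, arcname=f"files/{path.name}")
    return zip_path
```

Setting arcname explicitly is what keeps the CSV and the files/ folder at the top level of the archive, which avoids the wrapped-directory problem described in the note above.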

Configuration and Customization

Please see the Configuration guide for information on how to configure and customize import, for example by excluding columns from import or splitting data on specific delimiters.
