CSV Importer
Bulkrax can import from a CSV file that meets the following requirements.
- The CSV MUST have a header row that includes a column uniquely identifying each record.
- This column MUST be called either `source_identifier` or the field name configured in `config/initializers/bulkrax.rb` as `source_identifier_field_mapping`.
- The `source_identifier` field MUST contain a unique identifier for the item.
- The CSV MUST have a `title` column.
- There MUST be a `source_identifier` and a `title` for all works.
Note: The source identifier is added to the imported Work in the `system_identifier_field` (see below for an explanation of the system identifier field). The default is `source`.
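The requirements listed above can be checked before a CSV is handed to Bulkrax. The following is a minimal sketch of such a pre-flight check, not part of Bulkrax itself: the function name `validate_import_csv` and its `identifier_column` parameter are hypothetical, and the sketch assumes the default `source_identifier` column name unless another is passed in.

```python
import csv
import io


def validate_import_csv(csv_text, identifier_column="source_identifier"):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it mirrors the documented requirements, not Bulkrax's
    own validation code.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []

    # Both the identifier column and the title column must be present.
    for required in (identifier_column, "title"):
        if required not in headers:
            problems.append(f"missing required column: {required}")
    if problems:
        return problems

    seen = set()
    for row_number, row in enumerate(reader, start=1):
        identifier = (row.get(identifier_column) or "").strip()
        title = (row.get("title") or "").strip()
        if not identifier:
            problems.append(f"data row {row_number}: empty {identifier_column}")
        elif identifier in seen:
            problems.append(
                f"data row {row_number}: duplicate {identifier_column} {identifier!r}"
            )
        else:
            seen.add(identifier)
        if not title:
            problems.append(f"data row {row_number}: empty title")
    return problems
```

Passing the text of a CSV returns every problem at once, so a spreadsheet can be corrected in a single pass rather than one failed import at a time.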
All columns will be imported if the column name matches an existing metadata property in Hyrax, e.g. `title` or `creator`.
In addition, the following columns will be imported:
- collection
- file
- remote_files
- model
A column headed `collection` will be used to define which collection imported works should be added to.
Multiple collections can be supplied, if separated with a semi-colon (;) or pipe (|).
If the value provided matches a value found in the `system_identifier_field` of an existing collection, then works will be added to that collection. If not, a new collection will be created and both title and `system_identifier_field` will be set to the value supplied in the collection column.
For example:
source_identifier | title | collection |
---|---|---|
imported_work_1 | Work One | Collection One |
imported_work_2 | Work Two | Collection One; Collection Two |
In the first row (after the header), the Work being imported will be added to Collection One, and in the second, to both Collection One and Collection Two.
If either of those collections already exists, the existing collection is used. If not, a new one is created.
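The splitting and matching behaviour described above can be illustrated with a short sketch. This is not Bulkrax's parser. The helper names `split_multi_value` and `plan_collections` are hypothetical, `existing_identifiers` stands in for the `system_identifier_field` values of collections already in the repository, and trimming whitespace around each value is an assumption.

```python
import re


def split_multi_value(cell):
    """Split a cell on semi-colons or pipes, dropping empty parts.

    Trimming surrounding whitespace is an assumption of this sketch.
    """
    return [part.strip() for part in re.split(r"[;|]", cell or "") if part.strip()]


def plan_collections(cell, existing_identifiers):
    """Return (reuse, create): collections that already exist and ones to be created."""
    names = split_multi_value(cell)
    reuse = [name for name in names if name in existing_identifiers]
    create = [name for name in names if name not in existing_identifiers]
    return reuse, create
```

For the second row of the table, `plan_collections("Collection One; Collection Two", {"Collection One"})` reports that Collection One would be reused and Collection Two would be created.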
The `model` column is used to determine the work type. It is not required. In its absence, either the field mapping or `default_work_type` will be used. Read more about these in the Configuration guide.
Files will be imported from a column called `file` or `remote_files` if they are present.
The `remote_files` column will contain URLs to files which will be downloaded and imported. Multiple files can be imported, if separated by a semi-colon (;) or pipe (|) (URLs themselves MUST NOT contain semi-colons or pipes).
The `file` column will contain filenames (these must be unique). Multiple files can be imported, if separated by a semi-colon (;) or pipe (|) (filenames themselves MUST NOT contain semi-colons or pipes).
If imported from a pre-existing server location, files MUST be placed in a directory called `files` relative to the location of the CSV file.
If uploading using Browse Everything, the location of the files will be handled by the system.
For example:
source_identifier | title | creator | publisher | file |
---|---|---|---|---|
first_work | First work title | Smith, John | Faber and Faber | document.pdf |
second_work | Second work title | Jones, David | Macmillan | firstdocument.docx; seconddocument.pdf |
third_work | Third work title | Other, A.N. | Penguin | |
If the CSV to be imported is located at `/tmp/imports/1/csv-to-be-imported.csv`, the files would be at:
- `/tmp/imports/1/files/document.pdf`
- `/tmp/imports/1/files/firstdocument.docx`
- `/tmp/imports/1/files/seconddocument.pdf`

The `third_work` row does not have any associated files.
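The path rule in this example can be expressed as a small sketch: take the directory that holds the CSV, append `files`, and then append each filename from the `file` column. The function name `expected_file_paths` is hypothetical, and the sketch assumes POSIX-style server paths and the same semi-colon or pipe separators described earlier.

```python
import re
from pathlib import PurePosixPath


def expected_file_paths(csv_path, file_cell):
    """List where each file named in a `file` cell is expected to live.

    Hypothetical helper: it only computes paths and does not touch the disk.
    """
    files_dir = PurePosixPath(csv_path).parent / "files"
    names = [n.strip() for n in re.split(r"[;|]", file_cell or "") if n.strip()]
    return [str(files_dir / name) for name in names]
```

Comparing the returned paths against what is actually on the server is a quick way to catch misspelled filenames before starting an import.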
A Zip file containing a single CSV and a folder named `files/` can be imported by the CSV Importer. The structure of the Zip is very important and is as follows:
- metadata.csv
- files/
  - file_1.png
  - file_2.jpg
See the Files Location guide for how to reference the files within the CSV.
In Finder, select the CSV and the `files/` folder (cmd + click to select multiple items), right click, and select Compress. This will create the Zip file that will be imported.
NOTE: The names of the files themselves don't matter, as long as they match what's in the `file` column in the CSV. Likewise, the name of the CSV does not matter. However, the name of the folder containing the files does matter and should be written exactly as "files" (lowercase and plural). Also, the structure of the Zip is important; for example, if you compress a directory containing the CSV and the `files/` folder, it will not import properly.
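Where Finder is not available, the same archive layout can be produced with a script. The following is a minimal sketch using Python's standard `zipfile` module. The function name `build_import_zip` is hypothetical, and the sketch assumes the folder holds plain files with no subfolders. It writes the CSV at the root of the archive and every file under `files/`, so no enclosing directory ends up in the Zip.

```python
import zipfile
from pathlib import Path


def build_import_zip(csv_path, files_dir, zip_path):
    """Create a Zip with the CSV at the top level and the files under files/.

    Hypothetical helper; zip_path should not be located inside files_dir.
    """
    csv_path, files_dir = Path(csv_path), Path(files_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The CSV keeps its own name, since the name does not matter.
        archive.write(csv_path, arcname=csv_path.name)
        for file_path in sorted(files_dir.iterdir()):
            if file_path.is_file():
                # The folder name inside the archive is always exactly "files".
                archive.write(file_path, arcname=f"files/{file_path.name}")
    return zip_path
```

Because the archive names are set explicitly, the source folder on disk can have any name; only the layout inside the Zip has to match the structure shown above.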
Please see the Configuration guide for information on how to configure and customize import, for example by excluding columns from import or splitting data on specific delimiters.