CSV Importer
Bulkrax can import from a CSV file that follows the guidelines below.
The CSV must have a column headed source_identifier, which must contain a unique identifier for each item. This identifier is stored against the imported Work in the Bulkrax system_identifier_field (see below for an explanation of the system identifier field).
The CSV must also contain a title column, and every work must have a title.
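The two requirements above can be checked before an import is attempted. The following is a minimal sketch using Ruby's standard csv library; csv_import_problems is a hypothetical helper written for illustration and is not part of Bulkrax.

```ruby
require 'csv'

# Hypothetical helper (not part of Bulkrax): returns a list of problems found
# in the CSV text, or an empty array if it meets the requirements above.
def csv_import_problems(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  problems = []

  # Both columns are required.
  %w[source_identifier title].each do |column|
    problems << "missing column: #{column}" unless rows.headers.include?(column)
  end
  return problems unless problems.empty?

  # Every row needs an identifier, and identifiers must be unique.
  ids = rows.map { |row| row['source_identifier'].to_s.strip }
  problems << 'blank source_identifier' if ids.any?(&:empty?)
  duplicates = ids.reject(&:empty?).group_by(&:itself).select { |_, v| v.size > 1 }.keys
  problems << "duplicate source_identifier: #{duplicates.join(', ')}" unless duplicates.empty?

  # Every work needs a title.
  problems << 'blank title' if rows.any? { |row| row['title'].to_s.strip.empty? }
  problems
end
```

An empty result means the file satisfies these two rules only; it says nothing about the other columns.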
Any column whose header matches an existing metadata property in Hyrax (e.g. title, creator) will be imported.
In addition, the following columns will be imported:
- file
- collection
A column headed collection will be used to define which collections imported works should be added to.
Multiple collections can be supplied, if separated with a semi-colon.
If the value provided matches a value found in the system_identifier_field of an existing collection, then works will be added to that collection. If not, a new collection will be created and both its title and its system_identifier_field will be set to the value supplied in the collection column.
For example
source_identifier | title | collection |
---|---|---|
imported_work_1 | Work One | Collection One |
imported_work_2 | Work Two | Collection One; Collection Two |
In the first row (after the header), the Work being imported will be added to Collection One, and in the second, to both Collection One and Collection Two.
If either of those collections already exists, the existing collection is used. If not, a new one is created.
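The behaviour described above can be sketched as follows. This is an illustration only: collections_for is a hypothetical helper, and a plain Hash keyed by identifier stands in for the repository lookup that Bulkrax performs against the system identifier field.

```ruby
# Hypothetical sketch: `collections` is a plain Hash standing in for the
# repository, keyed by the value held in the system identifier field.
def collections_for(cell, collections)
  cell.to_s.split(';').map(&:strip).reject(&:empty?).map do |name|
    # Reuse a collection whose identifier matches; otherwise create one whose
    # title and identifier are both set to the supplied value.
    collections[name] ||= { title: name, identifier: name }
  end
end
```

Calling collections_for('Collection One; Collection Two', store) with a store that already holds Collection One returns the existing entry and adds a new one for Collection Two.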
Files will be imported from a column called file, if it is present.
This column will contain filenames.
Files must be placed in a directory called files, relative to the location of the CSV file.
Multiple files can be imported, if separated by a semi-colon (filenames themselves MUST NOT contain semi-colons).
For example
source_identifier | title | creator | publisher | file |
---|---|---|---|---|
first_work | First work title | Smith, John | Faber and Faber | document.pdf |
second_work | Second work title | Jones, David | Macmillan | firstdocument.docx; seconddocument.pdf |
third_work | Third work title | Other, A.N. | Penguin | |
If the CSV to be imported is located at:
/tmp/imports/1/csv-to-be-imported.csv
then the files would be at:
/tmp/imports/1/files/document.pdf
/tmp/imports/1/files/firstdocument.docx
/tmp/imports/1/files/seconddocument.pdf
The third_work does not have any associated files.
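The mapping from a file cell to paths on disk can be sketched like this. import_file_paths is a hypothetical helper written for illustration, not a Bulkrax method; it only builds paths and does not check that the files exist.

```ruby
# Hypothetical helper: maps the contents of a `file` cell to paths inside the
# `files` directory that sits alongside the CSV.
def import_file_paths(csv_path, file_cell)
  files_dir = File.join(File.dirname(csv_path), 'files')
  # Filenames are separated by semi-colons; an empty cell yields no files.
  file_cell.to_s.split(';').map(&:strip).reject(&:empty?).map do |name|
    File.join(files_dir, name)
  end
end
```

Because the cell is split on every semi-colon, a filename that contains one would be broken into two paths, which is why such filenames are not allowed.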
In Bulkrax, the system_identifier_field is set within the configuration to an existing Hyrax field. This field will be used to store the import identifier of the Work or Collection. When the import runs, it checks whether a Work or Collection already exists with that identifier, and if so, it updates that existing item. If it does not, then a new item is created.
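The update-or-create check can be pictured with a small sketch. It assumes a plain Hash in place of the repository and a hypothetical import_item helper; the real lookup in Bulkrax is made against the configured Hyrax field.

```ruby
# Hypothetical sketch: `repository` is a Hash standing in for stored Works and
# Collections, keyed by the value of the system identifier field.
def import_item(repository, identifier, attributes)
  if repository.key?(identifier)
    # An item with this identifier exists, so update it in place.
    repository[identifier].merge!(attributes)
    :updated
  else
    # No match, so a new item is created.
    repository[identifier] = attributes.dup
    :created
  end
end
```

Running the same CSV twice therefore updates the items created by the first run rather than duplicating them, provided the source_identifier values are unchanged.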
The default system_identifier_field is set to 'source'. This can be changed in the local application as follows:
# config/initializers/bulkrax.rb
Bulkrax.setup do |config|
  config.system_identifier_field = 'identifier'
end