Lesson: Add attached files

Goals

Attach file sub-resources to models
See where files are stored in Fedora objects and how to retrieve them

Explanation

So far, we've only added metadata to our objects. Now let's attach a file that has some content. For our BibliographicFile model, this could be an image of the bibliographic resource's cover or a PDF of its content. For the PageFile model, it could be an image or PDF of a single page.

In this lesson, we'll add a file that stores a PDF of a page.

Steps

Step 1: In the console, add a content file resource to the PageFile model

Because our PageFile model includes the behaviors of a generic file, it is ready to have page content uploaded. Each file you upload goes into a separate generic file. A generic file holds one uploaded content file and any number of derivatives of that content, for example a thumbnail image and a full text file. The following example uploads a content file.

require 'open-uri'
pf1 = PageFile.find('page-1')
=> #<PageFile id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>

file1 = open("https://github.com/projecthydra-labs/hydra-works/wiki/raven_files/TheRaven_page1.pdf","r")
=> #<Tempfile:/var/folders/cm/zq5vgsj946n5hws81m85h5fr0000gn/T/open-uri20150922-869-2uceq0>

Hydra::Works::UploadFileToGenericFile.call(pf1, file1)
=> #<PageFile id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>

pf1.save
=> true

pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7" >]

NOTE: There are several ways to create a file that the UploadFileToGenericFile service accepts. See the documentation in the header of the service definition file for the full list. At the time this tutorial was written, the accepted content files were:

    # @param [IO,File,Rack::Multipart::UploadedFile, #read] object that will be the contents. If file responds to :mime_type or :original_name, those will be called to provide technical metadata.
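To make the `#read` case in that comment concrete, here is a minimal sketch of a duck-typed content object. The `NamedContent` class is hypothetical and not part of hydra-works; it only illustrates an object that responds to `read`, `mime_type`, and `original_name`. Whether the service calls any other methods on the object has not been verified here.

```ruby
require 'stringio'

# Hypothetical wrapper (an assumption, not part of hydra-works): pairs
# readable content with the two optional metadata methods named in the
# @param comment above.
class NamedContent
  attr_reader :mime_type, :original_name

  def initialize(io, mime_type:, original_name:)
    @io = io
    @mime_type = mime_type
    @original_name = original_name
  end

  # Delegate reading to the wrapped IO so the object responds to #read.
  def read(*args)
    @io.read(*args)
  end
end

# Placeholder bytes stand in for real PDF content.
content = NamedContent.new(
  StringIO.new('%PDF-1.4 example bytes'),
  mime_type: 'application/pdf',
  original_name: 'TheRaven_page1.pdf'
)
```

In the console, an object like `content` could presumably be passed as the second argument to `Hydra::Works::UploadFileToGenericFile.call` in place of `file1`.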

Step 2: View the contents from Fedora

Copy the URL returned by pf1.files and paste it into your browser. You will need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).

NOTE: Some browsers will recognize that this is a PDF file and open it appropriately. Others may try to display it as text, in which case you will need to choose to open it with a PDF viewer such as Adobe Reader.
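If you would rather fetch the file from Ruby than from a browser, the following is a rough sketch using the standard library's Net::HTTP. It assumes the repository prompts for HTTP Basic credentials. The helper names `build_fedora_request` and `fetch_fedora_file` are hypothetical, and the URL is the sample one from Step 1; substitute the URL from your own pf1.files output.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that carries HTTP Basic credentials.
# (Hypothetical helper; assumes Fedora accepts Basic authentication.)
def build_fedora_request(file_url, user, password)
  request = Net::HTTP::Get.new(URI(file_url))
  request.basic_auth(user, password)
  request
end

# Send the request and return the response body.
# Calling this requires a running Fedora instance.
def fetch_fedora_file(file_url, user, password)
  uri = URI(file_url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_fedora_request(file_url, user, password)).body
  end
end

# Sample URL copied from the Step 1 output; yours will differ.
url = 'http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7'
request = build_fedora_request(url, 'fedoraAdmin', 'fedoraAdmin')
```

With Fedora running, `File.binwrite('page1.pdf', fetch_fedora_file(url, 'fedoraAdmin', 'fedoraAdmin'))` would save the content to a local file.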

Step 3: Fix the MIME type set by GitHub

If you used open-uri to open the file directly from GitHub, the MIME type on the file is incorrectly set to "application/octet-stream". We will change it to "application/pdf" before continuing with derivatives.

f1 = pf1.files.first
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7" >

f1.mime_type
=> "application/octet-stream"

f1.mime_type = 'application/pdf'
=> "application/pdf"

pf1.files.first.mime_type
=> "application/pdf"
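Rather than hard-coding the type, you could check the content itself. The sketch below is an assumption-laden alternative, not something hydra-works provides: the hypothetical helper `sniff_mime_type` looks for the `%PDF-` signature that PDF files begin with and falls back to the generic binary type otherwise.

```ruby
require 'stringio'

# Byte signature found at the start of PDF files.
PDF_MAGIC = '%PDF-'.b

# Hypothetical helper: return 'application/pdf' when the stream starts
# with the PDF signature, otherwise 'application/octet-stream'.
# The stream is rewound before and after so it can be read again.
def sniff_mime_type(io)
  io.rewind
  header = io.read(PDF_MAGIC.bytesize)
  io.rewind
  header == PDF_MAGIC ? 'application/pdf' : 'application/octet-stream'
end

pdf_like = StringIO.new("%PDF-1.4\nexample bytes")
detected = sniff_mime_type(pdf_like)
```

In the console, this could be used as `f1.mime_type = sniff_mime_type(file1)`, assuming file1 is still open and supports rewind.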

Step 4: Generate a thumbnail derivative

There are dependencies that have to be installed prior to being able to generate a thumbnail. See hydra-derivatives for the dependency list and other useful information on working with the hydra-derivatives gem.

Once dependencies have been installed, type the following in the rails console to generate a thumbnail.

Hydra::Works::GenerateThumbnail.call(pf1)
=> #<PageFile id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>

pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7" >, #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/59c82c36-8c75-4c48-9af9-0d39835bc29f" >]

pf1.thumbnail
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/3b1492be-55c3-4415-a28b-713311f4bdf9" >

Step 5: View the thumbnail from Fedora

Copy the URL from pf1.thumbnail and paste it into your browser. You will need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).

Step 6: Generate a full text derivative

To generate the full text derivative, type the following in the rails console.

extracted_text = Hydra::Works::FullTextExtractionService.run(pf1)
=> # all the text for page 1

pf1.build_extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >

pf1.extracted_text.content = extracted_text
=> # all the text for page 1

pf1.save
=> true

pf1.extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >

NOTE: The process for generating derivatives is under review and will likely change such that all derivatives are generated through the hydra-derivatives gem.

Step 7: View the extracted text from Fedora

Copy the URL from pf1.extracted_text and paste it into your browser. You will need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).

Next Step

Proceed to BONUS Lesson: Generate Rails Scaffolding for Creating and Editing Bibliographic Resources, or explore the other Dive into Hydra-Works tutorial bonus lessons.
