Sword Documentation #85

Open
1 of 5 tasks
jillpe opened this issue Jun 10, 2024 · 9 comments


jillpe commented Jun 10, 2024

Summary

Documentation on how to use the Sword2 integration would be helpful for the PALs team and their users.

Acceptance Criteria

  • Documentation on how to use the Sword2 integration is recorded and shared with the PALs team
  • It should include the H4C perspective [updated per Clinton's comment]:
    • How does a tenant request/configure the plugin at the tenant level?
    • How does a tenant request/configure API keys at the account level?
      • [must ask a dev to generate API keys]
    • What are the metadata mapping expectations from the H4C instance?
jillpe added the Pitt label Jun 10, 2024

jillpe commented Jun 13, 2024

Documentation is in the wiki - is this helpful?
https://github.com/CottageLabs/willow_sword/wiki/Usage#authorizing-sword-requests

jillpe moved this to Pitt QA in palni-palci Sep 4, 2024

ctgraham commented Sep 11, 2024

This documentation looks good from the plugin perspective, but we also need instructions from the H4C service perspective. This additional detail would not live in the CottageLabs/willow_sword wiki, but would be published by PALs. It should cover:

  • How does a tenant request/configure the plugin at the tenant level?
  • How does a tenant request/configure API keys at the account level?
  • What are the metadata mapping expectations from the H4C instance?

ShanaLMoore added the needs rework label (issue needs additional work) Sep 11, 2024
ShanaLMoore moved this from Pitt QA to Ready for Development in palni-palci Sep 11, 2024

ctgraham commented Nov 8, 2024

What are the metadata mapping expectations from the H4C instance?

This is tentatively addressed by asserting that the SWORD server is the authority for the metadata. The client, not the server, is responsible for the mapping.

Note that the SWORD protocol does not provide a way for the client to request a specific metadata format on GET requests; thus, the client is implicitly expected to map from the server's schema when reading. Similarly, the client can use any metadata format that the server supports for PUT/POST, but it is the client's responsibility to map to one of those schemas.
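To make the client's side of this concrete, here is a minimal Ruby sketch of a read-side crosswalk, assuming the server exposes Dublin Core elements in the standard `http://purl.org/dc/elements/1.1/` namespace. The crosswalk table `CLIENT_FIELDS`, its target field names, and `map_to_client_schema` are all hypothetical; a real client would substitute its own schema.

```ruby
require 'rexml/document'

DC_NS = 'http://purl.org/dc/elements/1.1/'

# Hypothetical crosswalk: server-side DC element name => client-side field.
CLIENT_FIELDS = {
  'title'   => :name,
  'creator' => :authors,
  'subject' => :keywords
}.freeze

# Reads each mapped DC element from a SWORD response and returns a hash keyed
# by the client's own field names. Repeated elements are collected into
# arrays; fields absent from the response are simply left out.
def map_to_client_schema(xml)
  doc = REXML::Document.new(xml)
  result = {}
  CLIENT_FIELDS.each do |dc_name, client_field|
    REXML::XPath.each(doc, "//dc:#{dc_name}", { 'dc' => DC_NS }) do |el|
      (result[client_field] ||= []) << el.text.to_s.strip
    end
  end
  result
end
```

Writes would run the same kind of crosswalk in reverse, mapping the client's fields onto one of the schemas the server accepts before a POST or PUT.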

We'll need buy-in from the clients for this to work.


kirkkwang commented Jan 31, 2025

@ctgraham
Can you take a look at this documentation and give feedback?

Hyku's SWORD V2 API Documentation

Overview

Hyku 6 comes with WillowSword pre-installed and ready to use. Any tenant can use WillowSword's SWORD V2 implementation to deposit and manage content, provided the depositing user has a valid API key. This document describes the SWORD V2 endpoints served by the willow_sword engine.

API Key Generation

Currently, API keys can only be generated by developers with access to the Rails console. Here's how to generate an API key for a user:

require 'securerandom'
u = User.find_by_email('[email protected]')
u.api_key = SecureRandom.uuid
u.save!

All endpoints require authentication via this API key passed in the request headers.

Authentication

All requests require an API key to be included in the headers:

Api-key: example-key-12345

Failure to provide a valid API key will result in a 403 Forbidden response.
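A minimal Ruby sketch of attaching the key, using only the standard library's Net::HTTP. The helper names `authenticated_get` and `fetch` are hypothetical; the 403 check reflects the behaviour described above.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a GET request carrying the Api-key header
# that the willow_sword endpoints expect.
def authenticated_get(url, api_key)
  request = Net::HTTP::Get.new(URI(url))
  request['Api-key'] = api_key
  request
end

# Sends the request and raises if the server rejects the key.
def fetch(url, api_key)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(authenticated_get(url, api_key))
  end
  raise "API key rejected (HTTP #{response.code})" if response.code == '403'
  response
end
```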

Metadata Namespaces

The following metadata namespaces are used in SWORD responses:

Prefix | Namespace URI | Description
dc | http://purl.org/dc/elements/1.1/ | Dublin Core (DC) elements for broad compatibility.
dcterms | http://purl.org/dc/terms/ | Dublin Core Terms (DC Terms), an extended set of metadata elements.
h4cmeta | https://hykucommons.org/schema/metadata | Hyku Commons-specific metadata for descriptive elements.
h4csys | https://hykucommons.org/schema/system | Hyku Commons-specific system metadata (e.g., internal identifiers).

Dublin Core vs. Dublin Core Terms

Dublin Core Terms (dcterms) is a superset of the original Dublin Core (dc) metadata standard. While Hyrax/Hyku primarily uses DC Terms, the SWORD response uses the dc element whenever one exists for a property, to support legacy clients that may only recognize the dc namespace. Properties with no dc equivalent, such as abstract, license, modified, and dateSubmitted, are emitted in the dcterms namespace.

For example:

<!-- Emitted: -->
<dc:title>A Test GenericWork</dc:title>

<!-- Instead of: -->
<dcterms:title>A Test GenericWork</dcterms:title>

Combining the two namespaces this way ensures that clients working with older metadata standards can still interact with the system.
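A client that just wants a property's value, regardless of which namespace carried it, could check dc first and fall back to dcterms. A minimal Ruby sketch using REXML; `dc_values` is a hypothetical helper, and the namespace URIs are the ones listed in the table above.

```ruby
require 'rexml/document'

NAMESPACES = {
  'dc'      => 'http://purl.org/dc/elements/1.1/',
  'dcterms' => 'http://purl.org/dc/terms/'
}.freeze

# Returns the values of a property, preferring the legacy dc element and
# falling back to dcterms for properties dc does not define (e.g. abstract).
def dc_values(doc, property)
  %w[dc dcterms].each do |prefix|
    values = REXML::XPath.match(doc, "//#{prefix}:#{property}", NAMESPACES).map { |el| el.text.to_s }
    return values unless values.empty?
  end
  []
end
```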

Example feed:

<feed xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:h4cmeta="https://hykucommons.org/schema/metadata" xmlns:h4csys="https://hykucommons.org/schema/system">
  <title>A Test, For Documentation</title>
  <content rel="src" href="https://demo.hykucommons.org/sword/collections/default/works/some-work-id-1234"/>
  <link rel="edit" href="https://demo.hykucommons.org/sword/collections/default/works/some-work-id-1234/file_sets"/>
  <entry>
    <content rel="src" href="https://demo.hykucommons.org/sword/collections/default/works/some-work-id-1234/file_sets/some-file-set-id-1234"/>
    <link rel="edit" href="https://demo.hykucommons.org/sword/collections/default/works/some-work-id-1234/file_sets/some-file-set-id-1234"/>
  </entry>
  <h4csys:id>some-work-id-1234</h4csys:id>
  <h4csys:internal_resource>Image</h4csys:internal_resource>
  <h4csys:created_at>2024-07-10 23:04:45 UTC</h4csys:created_at>
  <h4csys:updated_at>2025-02-05 22:19:27 UTC</h4csys:updated_at>
  <h4csys:new_record>false</h4csys:new_record>
  <h4csys:date_modified>2025-02-03T23:30:24+00:00</h4csys:date_modified>
  <h4csys:date_uploaded>2025-02-03T23:30:24+00:00</h4csys:date_uploaded>
  <h4csys:depositor>[email protected]</h4csys:depositor>
  <h4csys:state>http://fedora.info/definitions/1/0/access/ObjState#active</h4csys:state>
  <h4cmeta:embargo_id>some-embargo-id-1234</h4cmeta:embargo_id>
  <h4cmeta:title>A Test</h4cmeta:title>
  <h4cmeta:title>For Documentation</h4cmeta:title>
  <h4cmeta:admin_set_id>some-admin-set-id-1234</h4cmeta:admin_set_id>
  <h4cmeta:member_ids>some-file-set-id-1234</h4cmeta:member_ids>
  <h4cmeta:based_near>https://sws.geonames.org/5391811/</h4cmeta:based_near>
  <h4cmeta:contributor>Smith, Josh</h4cmeta:contributor>
  <h4cmeta:creator>Smith, John</h4cmeta:creator>
  <h4cmeta:date_created>2020-07-07</h4cmeta:date_created>
  <h4cmeta:license>https://creativecommons.org/licenses/by-nc/4.0/</h4cmeta:license>
  <h4cmeta:resource_type>Capstone Project</h4cmeta:resource_type>
  <h4cmeta:rights_statement>http://rightsstatements.org/vocab/NoC-CR/1.0/</h4cmeta:rights_statement>
  <h4cmeta:subject>WillowSword</h4cmeta:subject>
  <h4cmeta:visibility_during_embargo>restricted</h4cmeta:visibility_during_embargo>
  <h4cmeta:visibility_after_embargo>open</h4cmeta:visibility_after_embargo>
  <h4cmeta:embargo_release_date>2027-10-04T00:00:00+00:00</h4cmeta:embargo_release_date>
  <dc:title>A Test</dc:title>
  <dc:title>For Documentation</dc:title>
  <dcterms:abstract>This is the abstract, it's so short!</dcterms:abstract>
  <dc:contributor>Smith, Josh</dc:contributor>
  <dc:creator>Smith, John</dc:creator>
  <dc:date>2020-07-07</dc:date>
  <dcterms:license>https://creativecommons.org/licenses/by-nc/4.0/</dcterms:license>
  <dc:type>Capstone Project</dc:type>
  <dc:rights>http://rightsstatements.org/vocab/NoC-CR/1.0/</dc:rights>
  <dc:subject>WillowSword</dc:subject>
  <dcterms:modified>2025-02-03T23:30:24+00:00</dcterms:modified>
  <dcterms:dateSubmitted>2025-02-03T23:30:24+00:00</dcterms:dateSubmitted>
</feed>
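A minimal Ruby sketch of pulling a work's internal ID and its file set URLs out of a feed like this one, assuming the `h4csys` prefix is bound to the system namespace URI from the table above. `summarise_work` is hypothetical; it reads the `href` of each entry's `content` element, as in the example, and matches Atom elements by local name so it works whether or not the feed declares a default namespace.

```ruby
require 'rexml/document'

H4CSYS_NS = 'https://hykucommons.org/schema/system'

# Summarises a work feed: the internal work ID plus the URL of every file set
# entry, read from the href attribute of each entry's content element.
def summarise_work(xml)
  doc = REXML::Document.new(xml)
  id_element = REXML::XPath.first(doc, '//h4csys:id', { 'h4csys' => H4CSYS_NS })
  file_sets = doc.root.elements.select { |el| el.name == 'entry' }.map do |entry|
    content = entry.elements.find { |el| el.name == 'content' }
    content && content.attributes['href']
  end
  { id: id_element && id_element.text, file_sets: file_sets.compact }
end
```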

Admin Sets & User Collections

The service document will deliver collection elements representing both Admin Sets and User Collections. The type child element will describe this with values of "AdminSet" and "Collection".

Type | Description
AdminSet | Represents an Admin Set in Hyrax, used for policy and workflow control.
Collection | Represents a User Collection, created by end users to organize works.

Example feed:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>A Collection</title>
  <type>Collection</type>
  <link rel="edit" href="https://demo.hykucommons.org/sword/collections/some-collection-id/works"/>
  <entry>
    <content rel="src" href="https://demo.hykucommons.org/sword/collections/some-collection-id/works/some-work-id"/>
    <link rel="edit" href="https://demo.hykucommons.org/sword/collections/some-collection-id/works/some-work-id/file_sets"/>
  </entry>
</feed>
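A minimal Ruby sketch of how a client might separate Admin Sets from User Collections in the service document. It assumes standard AtomPub `collection` elements, each with a `title` and the `type` child described above; since the namespace of `type` is not documented here, elements are matched by local name. `collections_by_type` is hypothetical.

```ruby
require 'rexml/document'

# Groups the collection elements of a service document by the text of their
# `type` child ("AdminSet" or "Collection") and returns the titles per type.
# Matching is by local name, so the namespace of `type` does not matter.
def collections_by_type(xml)
  doc = REXML::Document.new(xml)
  groups = {}
  doc.each_recursive do |el|
    next unless el.name == 'collection'
    type  = el.elements.find { |child| child.name == 'type' }
    title = el.elements.find { |child| child.name == 'title' }
    key   = type ? type.text.to_s.strip : 'unknown'
    (groups[key] ||= []) << (title ? title.text.to_s.strip : el.attributes['href'])
  end
  groups
end
```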

Base URL

Replace https://demo.hykucommons.org with your instance's base URL in all examples.
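A minimal Ruby sketch of building endpoint URLs from a tenant's base URL, following the paths used in the examples below. `sword_url` is a hypothetical helper; it percent-encodes each path segment in case an ID contains reserved characters.

```ruby
require 'erb'
require 'uri'

# Hypothetical helper: joins a tenant's base URL with the /sword path and the
# given segments, percent-encoding each segment.
def sword_url(base_url, *segments)
  base = base_url.end_with?('/') ? base_url : "#{base_url}/"
  path = ['sword', *segments].map { |s| ERB::Util.url_encode(s.to_s) }.join('/')
  URI.join(base, path).to_s
end
```

For example, `sword_url('https://demo.hykucommons.org', 'collections', 'default', 'works')` yields the deposit URL used for new works.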

Endpoints

1. Get Service Document

Retrieves the SWORD service document describing available collections and accepted package formats.

Request:

curl --request GET \
  --url https://demo.hykucommons.org/sword/service_document \
  --header 'Content-Type: application/xml' \
  --header 'Api-key: example-key-12345'

2. Get Collection

Returns a list of works in the specified collection.

Request:

curl --request GET \
  --url https://demo.hykucommons.org/sword/collections/example-collection-id-12345 \
  --header 'Content-Type: application/xml' \
  --header 'Api-key: example-key-12345'

3. Add New Work (Metadata Only)

There are two methods to create a new work with metadata only:

a. Binary Data Method

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works \
  --header 'Content-Disposition: attachment; filename=metadata.xml' \
  --header 'Content-Type: application/xml' \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Packaging: application/atom+xml;type=entry' \
  --header 'Api-key: example-key-12345' \
  --data-binary @dc.xml
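The same binary deposit can be built in Ruby. A minimal sketch with Net::HTTP that constructs, but does not send, the request; `metadata_deposit_request` is hypothetical, and `On-Behalf-Of` is left optional on the assumption that it is only needed for mediated deposits, as in the SWORD v2 profile.

```ruby
require 'net/http'
require 'uri'

# Builds (without sending) the metadata-only deposit from the curl example:
# the XML document is sent as the raw request body.
def metadata_deposit_request(url, api_key, xml, on_behalf_of: nil)
  request = Net::HTTP::Post.new(URI(url))
  request['Api-key'] = api_key
  request['Content-Type'] = 'application/xml'
  request['Content-Disposition'] = 'attachment; filename=metadata.xml'
  request['Packaging'] = 'application/atom+xml;type=entry'
  request['In-Progress'] = 'false'
  request['On-Behalf-Of'] = on_behalf_of if on_behalf_of
  request.body = xml
  request
end
```

To send it, pass the request to `http.request` inside `Net::HTTP.start(uri.host, uri.port, use_ssl: true)`.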

b. Form Data Method

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Api-key: example-key-12345' \
  -F [email protected]

4. Add New Work with Files

Several methods are available for adding works with associated files:

a. Zip File Method (Binary)

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works/ \
  --header 'Content-Disposition: attachment; filename=testPackage.zip' \
  --header 'Content-Type: application/zip' \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Packaging: http://purl.org/net/sword/package/BagIt' \
  --header 'Api-key: example-key-12345' \
  --data-binary @testPackage.zip

b. Form Data Method with Separate Metadata

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works/ \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Api-key: example-key-12345' \
  -F [email protected] \
  -F [email protected]

c. Form Data Method with Combined Package

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works/ \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Api-key: example-key-12345' \
  -F [email protected]

5. BagIt Package Support

The API supports BagIt-formatted packages:

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works/ \
  --header 'Content-Disposition: attachment; filename=testPackage1InBagit.zip' \
  --header 'Content-MD5: example-md5-12345' \
  --header 'Content-Type: application/zip' \
  --header 'Packaging: http://purl.org/net/sword/package/BagIt' \
  --header 'Api-key: example-key-12345' \
  --data-binary @testPackageInBagit.zip
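The Content-MD5 value above is a placeholder. A minimal Ruby sketch of computing it from the package file: RFC 1864 defines Content-MD5 as the Base64 encoding of the binary digest, while some SWORD implementations compare against the hex digest instead. Which form willow_sword validates is an assumption to check, so the hypothetical `content_md5` helper returns both.

```ruby
require 'digest'

# Computes the MD5 of a package file in both encodings: hex, and the Base64
# form RFC 1864 defines for Content-MD5. Which one the server expects is
# unverified here.
def content_md5(path)
  digest = Digest::MD5.file(path)
  { hex: digest.hexdigest, base64: digest.base64digest }
end
```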

6. Retrieve Work Information

Request:

curl --request GET \
  --url https://demo.hykucommons.org/sword/collections/default/works/example-work-id-12345 \
  --header 'Api-key: example-key-12345'

7. File Operations

Get File Metadata

Request:

curl --request GET \
  --url https://demo.hykucommons.org/sword/collections/default/works/example-work-id-12345/file_sets/example-file-set-id-12345 \
  --header 'Api-key: example-key-12345'

Add File to Existing Work

Request:

curl --request POST \
  --url https://demo.hykucommons.org/sword/collections/default/works/example-work-id-12345/file_sets/ \
  --header 'Content-Disposition: attachment; filename=example.pdf' \
  --header 'Content-MD5: example-md5-12345' \
  --header 'Content-Type: application/pdf' \
  --header 'In-Progress: false' \
  --header 'On-Behalf-Of: [email protected]' \
  --header 'Packaging: http://purl.org/net/sword/package/Binary' \
  --header 'Api-key: example-key-12345' \
  --data-binary @example.pdf

8. Update Operations

Update Work Metadata

Request:

curl --request PUT \
  --url https://demo.hykucommons.org/sword/collections/default/works/example-work-id-12345/ \
  --header 'Content-Disposition: attachment; filename=metadata.xml' \
  --header 'Content-Type: application/xml' \
  --header 'Packaging: application/atom+xml;type=entry' \
  --header 'Api-key: example-key-12345' \
  --data-binary @dc.xml

Update File Metadata

Request:

curl --request PUT \
  --url https://demo.hykucommons.org/sword/collections/default/works/example-work-id-12345/file_sets/example-file-set-id-12345 \
  --header 'Content-Disposition: attachment; filename=metadata.xml' \
  --header 'Content-Type: application/xml' \
  --header 'Packaging: application/atom+xml;type=entry' \
  --header 'Api-key: example-key-12345' \
  --data-binary @dc.xml
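Both PUT examples differ only in the target URL, so a client can share one request builder. A minimal Ruby sketch; `metadata_update_request` and `send_request!` are hypothetical helper names, and treating any 2xx status as success is an assumption.

```ruby
require 'net/http'
require 'uri'

# Builds a PUT that replaces the metadata of a work or a file set; pass the
# work URL or the file set URL accordingly.
def metadata_update_request(url, api_key, xml)
  request = Net::HTTP::Put.new(URI(url))
  request['Api-key'] = api_key
  request['Content-Type'] = 'application/xml'
  request['Content-Disposition'] = 'attachment; filename=metadata.xml'
  request['Packaging'] = 'application/atom+xml;type=entry'
  request.body = xml
  request
end

# Sends the request and raises unless the server answers with a 2xx status.
def send_request!(request)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "Update failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response
end
```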

Testing

A set of test files is available for validating your implementation:
swordv2_test_files.zip

Example XML

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <title>Test Record</title>
  <creator>Hyku User</creator>
  <subject>Sword</subject>
  <subject>Implementation</subject>
  <description>This is a test dc record to test</description>
  <publisher>PALNI-PALCI</publisher>
  <contributor>Notch8</contributor>
  <created>29/05/2018</created>
  <source>https://www.hykucommons.org/</source>
  <language>English</language>
  <rights_statement>http://rightsstatements.org/vocab/InC/1.0/</rights_statement>
  <resource_type>Article</resource_type>
  <keyword>Test</keyword>
  <keyword>Another</keyword>
  <visibility>lease</visibility>
  <visibility_during_lease>authenticated</visibility_during_lease>
  <lease_expiration_date>2024-05-31</lease_expiration_date>
  <visibility_after_lease>restricted</visibility_after_lease>
  <!-- If no admin_set_id is specified, the work falls back to the Default Admin Set. -->
  <admin_set_id>some-admin-set-id</admin_set_id>
</metadata>
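A client generating this document programmatically could use a small serializer like the following minimal Ruby sketch with REXML. `build_metadata_xml` is hypothetical; array values become repeated elements (as with `subject` and `keyword` above), and leaving out `admin_set_id` relies on the Default Admin Set fallback noted in the comment.

```ruby
require 'rexml/document'

# Hypothetical serializer: turns a hash of fields into a flat <metadata>
# document. Array values produce one element per item; text is escaped by REXML.
def build_metadata_xml(fields)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('metadata')
  fields.each do |name, value|
    Array(value).each { |v| root.add_element(name.to_s).text = v.to_s }
  end
  out = +''
  doc.write(out)
  out
end
```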

Acknowledgments

This implementation is based on Notch8's fork (notch8/willow_sword) of the original WillowSword gem created by Cottage Labs (CottageLabs/willow_sword). The fork was necessary to modernize the gem for compatibility with Hyrax's Valkyrie resources, replacing the previous ActiveFedora implementation.

laritakr moved this to Client QA in PalsKnapsack Jan 31, 2025
laritakr removed this from palni-palci Jan 31, 2025

ctgraham commented Feb 4, 2025

@kirkkwang, this looks good to me. When we settle on the metadata namespaces, we should reference the availability of DC metadata and of Hyku/Hyrax-specific metadata elements. We should also mention the assumptions we made in the service document about admin sets and user collections.

For the latter, something like:

The service document will deliver collection elements representing both Admin Sets and User Collections. The type child element will describe this with values of "AdminSet" and "UserCollection".

In the pittir tenant there are also collection types of "University of Pittsburgh ETDs", "Academic Units". Do these also count as User Collections, or something different?

If something is SWORD deposited without an Admin Set referenced in the metadata, I presume it becomes part of the Default Admin Set. Is that something we should name here, or is that well known enough that it doesn't bear repeating specifically in the SWORD context?

kirkkwang commented

@ctgraham I've updated the documentation with the new changes (towards the top).

I went on the pittir tenant and didn't see the "University of Pittsburgh ETDs" or "Academic Units" collections, so I'm not exactly sure what they are.

You are correct: if no admin set ID is supplied, the work should go to the Default Admin Set. I added a comment to the example XML to make that clear.


ctgraham commented Feb 6, 2025

"University of Pittsburgh ETDs", "Academic Units" are Collection Types, so the Collections would be created under these.


ShanaLMoore removed the needs rework label (issue needs additional work) Feb 10, 2025
jillpe moved this from Client QA to Client Verification in PalsKnapsack Feb 17, 2025
kirkkwang commented

@ctgraham So basically, a User Collection is a type of Collection, and University of Pittsburgh ETDs, Academic Units, etc. are also types of Collection.

ctgraham commented

So then, perhaps:

Admin Sets & User Collections

The service document will deliver collection elements representing both Admin Sets and User Collections. The type child element will describe this with values of "AdminSet" and "Collection".

Type | Description
AdminSet | Represents an Admin Set in Hyrax, used for policy and workflow control.
Collection | Represents a User Collection, created by end users to organize works.
