Commit: merged some conflicts from develop #3921

Showing 8 changed files with 445 additions and 0 deletions.

8 changes: 8 additions & 0 deletions
doc/sphinx-guides/source/_static/installation/files/root/auth-providers/orcid-sandbox.json
@@ -0,0 +1,8 @@
{
  "id": "orcid-sandbox",
  "factoryAlias": "oauth2",
  "title": "ORCID",
  "subtitle": "",
  "factoryData": "type: orcid | userEndpoint: https://api.sandbox.orcid.org/v1.2/{ORCID}/orcid-profile | clientId: FIXME | clientSecret: FIXME",
  "enabled": true
}
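
This JSON file defines an OAuth2 authentication provider for the ORCID sandbox. As an illustrative sketch (the localhost URL and port are assumptions about a typical installation, and the FIXME values must be replaced with real ORCID sandbox credentials first), such a definition can be loaded through the admin API:

    # Sketch: load the provider definition into a local installation via the admin API.
    curl -X POST -H 'Content-type: application/json' \
      --upload-file orcid-sandbox.json \
      http://localhost:8080/api/admin/authenticationProviders
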
5 changes: 5 additions & 0 deletions
...es/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json
@@ -0,0 +1,5 @@
{
  "status": "validation passed",
  "uploadFolder": "DNXV2H",
  "totalSize": 1234567890
}

@@ -0,0 +1,64 @@
Big Data Support
================

Big data support is highly experimental. Eventually this content will move to the Installation Guide.

.. contents:: |toctitle|
   :local:

Various components need to be installed and configured for big data support.

Data Capture Module (DCM)
-------------------------

The Data Capture Module (DCM) is an experimental component that allows users to upload large datasets via rsync over ssh.

Install a DCM
~~~~~~~~~~~~~

Installation instructions can be found at https://github.com/sbgrid/data-capture-module . Note that a shared filesystem between Dataverse and your DCM is required. You cannot use a DCM with non-filesystem storage options such as Swift.

Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide:

- ``:DataCaptureModuleUrl`` should be set to the URL of the DCM you installed.
- ``:UploadMethods`` should be set to ``dcm/rsync+ssh``.

This will allow your Dataverse installation to communicate with your DCM, so that Dataverse can download rsync scripts for your users.
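
As a sketch of what that configuration might look like (the port and the DCM URL below are assumptions; substitute your own values), the two settings can be set with the database settings API:

.. code-block:: bash

    # Hypothetical values: a local Dataverse on port 8080 and a DCM at dcm.example.edu.
    curl -X PUT -d http://dcm.example.edu http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl
    curl -X PUT -d dcm/rsync+ssh http://localhost:8080/api/admin/settings/:UploadMethods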

Downloading rsync scripts via Dataverse API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The rsync script can be downloaded from Dataverse via API using an authorized API token. In the curl example below, substitute ``$PERSISTENT_ID`` with a DOI or Handle:

``curl -H "X-Dataverse-key: $API_TOKEN" $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID``
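
For example (all of the values below are hypothetical), the script can be saved to a file for inspection before it is handed to the user:

.. code-block:: bash

    # Hypothetical token, server, and persistent identifier.
    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export DV_BASE_URL=http://localhost:8080
    export PERSISTENT_ID=doi:10.5072/FK2/DNXV2H
    curl -H "X-Dataverse-key: $API_TOKEN" \
      "$DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID" \
      > upload.bash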

How a DCM reports checksum success or failure to Dataverse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the user uploads files to a DCM, that DCM will perform checksum validation and report the results of that validation to Dataverse. The DCM must be configured to pass the API token of a superuser. The implementation details, which are subject to change, are below.

The JSON that a DCM sends to Dataverse on successful checksum validation looks something like the contents of :download:`checksumValidationSuccess.json <../_static/installation/files/root/big-data-support/checksumValidationSuccess.json>` below:

.. literalinclude:: ../_static/installation/files/root/big-data-support/checksumValidationSuccess.json
   :language: json

- ``status`` - The valid strings to send are ``validation passed`` and ``validation failed`` (an illustrative failure report appears after this list).
- ``uploadFolder`` - This is the directory on disk where Dataverse should attempt to find the files that a DCM has moved into place. There should always be a ``files.sha`` file and at least one data file. ``files.sha`` is a manifest of all the data files and their checksums. The ``uploadFolder`` directory is inside the directory where data is stored for the dataset and may have the same name as the "identifier" of the persistent ID (DOI or Handle). For example, you would send ``"uploadFolder": "DNXV2H"`` in the JSON file when the absolute path to this directory is ``/usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/DNXV2H/DNXV2H``.
- ``totalSize`` - Dataverse will use this value to represent the total size in bytes of all the files in the "package" that's created. If 360 data files and one ``files.sha`` manifest file are in the ``uploadFolder``, this value is the sum of the sizes of the 360 data files.
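
An illustrative failure report, assuming the failure case sends the same fields with only ``status`` changed (this is an assumption; the exact failure payload is not spelled out here), might be prepared like this:

.. code-block:: bash

    # Sketch only: field set assumed to mirror checksumValidationSuccess.json.
    cat > checksumValidationFailure.json <<'EOF'
    {
      "status": "validation failed",
      "uploadFolder": "DNXV2H",
      "totalSize": 1234567890
    }
    EOF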

Here's the syntax for sending the JSON:

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H 'Content-type: application/json' --upload-file checksumValidationSuccess.json $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/checksumValidation?persistentId=$PERSISTENT_ID``

Troubleshooting
~~~~~~~~~~~~~~~

The following low-level command should only be used when troubleshooting the "import" code a DCM uses, but it is documented here for completeness.

``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$DV_BASE_URL/api/batch/jobs/import/datasets/files/$DATASET_DB_ID?uploadFolder=$UPLOAD_FOLDER&totalSize=$TOTAL_SIZE"``
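
For example (hypothetical values; ``$DATASET_DB_ID`` is the numeric database id of the dataset, not its persistent identifier):

.. code-block:: bash

    # Hypothetical values for a dataset with database id 42.
    export DATASET_DB_ID=42
    export UPLOAD_FOLDER=DNXV2H
    export TOTAL_SIZE=1234567890
    curl -H "X-Dataverse-key: $API_TOKEN" -X POST \
      "$DV_BASE_URL/api/batch/jobs/import/datasets/files/$DATASET_DB_ID?uploadFolder=$UPLOAD_FOLDER&totalSize=$TOTAL_SIZE"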

Repository Storage Abstraction Layer (RSAL)
-------------------------------------------

For now, please see https://github.com/sbgrid/rsal

@@ -0,0 +1,82 @@
{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "HTML & More",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Markup, Marty",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "W4C",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "[email protected]"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "Markup, Marty"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "BEGIN<br></br>END",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}
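
This dataset JSON (apparently a fixture for exercising HTML markup in metadata fields) has the shape accepted by the native API for dataset creation. A sketch of how it might be submitted, assuming it is saved as dataset-html.json and that a dataverse with the alias "root" exists:

    # Hypothetical filename, server, token, and dataverse alias.
    curl -H "X-Dataverse-key: $API_TOKEN" -X POST \
      --upload-file dataset-html.json \
      http://localhost:8080/api/dataverses/root/datasets
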
164 changes: 164 additions & 0 deletions
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/ImportFromFileSystemCommand.java
@@ -0,0 +1,164 @@
package edu.harvard.iq.dataverse.engine.command.impl;

import com.google.common.base.Strings;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.batch.jobs.importer.ImportMode;
import edu.harvard.iq.dataverse.batch.jobs.importer.filesystem.FileRecordWriter;
import edu.harvard.iq.dataverse.engine.command.AbstractCommand;
import edu.harvard.iq.dataverse.engine.command.CommandContext;
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.RequiredPermissions;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException;
import static edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder.jsonObjectBuilder;
import java.io.File;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.batch.operations.JobOperator;
import javax.batch.operations.JobSecurityException;
import javax.batch.operations.JobStartException;
import javax.batch.runtime.BatchRuntime;
import javax.json.JsonObject;
import javax.json.JsonObjectBuilder;

@RequiredPermissions(Permission.EditDataset)
public class ImportFromFileSystemCommand extends AbstractCommand<JsonObject> {

    private static final Logger logger = Logger.getLogger(ImportFromFileSystemCommand.class.getName());

    final Dataset dataset;
    final String uploadFolder;
    final Long totalSize;
    final String mode;
    final ImportMode importMode;

    public ImportFromFileSystemCommand(DataverseRequest aRequest, Dataset theDataset, String theUploadFolder, Long theTotalSize, ImportMode theImportMode) {
        super(aRequest, theDataset);
        dataset = theDataset;
        uploadFolder = theUploadFolder;
        totalSize = theTotalSize;
        importMode = theImportMode;
        mode = theImportMode.toString();
    }

    @Override
    public JsonObject execute(CommandContext ctxt) throws CommandException {
        JsonObjectBuilder bld = jsonObjectBuilder();
        /**
         * batch import as-individual-datafiles is disabled in this iteration;
         * only the import-as-a-package is allowed. -- L.A. Feb 2 2017
         */
        String fileMode = FileRecordWriter.FILE_MODE_PACKAGE_FILE;
        try {
            /**
             * Current constraints: 1. only supports merge and replace mode 2.
             * valid dataset 3. valid dataset directory 4. valid user & user has
             * edit dataset permission 5. only one dataset version 6. dataset
             * version is draft
             */
            if (!mode.equalsIgnoreCase("MERGE") && !mode.equalsIgnoreCase("REPLACE")) {
                String error = "Import mode: " + mode + " is not currently supported.";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }
            if (!fileMode.equals(FileRecordWriter.FILE_MODE_INDIVIDUAL_FILES) && !fileMode.equals(FileRecordWriter.FILE_MODE_PACKAGE_FILE)) {
                String error = "File import mode: " + fileMode + " is not supported.";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }
            File directory = new File(System.getProperty("dataverse.files.directory")
                    + File.separator + dataset.getAuthority() + File.separator + dataset.getIdentifier());
            if (!isValidDirectory(directory)) {
                String error = "Dataset directory is invalid. " + directory;
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

            if (Strings.isNullOrEmpty(uploadFolder)) {
                String error = "No uploadFolder specified";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

            File uploadDirectory = new File(System.getProperty("dataverse.files.directory")
                    + File.separator + dataset.getAuthority() + File.separator + dataset.getIdentifier()
                    + File.separator + uploadFolder);
            if (!isValidDirectory(uploadDirectory)) {
                String error = "Upload folder is not a valid directory.";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

            if (dataset.getVersions().size() != 1) {
                String error = "Error creating FilesystemImportJob with dataset with ID: " + dataset.getId() + " - Dataset has more than one version.";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

            if (dataset.getLatestVersion().getVersionState() != DatasetVersion.VersionState.DRAFT) {
                String error = "Error creating FilesystemImportJob with dataset with ID: " + dataset.getId() + " - Dataset isn't in DRAFT mode.";
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

            try {
                long jid;
                Properties props = new Properties();
                props.setProperty("datasetId", dataset.getId().toString());
                props.setProperty("userId", getUser().getIdentifier().replace("@", ""));
                props.setProperty("mode", mode);
                props.setProperty("fileMode", fileMode);
                props.setProperty("uploadFolder", uploadFolder);
                if (totalSize != null && totalSize > 0) {
                    props.setProperty("totalSize", totalSize.toString());
                }
                JobOperator jo = BatchRuntime.getJobOperator();
                jid = jo.start("FileSystemImportJob", props);
                if (jid > 0) {
                    bld.add("executionId", jid).add("message", "FileSystemImportJob in progress");
                    return bld.build();
                } else {
                    String error = "Error creating FilesystemImportJob with dataset with ID: " + dataset.getId();
                    logger.info(error);
                    throw new CommandException(error, this);
                }

            } catch (JobStartException | JobSecurityException ex) {
                String error = "Error creating FilesystemImportJob with dataset with ID: " + dataset.getId() + " - " + ex.getMessage();
                logger.info(error);
                throw new IllegalCommandException(error, this);
            }

        } catch (Exception e) {
            bld.add("message", "Import Exception - " + e.getMessage());
            return bld.build();
        }
    }

    /**
     * Make sure the directory path is truly a directory, exists and we can
     * read it.
     *
     * @return isValid
     */
    private boolean isValidDirectory(File directory) {
        String path = directory.getAbsolutePath();
        if (!directory.exists()) {
            logger.log(Level.SEVERE, "Directory " + path + " does not exist.");
            return false;
        }
        if (!directory.isDirectory()) {
            logger.log(Level.SEVERE, path + " is not a directory.");
            return false;
        }
        if (!directory.canRead()) {
            logger.log(Level.SEVERE, "Unable to read files from directory " + path + ". Permission denied.");
            return false;
        }
        return true;
    }

}