Merge branch 'develop' into 11075-ror #11075
pdurbin committed Jan 13, 2025
2 parents f80b4a1 + 4373753 commit 8333b1b
Showing 16 changed files with 197 additions and 27 deletions.
4 changes: 4 additions & 0 deletions doc/release-notes/10171-exlude-metadatablocks.md
@@ -0,0 +1,4 @@
The `{id}/versions` and `{id}/versions/{versionId}` APIs now accept an optional ``excludeMetadataBlocks`` parameter
that specifies whether the metadata blocks should be omitted from the output. It defaults to ``false``, preserving backward
compatibility. (Note that for a dataset with a large number of versions and/or metadata blocks, including the metadata blocks
can dramatically increase the volume of the output.) See also [the guides](https://dataverse-guide--10778.org.readthedocs.build/en/10778/api/native-api.html#list-versions-of-a-dataset), #10778, and #10171.
@@ -0,0 +1,8 @@
### JSON Printer Bug Fix

DatasetFieldTypes in the MetadataBlock response that are also a child of another DatasetFieldType were being returned twice. The child DatasetFieldType was included in the "fields" object as well as in the "childFields" of its parent DatasetFieldType. This fix suppresses the standalone object so only one instance of the DatasetFieldType is returned (in the "childFields" of its parent).
This fix changes the JSON output of the API `/api/dataverses/{dataverseAlias}/metadatablocks`.

## Backward Incompatible Changes

The JSON response of the API call `/api/dataverses/{dataverseAlias}/metadatablocks` will no longer include DatasetFieldTypes in "fields" if they are children of another DatasetFieldType. A child DatasetFieldType will only be included in the "childFields" of its parent DatasetFieldType.
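
The filtering described in this release note can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `ChildFieldFilterSketch`, `FieldType`, and `topLevelNames` are hypothetical stand-ins, not the Dataverse `DatasetFieldType` class or the actual `JsonPrinter` code, and the sketch keeps the input order instead of returning a sorted set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; not the Dataverse classes.
final class ChildFieldFilterSketch {

    static final class FieldType {
        final String name;
        final List<FieldType> children = new ArrayList<>();

        FieldType(String name, FieldType... children) {
            this.name = name;
            this.children.addAll(List.of(children));
        }
    }

    // Returns the names of the field types that are not a child of any other
    // field type in the same list, keeping the original order.
    static List<String> topLevelNames(List<FieldType> all) {
        List<FieldType> remaining = new ArrayList<>(all); // copy so the input is untouched
        for (FieldType parent : all) {
            remaining.removeAll(parent.children);
        }
        List<String> names = new ArrayList<>();
        for (FieldType fieldType : remaining) {
            names.add(fieldType.name);
        }
        return names;
    }

    public static void main(String[] args) {
        FieldType authorName = new FieldType("authorName");
        FieldType author = new FieldType("author", authorName);
        FieldType title = new FieldType("title");
        // The flat list contains the child both on its own and under its parent.
        System.out.println(topLevelNames(List.of(title, author, authorName)));
    }
}
```

With this input only `title` and `author` remain at the top level; `authorName` stays reachable through its parent's children.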
3 changes: 3 additions & 0 deletions doc/release-notes/11107-fake-to-perma-demo.md
@@ -0,0 +1,3 @@
### Demo/Eval Container Tutorial

The demo/eval container tutorial has been updated to use the Permalink PID provider instead of the FAKE DOI Provider. See also #11107.
5 changes: 5 additions & 0 deletions doc/release-notes/11113-avoid-orphan-perm-docs.md
@@ -0,0 +1,5 @@
This release fixes a bug that caused Dataverse to generate unnecessary Solr documents for files when a file is added to or deleted from a draft dataset. These documents could accumulate and potentially impact performance.

Assuming the upgrade to Solr 9.7.0 also occurs in this release, nothing else is needed for this PR. (Starting with a new Solr ensures that the Solr database is empty and that a reindex is already required.)


8 changes: 8 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1295,6 +1295,8 @@ It returns a list of versions with their metadata, and file list:
The optional ``excludeFiles`` parameter specifies whether the files should be omitted from the output. When the parameter is not supplied, the files are listed, preserving backward compatibility. (Note that for a dataset with a large number of versions and/or files, including the files can dramatically increase the volume of the output.) A separate ``/files`` API can be used for listing the files, or a subset thereof, in a given version.

The optional ``excludeMetadataBlocks`` parameter specifies whether the metadata blocks should be omitted from the output. It defaults to ``false``, preserving backward compatibility. (Note that for a dataset with a large number of versions and/or metadata blocks, including the metadata blocks can dramatically increase the volume of the output.)

The optional ``offset`` and ``limit`` parameters can be used to specify the range of the versions list to be shown. This can be used to paginate through the list in a dataset with a large number of versions.


@@ -1319,6 +1321,12 @@ The fully expanded example above (without environment variables) looks like this
The optional ``excludeFiles`` parameter specifies whether the files should be omitted from the output (when it is not supplied, the files are listed). Note that a separate ``/files`` API can be used for listing the files, or a subset thereof, in a given version.

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?excludeMetadataBlocks=false"

The optional ``excludeMetadataBlocks`` parameter specifies whether the metadata blocks should be omitted from the output (defaults to ``false``).
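
For clients that build this request in Java instead of with curl, the URL construction might look like the sketch below. It is only an illustration: `VersionUrlSketch` and `versionUri` are hypothetical names, and the base URL, dataset id, and version are the demo values from the curl example. The parameter is appended only when the caller sets it, so the server default applies otherwise.

```java
import java.net.URI;

// Illustrative helper; VersionUrlSketch and versionUri are hypothetical names.
final class VersionUrlSketch {

    // Builds the URI for GET /api/datasets/{id}/versions/{versionId}, appending
    // excludeMetadataBlocks only when the caller sets it (null = server default).
    static URI versionUri(String baseUrl, long datasetId, String versionId, Boolean excludeMetadataBlocks) {
        String url = baseUrl + "/api/datasets/" + datasetId + "/versions/" + versionId;
        if (excludeMetadataBlocks != null) {
            url += "?excludeMetadataBlocks=" + excludeMetadataBlocks;
        }
        return URI.create(url);
    }

    public static void main(String[] args) {
        System.out.println(versionUri("https://demo.dataverse.org", 24, "1.0", true));
    }
}
```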


By default, deaccessioned dataset versions are not included in the search when applying the :latest or :latest-published identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.

5 changes: 5 additions & 0 deletions doc/sphinx-guides/source/container/running/demo.rst
@@ -160,6 +160,11 @@ Next, set up the UI toggle between English and French, again using the unblock k

Stop and start the Dataverse container in order for the language toggle to work.

PID Providers
+++++++++++++

Dataverse supports multiple Persistent ID (PID) providers. The ``compose.yml`` file uses the Permalink PID provider. Follow :ref:`pids-configuration` to reconfigure as needed.

Next Steps
----------

12 changes: 6 additions & 6 deletions docker/compose/demo/compose.yml
@@ -20,12 +20,12 @@ services:
-Ddataverse.files.file1.type=file
-Ddataverse.files.file1.label=Filesystem
-Ddataverse.files.file1.directory=${STORAGE_DIR}/store
-Ddataverse.pid.providers=fake
-Ddataverse.pid.default-provider=fake
-Ddataverse.pid.fake.type=FAKE
-Ddataverse.pid.fake.label=FakeDOIProvider
-Ddataverse.pid.fake.authority=10.5072
-Ddataverse.pid.fake.shoulder=FK2/
-Ddataverse.pid.providers=perma1
-Ddataverse.pid.default-provider=perma1
-Ddataverse.pid.perma1.type=perma
-Ddataverse.pid.perma1.label=Perma1
-Ddataverse.pid.perma1.authority=DV
-Ddataverse.pid.perma1.permalink.separator=/
#-Ddataverse.lang.directory=/dv/lang
ports:
- "8080:8080" # HTTP (Dataverse Application)
8 changes: 8 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -1142,4 +1142,12 @@ public boolean isDeaccessioned() {
}
return inDeaccessionedVersions; // since any published version would have already returned
}
public boolean isInDatasetVersion(DatasetVersion version) {
for (FileMetadata fmd : getFileMetadatas()) {
if (fmd.getDatasetVersion().equals(version)) {
return true;
}
}
return false;
}
} // end of class
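
The membership check added to `DataFile` above could also be written with a stream. The sketch below shows the idea on hypothetical stand-in types (`Version`, `FileMeta`, and `isInVersion` are illustrative names, not the Dataverse classes); it is an equivalent formulation under those assumptions, not the code in this commit.

```java
import java.util.List;

// Hypothetical stand-ins for DataFile, FileMetadata and DatasetVersion; illustration only.
final class VersionMembershipSketch {

    static final class Version {
        final String label;
        Version(String label) { this.label = label; }
    }

    static final class FileMeta {
        final Version version;
        FileMeta(Version version) { this.version = version; }
    }

    // Stream-based equivalent of the loop: true if any metadata entry of the
    // file points at the given version.
    static boolean isInVersion(List<FileMeta> fileMetadatas, Version version) {
        return fileMetadatas.stream()
                .anyMatch(fmd -> fmd.version.equals(version));
    }

    public static void main(String[] args) {
        Version v1 = new Version("1.0");
        Version draft = new Version("DRAFT");
        // A file that exists in 1.0 but was removed from the draft.
        List<FileMeta> metadatas = List.of(new FileMeta(v1));
        System.out.println(isInVersion(metadatas, v1));
        System.out.println(isInVersion(metadatas, draft));
    }
}
```

A file that was removed from the draft has no metadata entry for it, so the check returns false for the draft. That appears to be the case the Solr indexing change in this commit relies on to skip building a document for that version.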
11 changes: 7 additions & 4 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -421,15 +421,16 @@ public Response useDefaultCitationDate(@Context ContainerRequestContext crc, @Pa
@GET
@AuthRequired
@Path("{id}/versions")
public Response listVersions(@Context ContainerRequestContext crc, @PathParam("id") String id, @QueryParam("excludeFiles") Boolean excludeFiles, @QueryParam("limit") Integer limit, @QueryParam("offset") Integer offset) {
public Response listVersions(@Context ContainerRequestContext crc, @PathParam("id") String id, @QueryParam("excludeFiles") Boolean excludeFiles,@QueryParam("excludeMetadataBlocks") Boolean excludeMetadataBlocks, @QueryParam("limit") Integer limit, @QueryParam("offset") Integer offset) {

return response( req -> {
Dataset dataset = findDatasetOrDie(id);
Boolean deepLookup = excludeFiles == null ? true : !excludeFiles;
Boolean includeMetadataBlocks = excludeMetadataBlocks == null ? true : !excludeMetadataBlocks;

return ok( execCommand( new ListVersionsCommand(req, dataset, offset, limit, deepLookup) )
.stream()
.map( d -> json(d, deepLookup) )
.map( d -> json(d, deepLookup, includeMetadataBlocks) )
.collect(toJsonArray()));
}, getRequestUser(crc));
}
@@ -441,6 +442,7 @@ public Response getVersion(@Context ContainerRequestContext crc,
@PathParam("id") String datasetId,
@PathParam("versionId") String versionId,
@QueryParam("excludeFiles") Boolean excludeFiles,
@QueryParam("excludeMetadataBlocks") Boolean excludeMetadataBlocks,
@QueryParam("includeDeaccessioned") boolean includeDeaccessioned,
@QueryParam("returnOwners") boolean returnOwners,
@Context UriInfo uriInfo,
@@ -466,11 +468,12 @@ public Response getVersion(@Context ContainerRequestContext crc,
if (excludeFiles == null ? true : !excludeFiles) {
requestedDatasetVersion = datasetversionService.findDeep(requestedDatasetVersion.getId());
}
Boolean includeMetadataBlocks = excludeMetadataBlocks == null ? true : !excludeMetadataBlocks;

JsonObjectBuilder jsonBuilder = json(requestedDatasetVersion,
null,
excludeFiles == null ? true : !excludeFiles,
returnOwners);
excludeFiles == null ? true : !excludeFiles,
returnOwners, includeMetadataBlocks);
return ok(jsonBuilder);

}, getRequestUser(crc));
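
Both endpoints above turn a nullable "exclude" query parameter into an "include" flag with the same ternary expression. A small helper along the following lines would capture that mapping in one place. This is only a sketch: `ExcludeFlagSketch` and `includeFromExclude` are hypothetical names that do not exist in the Dataverse code base.

```java
// Illustrative only: ExcludeFlagSketch and includeFromExclude are hypothetical
// names, not part of the Dataverse code base.
final class ExcludeFlagSketch {

    // Mirrors the expression used in the diff: a missing (null) "exclude"
    // parameter means "include", otherwise the flag is simply inverted.
    static boolean includeFromExclude(Boolean exclude) {
        return exclude == null ? true : !exclude;
    }

    public static void main(String[] args) {
        System.out.println(includeFromExclude(null));          // parameter not supplied
        System.out.println(includeFromExclude(Boolean.TRUE));  // ?excludeMetadataBlocks=true
        System.out.println(includeFromExclude(Boolean.FALSE)); // ?excludeMetadataBlocks=false
    }
}
```

Declaring the parameter as a boxed `Boolean` rather than a primitive `boolean` is what lets the code distinguish "not supplied" from an explicit `false`.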
@@ -155,7 +155,15 @@ private List<DvObjectSolrDoc> constructDatafileSolrDocs(DataFile dataFile, Map<L
Map<DatasetVersion.VersionState, Boolean> desiredCards = searchPermissionsService.getDesiredCards(dataFile.getOwner());
for (DatasetVersion datasetVersionFileIsAttachedTo : datasetVersionsToBuildCardsFor(dataFile.getOwner())) {
boolean cardShouldExist = desiredCards.get(datasetVersionFileIsAttachedTo.getVersionState());
if (cardShouldExist) {
/*
* Since datasetVersionFileIsAttachedTo should be a draft or the most recent
* released one, it could be more efficient to stop the search through
* FileMetadatas after those two (versus continuing through all prior versions
* as in isInDatasetVersion). Alternatively, perhaps filesToReIndexPermissionsFor
* should not combine the list of files for the different dataset versions into a
* single list to start with.
*/
if (cardShouldExist && dataFile.isInDatasetVersion(datasetVersionFileIsAttachedTo)) {
String solrIdStart = IndexServiceBean.solrDocIdentifierFile + dataFile.getId();
String solrIdEnd = getDatasetOrDataFileSolrEnding(datasetVersionFileIsAttachedTo.getVersionState());
String solrId = solrIdStart + solrIdEnd;
47 changes: 39 additions & 8 deletions src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java
@@ -423,11 +423,17 @@ public static JsonObjectBuilder json(FileDetailsHolder ds) {
}

public static JsonObjectBuilder json(DatasetVersion dsv, boolean includeFiles) {
return json(dsv, null, includeFiles, false);
return json(dsv, null, includeFiles, false,true);
}
public static JsonObjectBuilder json(DatasetVersion dsv, boolean includeFiles, boolean includeMetadataBlocks) {
return json(dsv, null, includeFiles, false, includeMetadataBlocks);
}
public static JsonObjectBuilder json(DatasetVersion dsv, List<String> anonymizedFieldTypeNamesList,
boolean includeFiles, boolean returnOwners) {
return json( dsv, anonymizedFieldTypeNamesList, includeFiles, returnOwners,true);
}

public static JsonObjectBuilder json(DatasetVersion dsv, List<String> anonymizedFieldTypeNamesList,
boolean includeFiles, boolean returnOwners) {
boolean includeFiles, boolean returnOwners, boolean includeMetadataBlocks) {
Dataset dataset = dsv.getDataset();
JsonObjectBuilder bld = jsonObjectBuilder()
.add("id", dsv.getId()).add("datasetId", dataset.getId())
@@ -472,11 +478,12 @@ public static JsonObjectBuilder json(DatasetVersion dsv, List<String> anonymized
.add("sizeOfCollection", dsv.getTermsOfUseAndAccess().getSizeOfCollection())
.add("studyCompletion", dsv.getTermsOfUseAndAccess().getStudyCompletion())
.add("fileAccessRequest", dsv.getTermsOfUseAndAccess().isFileAccessRequest());

bld.add("metadataBlocks", (anonymizedFieldTypeNamesList != null) ?
jsonByBlocks(dsv.getDatasetFields(), anonymizedFieldTypeNamesList)
: jsonByBlocks(dsv.getDatasetFields())
);
if(includeMetadataBlocks) {
bld.add("metadataBlocks", (anonymizedFieldTypeNamesList != null) ?
jsonByBlocks(dsv.getDatasetFields(), anonymizedFieldTypeNamesList)
: jsonByBlocks(dsv.getDatasetFields())
);
}
if(returnOwners){
bld.add("isPartOf", getOwnersFromDvObject(dataset));
}
@@ -643,6 +650,19 @@ public static JsonObjectBuilder json(MetadataBlock metadataBlock, boolean printO
.add("displayName", metadataBlock.getDisplayName())
.add("displayOnCreate", metadataBlock.isDisplayOnCreate());

List<DatasetFieldType> datasetFieldTypesList;

if (ownerDataverse != null) {
datasetFieldTypesList = datasetFieldService.findAllInMetadataBlockAndDataverse(
metadataBlock, ownerDataverse, printOnlyDisplayedOnCreateDatasetFieldTypes);
} else {
datasetFieldTypesList = printOnlyDisplayedOnCreateDatasetFieldTypes
? datasetFieldService.findAllDisplayedOnCreateInMetadataBlock(metadataBlock)
: metadataBlock.getDatasetFieldTypes();
}

Set<DatasetFieldType> datasetFieldTypes = filterOutDuplicateDatasetFieldTypes(datasetFieldTypesList);

JsonObjectBuilder fieldsBuilder = Json.createObjectBuilder();

Predicate<DatasetFieldType> isNoChild = element -> element.isChild() == false;
@@ -672,6 +692,17 @@ public static JsonObjectBuilder json(MetadataBlock metadataBlock, boolean printO
return jsonObjectBuilder;
}

// This will remove datasetFieldTypes that are in the list but also a child of another datasetFieldType in the list
// Prevents duplicate datasetFieldType information from being returned twice
// See: https://github.com/IQSS/dataverse/issues/10472
private static Set<DatasetFieldType> filterOutDuplicateDatasetFieldTypes(List<DatasetFieldType> datasetFieldTypesList) {
// making a copy of the list as to not damage the original when we remove items
List<DatasetFieldType> datasetFieldTypes = new ArrayList<>(datasetFieldTypesList);
// exclude/remove datasetFieldTypes if datasetFieldType exists as a child of another datasetFieldType
datasetFieldTypesList.forEach(dsft -> dsft.getChildDatasetFieldTypes().forEach(c -> datasetFieldTypes.remove(c)));
return new TreeSet<>(datasetFieldTypes);
}

public static JsonArrayBuilder jsonDatasetFieldTypes(List<DatasetFieldType> fields) {
JsonArrayBuilder fieldsJson = Json.createArrayBuilder();
for (DatasetFieldType field : fields) {
36 changes: 36 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/DatasetsIT.java
@@ -731,6 +731,42 @@ public void testCreatePublishDestroyDataset() {

}

@Test
public void testHideMetadataBlocksInDatasetVersionsAPI() {

// Create user
String apiToken = UtilIT.createRandomUserGetToken();

// Create user with no permission
String apiTokenNoPerms = UtilIT.createRandomUserGetToken();

// Create Collection
String collectionAlias = UtilIT.createRandomCollectionGetAlias(apiToken);

// Create Dataset
Response createDataset = UtilIT.createRandomDatasetViaNativeApi(collectionAlias, apiToken);
createDataset.then().assertThat()
.statusCode(CREATED.getStatusCode());

Integer datasetId = UtilIT.getDatasetIdFromResponse(createDataset);
String datasetPid = JsonPath.from(createDataset.asString()).getString("data.persistentId");

// Now check that the metadata is NOT shown, when we ask the versions api to do so.
boolean excludeMetadata = true;
Response unpublishedDraft = UtilIT.getDatasetVersion(datasetPid, DS_VERSION_DRAFT, apiToken, true,excludeMetadata, false);
unpublishedDraft.prettyPrint();
unpublishedDraft.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data.metadataBlocks", equalTo(null));

// Now check that the metadata is shown, when we ask the versions api to do so.
excludeMetadata = false;
unpublishedDraft = UtilIT.getDatasetVersion(datasetPid, DS_VERSION_DRAFT, apiToken,true, excludeMetadata, false);
unpublishedDraft.prettyPrint();
unpublishedDraft.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data.metadataBlocks", notNullValue() );
}
/**
* The apis (/api/datasets/{id}/versions and /api/datasets/{id}/versions/{vid}
* are already called from other RestAssured tests, in this class and also in FilesIT.
7 changes: 3 additions & 4 deletions src/test/java/edu/harvard/iq/dataverse/api/DataversesIT.java
@@ -927,7 +927,7 @@ public void testListMetadataBlocks() {
.body("data.size()", equalTo(1))
.body("data[0].name", is("citation"))
.body("data[0].fields.title.displayOnCreate", equalTo(true))
.body("data[0].fields.size()", is(10))
.body("data[0].fields.size()", is(10)) // 28 - 18 child duplicates
.body("data[0].fields.author.childFields.size()", is(4));

Response setMetadataBlocksResponse = UtilIT.setMetadataBlocks(dataverseAlias, Json.createArrayBuilder().add("citation").add("astrophysics"), apiToken);
@@ -1008,14 +1008,13 @@ public void testListMetadataBlocks() {
// Since the included property of notesText is set to false, we should retrieve the total number of fields minus one
int citationMetadataBlockIndex = geospatialMetadataBlockIndex == 0 ? 1 : 0;
listMetadataBlocksResponse.then().assertThat()
.body(String.format("data[%d].fields.size()", citationMetadataBlockIndex), equalTo(34));
.body(String.format("data[%d].fields.size()", citationMetadataBlockIndex), equalTo(34)); // 79 minus 45 child duplicates

// Since the included property of geographicCoverage is set to false, we should retrieve the total number of fields minus one
listMetadataBlocksResponse.then().assertThat()
.body(String.format("data[%d].fields.size()", geospatialMetadataBlockIndex), equalTo(2));

listMetadataBlocksResponse = UtilIT.getMetadataBlock("geospatial");

listMetadataBlocksResponse = UtilIT.getMetadataBlock("geospatial");
String actualGeospatialMetadataField1 = listMetadataBlocksResponse.then().extract().path(String.format("data.fields['geographicCoverage'].name"));
String actualGeospatialMetadataField2 = listMetadataBlocksResponse.then().extract().path(String.format("data.fields['geographicCoverage'].childFields['country'].name"));
String actualGeospatialMetadataField3 = listMetadataBlocksResponse.then().extract().path(String.format("data.fields['geographicCoverage'].childFields['city'].name"));
@@ -44,8 +44,7 @@ void testListMetadataBlocks() {

// returnDatasetFieldTypes=true
listMetadataBlocksResponse = UtilIT.listMetadataBlocks(false, true);
int expectedNumberOfMetadataFields = 35;
listMetadataBlocksResponse.prettyPrint();
int expectedNumberOfMetadataFields = 35; // 80 - 45 child duplicates;
listMetadataBlocksResponse.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data[0].fields", not(equalTo(null)))
@@ -57,7 +56,7 @@ void testListMetadataBlocks() {
// onlyDisplayedOnCreate=true and returnDatasetFieldTypes=true
listMetadataBlocksResponse = UtilIT.listMetadataBlocks(true, true);
listMetadataBlocksResponse.prettyPrint();
expectedNumberOfMetadataFields = 10;
expectedNumberOfMetadataFields = 10; // 28 - 18 child duplicates
listMetadataBlocksResponse.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data[0].fields", not(equalTo(null)))