Merge remote-tracking branch 'IQSS/develop' into 9481-pdf-codebook
qqmyers committed Jun 25, 2024
2 parents f73c5f8 + 3dccdb7 commit b3ace02
Showing 527 changed files with 18,659 additions and 7,899 deletions.
3 changes: 2 additions & 1 deletion .env
@@ -1,4 +1,5 @@
APP_IMAGE=gdcc/dataverse:unstable
POSTGRES_VERSION=13
POSTGRES_VERSION=16
DATAVERSE_DB_USER=dataverse
SOLR_VERSION=9.3.0
SKIP_DEPLOY=0
101 changes: 101 additions & 0 deletions .github/workflows/maven_cache_management.yml
@@ -0,0 +1,101 @@
name: Maven Cache Management

on:
  # Every push to develop should trigger cache rejuvenation (dependencies might have changed)
  push:
    branches:
      - develop
  # According to https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
  # all caches are deleted after 7 days of no access. Make sure we rejuvenate every 7 days to keep it available.
  schedule:
    - cron: '23 2 * * 0' # Run for 'develop' every Sunday at 02:23 UTC (3:23 CET, 21:23 ET)
  # Enable manual cache management
  workflow_dispatch:
  # Delete branch caches once a PR is merged
  pull_request:
    types:
      - closed

env:
  COMMON_CACHE_KEY: "dataverse-maven-cache"
  COMMON_CACHE_PATH: "~/.m2/repository"

jobs:
  seed:
    name: Drop and Re-Seed Local Repository
    runs-on: ubuntu-latest
    if: ${{ github.event_name != 'pull_request' }}
    permissions:
      # Write permission needed to delete caches
      # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id
      actions: write
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Determine Java version from Parent POM
        run: echo "JAVA_VERSION=$(grep '<target.java.version>' modules/dataverse-parent/pom.xml | cut -f2 -d'>' | cut -f1 -d'<')" >> ${GITHUB_ENV}
      - name: Set up JDK ${{ env.JAVA_VERSION }}
        uses: actions/setup-java@v4
        with:
          java-version: ${{ env.JAVA_VERSION }}
          distribution: temurin
      - name: Seed common cache
        run: |
          mvn -B -f modules/dataverse-parent dependency:go-offline dependency:resolve-plugins
      # This non-obvious order is due to the fact that the download via Maven above will take a very long time (7-8 min).
      # Jobs should not be left without a cache. Deleting and saving in one go leaves only a small chance for a cache miss.
      - name: Drop common cache
        run: |
          gh extension install actions/gh-actions-cache
          echo "🛒 Fetching list of cache keys"
          cacheKeys=$(gh actions-cache list -R ${{ github.repository }} -B develop | cut -f 1 )
          ## Setting this to not fail the workflow while deleting cache keys.
          set +e
          echo "🗑️ Deleting caches..."
          for cacheKey in $cacheKeys
          do
            gh actions-cache delete $cacheKey -R ${{ github.repository }} -B develop --confirm
          done
          echo "✅ Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Save the common cache
        uses: actions/cache@v4
        with:
          path: ${{ env.COMMON_CACHE_PATH }}
          key: ${{ env.COMMON_CACHE_KEY }}
          enableCrossOsArchive: true

  # Let's delete feature branch caches once their PR is merged - we only have 10 GB of space before eviction kicks in
  deplete:
    name: Deplete feature branch caches
    runs-on: ubuntu-latest
    if: ${{ github.event_name == 'pull_request' }}
    permissions:
      # `actions:write` permission is required to delete caches
      # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id
      actions: write
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Cleanup caches
        run: |
          gh extension install actions/gh-actions-cache
          BRANCH=refs/pull/${{ github.event.pull_request.number }}/merge
          echo "🛒 Fetching list of cache keys"
          cacheKeysForPR=$(gh actions-cache list -R ${{ github.repository }} -B $BRANCH | cut -f 1 )
          ## Setting this to not fail the workflow while deleting cache keys.
          set +e
          echo "🗑️ Deleting caches..."
          for cacheKey in $cacheKeysForPR
          do
            gh actions-cache delete $cacheKey -R ${{ github.repository }} -B $BRANCH --confirm
          done
          echo "✅ Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
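The `Determine Java version from Parent POM` step reads `<target.java.version>` out of the parent POM with `grep` and two `cut` calls: the first keeps what follows the first `>`, the second keeps what precedes the next `<`. A minimal Python sketch of the same extraction, assuming a POM fragment shaped like the hypothetical `SAMPLE_POM` below (the `17` is illustrative, not read from Dataverse's POM):

```python
from typing import Optional

# Hypothetical POM fragment; the real property lives in modules/dataverse-parent/pom.xml
# and the value shown here is illustrative only.
SAMPLE_POM = """
<properties>
    <target.java.version>17</target.java.version>
</properties>
"""


def java_version_from_pom(pom_text: str) -> Optional[str]:
    """Approximate `grep '<target.java.version>' | cut -f2 -d'>' | cut -f1 -d'<'`.

    Returns the value from the first matching line, or None if the
    property is absent.
    """
    for line in pom_text.splitlines():
        if "<target.java.version>" in line:
            # cut -f2 -d'>': the text after the first '>'
            after_first_gt = line.split(">")[1]
            # cut -f1 -d'<': the text before the next '<'
            return after_first_gt.split("<")[0]
    return None
```

One difference worth noting: if several lines matched, the shell pipeline would write several values into `JAVA_VERSION`, whereas this sketch stops at the first.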
2 changes: 2 additions & 0 deletions .github/workflows/maven_unit_test.yml
@@ -4,13 +4,15 @@ on:
  push:
    paths:
      - "**.java"
      - "**.sql"
      - "pom.xml"
      - "modules/**/pom.xml"
      - "!modules/container-base/**"
      - "!modules/dataverse-spi/**"
  pull_request:
    paths:
      - "**.java"
      - "**.sql"
      - "pom.xml"
      - "modules/**/pom.xml"
      - "!modules/container-base/**"
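The `paths` filters above combine include globs with `!` exclusions, and GitHub evaluates them in order, so a later matching pattern overrides an earlier one. That is why a change to `modules/container-base/pom.xml` matches `modules/**/pom.xml` but is then excluded again. The following Python sketch models that rule with a simplified glob translation (`**` crosses directories, `*` and `?` do not); it is an approximation for reasoning about the filter, not GitHub's own matcher, and the function names are hypothetical:

```python
import re
from typing import Sequence

# Patterns copied from the `push` trigger in the hunk above.
PATTERNS = [
    "**.java",
    "**.sql",
    "pom.xml",
    "modules/**/pom.xml",
    "!modules/container-base/**",
    "!modules/dataverse-spi/**",
]


def glob_to_regex(glob: str) -> re.Pattern:
    """Simplified translation: '**' spans directories, '*' and '?' stay within one segment."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def triggers(path: str, patterns: Sequence[str] = PATTERNS) -> bool:
    """Walk the patterns in order; the last one that matches decides inclusion."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        regex = glob_to_regex(pattern[1:] if negated else pattern)
        if regex.fullmatch(path):
            included = not negated
    return included
```

For example, `triggers("modules/dataverse-spi/src/main/java/Foo.java")` is `False`: the file first matches `**.java`, but the later `!modules/dataverse-spi/**` exclusion wins.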
2 changes: 2 additions & 0 deletions .gitignore
@@ -34,6 +34,7 @@ oauth-credentials.md
/src/main/webapp/oauth2/newAccount.html
scripts/api/setup-all.sh*
scripts/api/setup-all.*.log
src/main/resources/edu/harvard/iq/dataverse/openapi/

# ctags generated tag file
tags
@@ -61,3 +62,4 @@ src/main/webapp/resources/images/dataverseproject.png.thumb140

# Docker development volumes
/docker-dev-volumes
/.vs
1 change: 0 additions & 1 deletion Dockerfile

This file was deleted.

6 changes: 4 additions & 2 deletions README.md
@@ -3,11 +3,11 @@ Dataverse®

Dataverse is an [open source][] software platform for sharing, finding, citing, and preserving research data (developed by the [Dataverse team](https://dataverse.org/about) at the [Institute for Quantitative Social Science](https://iq.harvard.edu/) and the [Dataverse community][]).

[dataverse.org][] is our home on the web and shows a map of Dataverse installations around the world, a list of [features][], [integrations][] that have been made possible through [REST APIs][], our development [roadmap][], and more.
[dataverse.org][] is our home on the web and shows a map of Dataverse installations around the world, a list of [features][], [integrations][] that have been made possible through [REST APIs][], our [project board][], our development [roadmap][], and more.

We maintain a demo site at [demo.dataverse.org][] which you are welcome to use for testing and evaluating Dataverse.

To install Dataverse, please see our [Installation Guide][] which will prompt you to download our [latest release][].
To install Dataverse, please see our [Installation Guide][] which will prompt you to download our [latest release][]. Docker users should consult the [Container Guide][].

To discuss Dataverse with the community, please join our [mailing list][], participate in a [community call][], chat with us at [chat.dataverse.org][], or attend our annual [Dataverse Community Meeting][].

@@ -28,7 +28,9 @@ Dataverse is a trademark of President and Fellows of Harvard College and is regi
[Dataverse community]: https://dataverse.org/developers
[Installation Guide]: https://guides.dataverse.org/en/latest/installation/index.html
[latest release]: https://github.com/IQSS/dataverse/releases
[Container Guide]: https://guides.dataverse.org/en/latest/container/index.html
[features]: https://dataverse.org/software-features
[project board]: https://github.com/orgs/IQSS/projects/34
[roadmap]: https://www.iq.harvard.edu/roadmap-dataverse-project
[integrations]: https://dataverse.org/integrations
[REST APIs]: https://guides.dataverse.org/en/latest/api/index.html
12 changes: 12 additions & 0 deletions conf/proxy/Caddyfile
@@ -0,0 +1,12 @@
# This configuration is intended to be used with Caddy, a very small, high-performance proxy.
# It serves the application container's Payara Admin GUI via HTTP instead of HTTPS,
# avoiding the trouble of self-signed certificates for local development.

:4848 {
	reverse_proxy https://dataverse:4848 {
		transport http {
			tls_insecure_skip_verify
		}
		header_down Location "^https://" "http://"
	}
}
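`header_down Location "^https://" "http://"` rewrites the `Location` response header coming back from the HTTPS upstream, so that redirects issued by the Payara Admin GUI keep the browser on plain HTTP. The regex substitution it applies can be sketched in Python as follows (the function name and example URL are hypothetical):

```python
import re


def rewrite_location(location: str) -> str:
    """Replace a leading 'https://' with 'http://'; other values pass through unchanged."""
    return re.sub(r"^https://", "http://", location)
```

Because the pattern is anchored with `^`, relative redirects and URLs that are already `http://` are left alone.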
15 changes: 10 additions & 5 deletions conf/solr/9.3.0/schema.xml → conf/solr/schema.xml
@@ -157,7 +157,8 @@
<field name="publicationStatus" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
<field name="embargoEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="retentionEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="subtreePaths" type="string" stored="true" indexed="true" multiValued="true"/>

<field name="fileName" type="text_en" stored="true" indexed="true" multiValued="true"/>
@@ -229,6 +230,8 @@

<!-- incomplete datasets issue 8822 -->
<field name="datasetValid" type="boolean" stored="true" indexed="true" multiValued="false"/>

<field name="license" type="string" stored="true" indexed="true" multiValued="false"/>

<!--
METADATA SCHEMA FIELDS
@@ -323,11 +326,12 @@
<field name="journalVolumeIssue" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="keyword" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="keywordValue" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="keywordTermURI" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="keywordVocabulary" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="keywordVocabularyURI" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="kindOfData" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="language" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="northLongitude" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="northLatitude" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="notesText" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="originOfSources" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="otherDataAppraisal" type="text_en" multiValued="false" stored="true" indexed="true"/>
@@ -370,7 +374,7 @@
<field name="software" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="softwareName" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="softwareVersion" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="southLongitude" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="southLatitude" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="state" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="studyAssayCellType" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="studyAssayMeasurementType" type="text_en" multiValued="true" stored="true" indexed="true"/>
@@ -562,11 +566,12 @@
<copyField source="journalVolumeIssue" dest="_text_" maxChars="3000"/>
<copyField source="keyword" dest="_text_" maxChars="3000"/>
<copyField source="keywordValue" dest="_text_" maxChars="3000"/>
<copyField source="keywordTermURI" dest="_text_" maxChars="3000"/>
<copyField source="keywordVocabulary" dest="_text_" maxChars="3000"/>
<copyField source="keywordVocabularyURI" dest="_text_" maxChars="3000"/>
<copyField source="kindOfData" dest="_text_" maxChars="3000"/>
<copyField source="language" dest="_text_" maxChars="3000"/>
<copyField source="northLongitude" dest="_text_" maxChars="3000"/>
<copyField source="northLatitude" dest="_text_" maxChars="3000"/>
<copyField source="notesText" dest="_text_" maxChars="3000"/>
<copyField source="originOfSources" dest="_text_" maxChars="3000"/>
<copyField source="otherDataAppraisal" dest="_text_" maxChars="3000"/>
@@ -609,7 +614,7 @@
<copyField source="software" dest="_text_" maxChars="3000"/>
<copyField source="softwareName" dest="_text_" maxChars="3000"/>
<copyField source="softwareVersion" dest="_text_" maxChars="3000"/>
<copyField source="southLongitude" dest="_text_" maxChars="3000"/>
<copyField source="southLatitude" dest="_text_" maxChars="3000"/>
<copyField source="state" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayCellType" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayMeasurementType" dest="_text_" maxChars="3000"/>
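In these hunks each metadata field, for example the new `keywordTermURI`, appears twice: once as a `<field>` and once as a `<copyField>` into the `_text_` field. When editing `schema.xml` by hand it is easy to add one and forget the other. A hypothetical Python check for the metadata-field portion of the schema (fields outside the `METADATA SCHEMA FIELDS` section are not necessarily copied into `_text_`), run against a small illustrative fragment that deliberately omits one `copyField`:

```python
import xml.etree.ElementTree as ET
from typing import List

# Illustrative fragment only; the keywordTermURI copyField is left out on purpose.
SCHEMA_FRAGMENT = """
<schema>
  <field name="keywordValue" type="text_en" multiValued="true" stored="true" indexed="true"/>
  <field name="keywordTermURI" type="text_en" multiValued="true" stored="true" indexed="true"/>
  <copyField source="keywordValue" dest="_text_" maxChars="3000"/>
</schema>
"""


def fields_missing_copyfield(schema_xml: str, dest: str = "_text_") -> List[str]:
    """Return names of <field> elements that have no <copyField> into `dest`."""
    root = ET.fromstring(schema_xml.strip())
    field_names = [field.get("name") for field in root.iter("field")]
    copied = {
        copy.get("source")
        for copy in root.iter("copyField")
        if copy.get("dest") == dest
    }
    return [name for name in field_names if name not in copied]
```

On the fragment above, the check reports `["keywordTermURI"]`, which is exactly the line the real hunk adds.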
4 changes: 2 additions & 2 deletions conf/solr/9.3.0/solrconfig.xml → conf/solr/solrconfig.xml
@@ -290,7 +290,7 @@
have some sort of hard autoCommit to limit the log size.
-->
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>

@@ -301,7 +301,7 @@
-->

<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
<maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>

<!-- Update Related Event Listeners
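In `solrconfig.xml`, `${solr.autoCommit.maxTime:30000}` uses the `solr.autoCommit.maxTime` property when it is set and otherwise falls back to the value after the colon. These hunks therefore raise the default hard-commit interval from 15 s to 30 s and replace the soft-commit default of `-1` (which disables automatic soft commits) with 1 s. A simplified Python model of that substitution rule, not Solr's own implementation (`resolve_placeholders` is a hypothetical name):

```python
import re
from typing import Mapping

# Matches ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve_placeholders(value: str, props: Mapping[str, str]) -> str:
    """Replace each placeholder with props[name], falling back to its default."""

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return _PLACEHOLDER.sub(_substitute, value)
```

Under this model, an installation that prefers the old behavior can still set the property (for instance as a `-Dsolr.autoSoftCommit.maxTime=-1` system property) instead of editing the file.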
File renamed without changes.
10 changes: 10 additions & 0 deletions doc/release-notes/10015-RO-Crate-metadata-file.md
@@ -0,0 +1,10 @@
Detection of MIME types based on a filename with its extension, and detection of RO-Crate metadata files.

From now on, filenames with extensions can be added to the `MimeTypeDetectionByFileName.properties` file. Filenames listed there take precedence over detection by extension alone. For example, two new filenames have been added to that file:
```
ro-crate-metadata.json=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
ro-crate-metadata.jsonld=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
```

Therefore, files named `ro-crate-metadata.json` or `ro-crate-metadata.jsonld` will now be detected as RO-Crate metadata files instead of generic `JSON` files.
For more information on the RO-Crate specifications, see https://www.researchobject.org/ro-crate
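Per the note above, an exact filename listed in `MimeTypeDetectionByFileName.properties` wins over extension-based detection. The lookup order can be sketched in Python as follows. The two filename entries are copied from the note, while `BY_EXTENSION` is a hypothetical stand-in for Dataverse's real extension mapping and `detect_mime_type` is an illustrative name, not a Dataverse API:

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

RO_CRATE_TYPE = (
    'application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened '
    'http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"'
)

# Filename entries as listed in the release note.
BY_FILENAME: Dict[str, str] = {
    "ro-crate-metadata.json": RO_CRATE_TYPE,
    "ro-crate-metadata.jsonld": RO_CRATE_TYPE,
}

# Hypothetical extension table, for illustration only.
BY_EXTENSION: Dict[str, str] = {
    "json": "application/json",
    "jsonld": "application/ld+json",
}


def detect_mime_type(path: str) -> Optional[str]:
    """Check the exact filename first, then fall back to the extension."""
    name = PurePosixPath(path).name
    if name in BY_FILENAME:
        return BY_FILENAME[name]
    extension = PurePosixPath(name).suffix.lstrip(".").lower()
    return BY_EXTENSION.get(extension)
```

With this ordering, `survey.json` still resolves by extension, and only the exact RO-Crate metadata filenames get the more specific type.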
5 changes: 5 additions & 0 deletions doc/release-notes/10022_upload_redirect_without_tagging.md
@@ -0,0 +1,5 @@
If your S3 store does not support tagging and returns an error when you configure direct uploads, you can disable tagging with the ``dataverse.files.<id>.disable-tagging`` JVM option. For more details, see https://dataverse-guide--10029.org.readthedocs.build/en/10029/developers/big-data-support.html#s3-tags, #10022, and #10029.

## New config options

- dataverse.files.<id>.disable-tagging
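For context, Dataverse's direct-upload documentation has clients send an `x-amz-tagging:dv-state=temp` header when uploading to a presigned S3 URL, and a store without tagging support presumably rejects that, which would be the error this option works around. A small Python sketch of both sides, assuming a store id of `s3` and a client that already knows whether tagging is disabled (both helper functions are hypothetical, not Dataverse APIs):

```python
from typing import Dict


def disable_tagging_jvm_option(store_id: str) -> str:
    """Build the JVM system property that disables tagging for one store."""
    return f"-Ddataverse.files.{store_id}.disable-tagging=true"


def direct_upload_headers(tagging_disabled: bool) -> Dict[str, str]:
    """Extra headers for the PUT to a presigned S3 upload URL (simplified sketch)."""
    headers: Dict[str, str] = {}
    if not tagging_disabled:
        # Tag applied to temporary direct-upload objects when tagging is enabled.
        headers["x-amz-tagging"] = "dv-state=temp"
    return headers
```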
@@ -0,0 +1 @@
Fixed a bug where the ``incomplete metadata`` label was shown for published datasets with incomplete metadata in certain scenarios. The label is now shown for draft versions of such datasets and for published datasets that the user can edit. It can also be hidden for published datasets (regardless of edit rights) by setting the new option ``dataverse.ui.show-validity-label-when-published`` to `false`.
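The display rule described in this note reduces to a small decision function. A Python sketch, with hypothetical parameter names standing in for the dataset state and the `dataverse.ui.show-validity-label-when-published` setting (assumed to default to `true`, since the note describes `false` as the way to hide the label):

```python
def show_incomplete_metadata_label(
    metadata_complete: bool,
    is_draft: bool,
    user_can_edit: bool,
    show_when_published: bool = True,
) -> bool:
    """Decide whether the ``incomplete metadata`` label is displayed."""
    if metadata_complete:
        return False
    if is_draft:
        # Draft versions with incomplete metadata get the label.
        return True
    # Published version: the user needs edit rights, and the option can hide it for everyone.
    return show_when_published and user_can_edit
```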

This file was deleted.

This file was deleted.

5 changes: 0 additions & 5 deletions doc/release-notes/10216-metadatablocks.md

This file was deleted.

8 changes: 8 additions & 0 deletions doc/release-notes/10236-openapi-definition-endpoint.md
@@ -0,0 +1,8 @@
In Dataverse 6.0, Payara was updated, which caused the URL `/openapi` to stop working:

- https://github.com/IQSS/dataverse/issues/9981
- https://github.com/payara/Payara/issues/6369

When it worked in Dataverse 5.x, the `/openapi` output was generated automatically by Payara, but in this release we have switched to OpenAPI output produced by the [SmallRye OpenAPI plugin](https://github.com/smallrye/smallrye-open-api/tree/main/tools/maven-plugin). This gives us finer control over the output.

For more information, see the section on [OpenAPI](https://dataverse-guide--10328.org.readthedocs.build/en/10328/api/getting-started.html#openapi) in the API Guide.
1 change: 1 addition & 0 deletions doc/release-notes/10242-add-feature-dv-api
@@ -0,0 +1 @@
New API endpoints have been added that allow you to add or remove featured collections from a Dataverse collection.
53 changes: 53 additions & 0 deletions doc/release-notes/10288-add-term_uri-metadata-in-keyword-block.md
@@ -0,0 +1,53 @@
### New keywordTermURI Metadata in keyword Metadata Block

A new metadata field, `keywordTermURI`, has been added to the `keyword` metadata block to facilitate the integration of controlled vocabulary services, in particular by making it possible to save both a "term" and its associated URI. For more information, see #10288 and PR #10371.

## Upgrade Instructions

1\. Update the Citation metadata block

- `wget https://github.com/IQSS/dataverse/releases/download/v6.3/citation.tsv`
- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"`
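The `curl` call in step 1 posts the TSV file as the request body with a `text/tab-separated-values` content type. For scripted upgrades, the same request can be built with Python's standard library; this sketch assumes the default `http://localhost:8080` used in the note, and the helper names are hypothetical:

```python
import urllib.request


def build_load_request(tsv_bytes: bytes, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Mirror the curl call: POST the TSV body with a tab-separated-values content type."""
    return urllib.request.Request(
        f"{base_url}/api/admin/datasetfield/load",
        data=tsv_bytes,
        headers={"Content-type": "text/tab-separated-values"},
        method="POST",
    )


def load_metadata_block(tsv_path: str, base_url: str = "http://localhost:8080") -> str:
    """Read the TSV file, send it, and return the response body as text."""
    with open(tsv_path, "rb") as fh:
        request = build_load_request(fh.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `load_metadata_block("citation.tsv")` would send the file fetched with `wget` in the previous bullet.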

2\. Update your Solr `schema.xml` to include the new field.

For details, please see https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#updating-the-solr-schema


3\. Reindex Solr.

Once the schema.xml is updated, Solr must be restarted and a reindex initiated.
For details, see https://guides.dataverse.org/en/latest/admin/solr-search-index.html but here is the reindex command:

`curl http://localhost:8080/api/admin/index`


4\. Run ReExportAll to update dataset metadata exports. Follow the instructions in the [Metadata Export section of the Admin Guide](https://guides.dataverse.org/en/latest/admin/metadataexport.html#batch-exports-through-the-api).


## Notes for Dataverse Installation Administrators

### Data migration to the new `keywordTermURI` field

You can migrate your `keywordValue` data containing URIs to the new `keywordTermURI` field.
Before migrating, you can view the affected data with the following database query:

```
SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND value ILIKE 'http%';
```

If you wish to migrate your data, a database update is then necessary:

```
UPDATE datasetfield df
SET datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI')
FROM datasetfieldvalue dfv
WHERE dfv.datasetfield_id = df.id
AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND dfv.value ILIKE 'http%';
```
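Before running the `UPDATE`, it helps to know exactly what `value ILIKE 'http%'` selects: any value that starts with `http` in any letter case. A Python sketch of that predicate applied to a hypothetical list of keyword values (function names are illustrative):

```python
from typing import Iterable, List, Tuple


def matches_http_prefix(value: str) -> bool:
    """Python analogue of the SQL predicate `value ILIKE 'http%'` (case-insensitive prefix)."""
    return value.lower().startswith("http")


def partition_keyword_values(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split values into (would move to keywordTermURI, would stay in keywordValue)."""
    to_move: List[str] = []
    to_keep: List[str] = []
    for value in values:
        (to_move if matches_http_prefix(value) else to_keep).append(value)
    return to_move, to_keep
```

Given `["Economics", "https://vocab.example.org/term/42", "HTTP/2"]`, both the URI and the plain keyword `HTTP/2` would be moved, which shows why reviewing the output of the `SELECT` query above before migrating is worthwhile.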

A ['Reindex in Place'](https://guides.dataverse.org/en/latest/admin/solr-search-index.html#reindex-in-place) will be required, and ReExportAll will need to be run to update the metadata exports of the affected datasets. Follow the directions in the [Admin Guide](https://guides.dataverse.org/en/latest/admin/metadataexport.html#batch-exports-through-the-api).
5 changes: 5 additions & 0 deletions doc/release-notes/10316_cvoc_http_headers.md
@@ -0,0 +1,5 @@
You are now able to add HTTP request headers required by the External Vocabulary Services you are implementing.

Combined documentation can be found in pull request [#10404](https://github.com/IQSS/dataverse/pull/10404).

For more information, see issue [#10316](https://github.com/IQSS/dataverse/issues/10316) and pull request [gdcc/dataverse-external-vocab-support#19](https://github.com/gdcc/dataverse-external-vocab-support/pull/19).