
Duplicate property key in Bundle.properties file V5.12.1 #9169

Closed
stevenferey opened this issue Nov 17, 2022 · 18 comments · Fixed by #9176 or #9718
Labels
Feature: Internationalization · GREI 6 Connect Digital Objects · Type: Bug · User Role: Hackathon Participant
Comments

@stevenferey
Contributor

Hi Dataverse Community,

We detected a duplicate key in the Bundle.properties file:

advanced.search.datasets.persistentId.tip=The persistent identifier for the Dataset.
advanced.search.datasets.persistentId.tip=The Dataset's unique persistent identifier, either a DOI or Handle
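
For a quick local check (assuming you run it from the repository root), something like the line below prints both occurrences with their line numbers; note that with a standard java.util.Properties/ResourceBundle load, the last occurrence of a duplicated key silently wins, so only one of the two texts is ever shown:

grep -n '^advanced\.search\.datasets\.persistentId\.tip=' src/main/java/propertyFiles/Bundle.properties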

Best regards,
Steven Ferey.

@pdurbin
Member

pdurbin commented Nov 17, 2022

@stevenferey thanks!

Yes, confirmed in the develop branch as of 26afd10

$ ack advanced.search.datasets.persistentId.tip src
src/main/java/propertyFiles/Bundle.properties
935:advanced.search.datasets.persistentId.tip=The persistent identifier for the Dataset.
937:advanced.search.datasets.persistentId.tip=The Dataset's unique persistent identifier, either a DOI or Handle

Interestingly, both lines are from the same commit (98acdca):

$ cd src/main/java/propertyFiles
$ git blame Bundle.properties | grep advanced.search.datasets.persistentId.tip
98acdca83a6 src/main/java/propertyFiles/Bundle.properties (Julian Gautier      2022-05-17 16:36:32 -0400  935) advanced.search.datasets.persistentId.tip=The persistent identifier for the Dataset.
98acdca83a6 src/main/java/propertyFiles/Bundle.properties (Julian Gautier      2022-05-17 16:36:32 -0400  937) advanced.search.datasets.persistentId.tip=The Dataset's unique persistent identifier, either a DOI or Handle

@qqmyers
Member

qqmyers commented Nov 17, 2022

FWIW

ingest.csv.invalidHeader=Invalid header row. One of the cells is empty.
ingest.csv.lineMismatch=Mismatch between line counts in first and final passes!, {0} found on first pass, but {1} found on second.
ingest.csv.recordMismatch=Reading mismatch, line {0} of the Data file: {1} delimited values expected, {2} found.
ingest.csv.nullStream=Stream can't be null.
are duplicates as well

@jggautier
Contributor

jggautier commented Nov 17, 2022

Looks like in the pull request for improving the tooltips in the Citation metadata block, I left both lines there? Sorry about that.

About the duplicate advanced.search.datasets.persistentId.tip lines, the second one on line 937 was the one we decided to use, but since later changes to Dataverse facilitate the use of PIDs other than DOIs and Handles, I think we should avoid mentioning specific PIDs, and something like the first one on line 935 would be better.

Should I create a PR so that only one advanced.search.datasets.persistentId.tip exists?

I don't understand how those ingest.csv... lines are duplicates, but I'm happy to help with that, too, if you could write more about it, @qqmyers. Or if I create a PR, could you fix that duplicate issue?

@qqmyers
Member

qqmyers commented Nov 17, 2022

They are copies of

ingest.csv.invalidHeader=Invalid header row. One of the cells is empty.
ingest.csv.lineMismatch=Mismatch between line counts in first and final passes!, {0} found on first pass, but {1} found on second.
ingest.csv.recordMismatch=Reading mismatch, line {0} of the Data file: {1} delimited values expected, {2} found.
ingest.csv.nullStream=Stream can't be null.
and I think you can just delete whichever set you like.
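
If it helps with choosing which copies to remove, an awk sketch along these lines (just an illustration, assuming GNU or POSIX awk and the repository root as the working directory) lists each duplicated key together with the line numbers of its occurrences:

awk -F'=' '
    /^[^#[:space:]]/ {                       # key=value lines only (skip comments and blanks)
        lines[$1] = ($1 in lines) ? lines[$1] ", " NR : NR
        count[$1]++
    }
    END {
        for (k in count)
            if (count[k] > 1) print k ": lines " lines[k]
    }
' src/main/java/propertyFiles/Bundle.properties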

@jggautier
Contributor

jggautier commented Nov 17, 2022

Ah, thanks! I'll wait for this to be prioritized before creating a PR that removes the following from the Bundle.properties file:

  • one of the "advanced.search.datasets.persistentId.tip" entries
  • either of the duplicate sets of lines in @qqmyers's two comments

@poikilotherm
Contributor

Should we add an action that runs a duplicate detection check on the properties files when they are changed?

@poikilotherm
Contributor

poikilotherm commented Nov 18, 2022

I played with a little shell script (on Linux; no idea whether this works on macOS with its non-GNU tools, but it would in a GitHub Action):

for PF in $(find . -wholename '*/src/*.properties'); do
    # Strip comments and blank lines, keep only the key before '=',
    # count occurrences, and report keys that appear more than once.
    FILTER=$(grep -a -v -E "^(#.*|\s*$)" "$PF" | cut -d"=" -f1 | sort | uniq -c | tr -s " " | grep -v "^ 1 ")
    if [ -n "$FILTER" ]; then
        echo "$PF"
        echo "$FILTER"
        echo ""
    fi
done

And it shows we have more problems with duplicates:

./src/main/java/propertyFiles/biomedical.properties
 2 controlledvocabulary.studyAssayMeasurementType.transcription_profiling

./src/main/java/propertyFiles/Bundle.properties
 2 advanced.search.datasets.persistentId.tip
 2 advanced.search.files.persistentId.tip
 2 authenticationProvider.name.builtin
 2 authenticationProvider.name.github
 2 authenticationProvider.name.google
 2 authenticationProvider.name.null
 2 authenticationProvider.name.orcid
 2 authenticationProvider.name.orcid-sandbox
 2 authenticationProvider.name.shib
 2 citationFrame.banner.countdownMessage.seconds
 2 ingest.csv.invalidHeader
 2 ingest.csv.lineMismatch
 2 ingest.csv.nullStream
 2 ingest.csv.recordMismatch

./src/main/java/propertyFiles/customGSD.properties
 2 controlledvocabulary.gsdCourseName.06337:_changing_natural_and_built_coastal_environments
 2 controlledvocabulary.gsdFacultyName.menges,_achim

It would be very easy to put this into an Action which runs on PRs that change properties files:

FAIL=0

for PF in $(find . -wholename '*/src/*.properties'); do
    FILTER=$(grep -a -v -E "^(#.*|\s*$)" "$PF" | cut -d"=" -f1 | sort | uniq -c | tr -s " " | grep -v "^ 1 ")
    if [ -n "$FILTER" ]; then
        echo "$PF"
        echo "$FILTER"
        echo ""
        FAIL=1
    fi
done

if [ "$FAIL" -eq 1 ]; then
    echo "Duplicate keys found, please check output."
    exit 1
fi
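
For what it's worth, the wiring could look something like the sketch below; the workflow file name, trigger paths, and runner image are assumptions on my part, and whatever eventually lands in a PR may well differ:

# .github/workflows/check-properties.yml (hypothetical name and location)
name: Properties files checks
on:
  pull_request:
    paths:
      - "src/**/*.properties"
jobs:
  duplicate-keys:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check for duplicate property keys
        run: |
          FAIL=0
          for PF in $(find . -wholename '*/src/*.properties'); do
              FILTER=$(grep -a -v -E "^(#.*|\s*$)" "$PF" | cut -d"=" -f1 | sort | uniq -c | tr -s " " | grep -v "^ 1 ")
              if [ -n "$FILTER" ]; then
                  echo "$PF"
                  echo "$FILTER"
                  echo ""
                  FAIL=1
              fi
          done
          if [ "$FAIL" -eq 1 ]; then
              echo "Duplicate keys found, please check output."
              exit 1
          fi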

@poikilotherm
Contributor

Just because I was feeling like it, I went ahead and created #9176

Some fun leisure time hacking!

@stevenferey
Contributor Author

Hello,

Is there a way to follow the results of the GitHub workflow run, so that we can propose a PR removing the duplicate keys?

Thank you so much,
Steven.

@pdurbin
Member

pdurbin commented Jul 12, 2023

@stevenferey I may be misunderstanding but if you go to https://github.com/IQSS/dataverse/pull/9176/files you can see the reports of the duplicates (at the bottom). (Good work, @poikilotherm!) Here's a screenshot:

[Screenshot: the duplicate-key reports shown at the bottom of the checks for PR #9176]

@jggautier
Contributor

jggautier commented Jul 20, 2023

Just noting a comment I updated last week about one of the duplicate keys:

About the duplicate advanced.search.datasets.persistentId.tip lines, the second one on line 937 was the one we decided to use, but since later changes to Dataverse facilitate the use of PIDs other than DOIs and Handles, I think we should avoid mentioning specific PIDs, and something like the first one on line 935 would be better.

It looks like the PR keeps the other key instead, the one with the value "The Dataset's unique persistent identifier, either a DOI or Handle".

I think this is okay for now. I've been hoping to continue improving and applying the metadata text guide, and the tooltip for the persistentId field can be reviewed and improved then.

@stevenferey
Contributor Author

Thanks for your feedback,

I've modified this key to use the first proposed wording.

Steven.

@jggautier
Contributor

Ah okay. Thanks. So on the advanced search page, the tooltip for the persistent ID field label will be "The persistent identifier for the Dataset." This will be different than what's in the tooltip for the persistent ID field label on the dataset page, right?

Here's a screenshot of that tooltip on the dataset page (after clicking the Metadata tab):
[Screenshot: the persistent ID tooltip on the dataset page's Metadata tab]

@stevenferey
Contributor Author

Yes, that's right. I also propose modifying this label to align the two keys:

  • dataset.metadata.persistentId.tip=The persistent identifier for the Dataset.

I note that the DOI/Handle wording is also present in these keys:

  • dataset.metadata.alternativePersistentId.tip=A previously used persistent identifier for the Dataset, either a DOI or Handle

  • advanced.search.files.persistentId.tip=The unique persistent identifier for a data file, which can be a Handle or DOI in Dataverse.

  • file.metadata.persistentId.tip=The unique persistent identifier for a file, which can be a Handle or DOI in Dataverse.

Do you want me to change these keys as well?
Steven.

@jggautier
Contributor

I'm not sure if this should be changed in the PR for this issue. It feels like scope creep to me, if that makes sense.

Do you know if your PR is for the next Dataverse release, 5.14?

I'm wondering if there would be time to create a new issue to make sure these tooltips are consistent and create a new PR for that issue.

@jggautier
Contributor

I was talking with @scolapasta, and it sounds like this PR was not planned to be in the next Dataverse release (v5.14).

We're thinking it would be better to leave the tooltip for the persistent ID field on the advanced search page as it is in 5.13, so it reads "The Dataset's unique persistent identifier, either a DOI or Handle" and leave the other tooltips that you found so far as they are. Then @scolapasta wrote that this PR could go into QA, although it may not be included in 5.14.

And then I'd open a separate GitHub issue about improving these tooltips and include the ones you listed. That way we can maintain the scope of this issue and PR and make sure we're not missing anything else when changing those other tooltips.

Does that sound okay? If so, could you change dataset.metadata.persistentId.tip back to "The Dataset's unique persistent identifier, either a DOI or Handle"?

@stevenferey
Contributor Author

Thank you for your reply,

Absolutely. I've changed the key back to its initial value.

Steven.

@jggautier
Contributor

Awesome! And thanks for spotting those other tooltips!

@pdurbin added the Type: Bug, Feature: Internationalization, and User Role: Hackathon Participant labels Oct 8, 2023
@pdurbin added this to the 6.3 milestone Jul 1, 2024
@cmbz added the GREI 6 Connect Digital Objects label Jul 9, 2024
@DS-INRAE moved this to Done in Recherche Data Gouv Jul 10, 2024
pdurbin added a commit to poikilotherm/dataverse that referenced this issue Jul 25, 2024
stevenwinship added a commit that referenced this issue Aug 5, 2024
…ayName (#9176)

* ci: add workflow to check for duplicate keys in Java properties files #9169

* ci: add metadata blocks properties file check

As creating the properties files for metadata blocks is a
tedious manual process, this script ensures in CI that
everything is present. It complements the duplicate-key check.

* feat,ci: move check scripts to separate files

- Easier to run and debug as separate files
- Can be used locally, too
- Use GraalVM to compile native binary for accents removal with same Java code as used in application (also uses JBang as build system)
- "Just" using JBang is not fast enough, JVM startup times are making it sluggish!

* fix,ci: remove shellspec tests on CentOS 7 as it is EOL

* test: add a stupid change to test checks run and find problems

* ci: fix JBang setup in properties check

* fix: make properties check script dir-safe

* fix: escape square brackets in pattern

* refactor: move CV mangling (lowercase, spaces) to Java

* fix(metadata): typo and missing property for CodeMeta

* Revert "test: add a stupid change to test checks run and find problems"

This reverts commit 33731ad.

* file for QA only. DO NOT CHECK IN

* file for QA only. DO NOT CHECK IN

* file for QA only. DO NOT CHECK IN

* Removing QA test files. Safe to merge

---------

Co-authored-by: Philip Durbin <[email protected]>
Co-authored-by: Steven Winship <[email protected]>
@pdurbin modified the milestones: 6.3, 6.4 Aug 6, 2024