Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an importer for Citavi backup files #8848

Merged
merged 45 commits into from
Jun 1, 2022
Merged

Conversation

lishangyu9
Copy link
Contributor

@lishangyu9 lishangyu9 commented May 25, 2022

Fixes #8322 .
This importer for Citavi support ‘ctv5bak’ and ‘ctv6bak’ files.

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

What we have added

  • The Citavi file format support

    Add the definition of Citavi file in the “StandardFileType.java”.
  • The XSD file defined the Citavi XML file

    Add the file “scitavi.xsd” for generating the java class “CitaviExchangeData.java”.
  • The importer for Citavi file “ctv5bak” and “ctv6bak”

    • Add the class “src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java”.
      Because the “ctv6bak” and “ctv5bak” are compressed files. We add the function “getReaderFromZip” to unzip the file before parsing.
    • The XML file inside the “ctv6bak” is UTF-8 BOM format. This format cannot be recognized by the default process of XML. Therefore We create a function “checkForUtf8BOMAndDiscardIfAny” to delete the byte at the beginning of the XML file.
    • The process of parsing XML data is similar to “EndnoteImporter”. However, the date structures of Endnote and Citavi are different. The primary different methods are “getAuthorName”, “getEditorName”, “getKeywords” and “getPublisher”. These data are located in different places with reference data. For example, we need to go through the author’s matching list and match the references ID. If the id is matched we got the author’s ID list. Finally, we can get the author’s name.
  • The test of Citavi importer

    The test class of Citavi importer is “CitaviXmlImporterTestFiles.java”.
    There are 4 test cases:
    “CitaviXmlImporterTest1.ctv6bak”, “CitaviXmlImporterTest1.bib”
    “CitaviXmlImporterTest2.ctv6bak”, “CitaviXmlImporterTest2.bib”
    “CitaviXmlImporterTest3.ctv6bak”, “CitaviXmlImporterTest3.bib”
    “CitaviXmlImporterTest4.ctv6bak”, “CitaviXmlImporterTest4.bib”

How to use the Citavi Importer

  1. Click the “File” → ”Import” → ”Import into new library”.
    1
  2. Select the file. Click “Open”.
    2
  3. The file is imported
    3

Siedlerchr and others added 30 commits April 14, 2022 18:21
Add generated citavi xsd schema
Fix some resouces leaks

Use AuthorListParser
@Siedlerchr
Copy link
Member

Thank you very much for your work! Impressive 👍 I took a more in-depth look at the author handling code, that's really ugly but I couldn't find a general better approach. However, you do not have to iterate over the whole String, the ids are all UUIDs that have a fixed length of 36 characters. So you can improve your loops.

Second thing: I implemented an example of using JabRef's Author and AuthorLIst class for adding new authors. You just need to adjust your test bib files (Lastname, Firstname), can be done in JabRef's GUI: (Quality-> Cleanup -> Normalize in persons) and maybe the same for other persons fields.

@Siedlerchr
Copy link
Member

@lishangyu9 I came up with a more efficient solution for the authors and editors.
In essential, the oneToN is similar to a relational database schema where the first part as you already identified is the referenced and the other ones are the person ids.

Instead of iterating through the whole author + person list for every reference, I create a mapping between referenceId and Persons. In addition, I introduced a second Map for storing Persons we already know (if the same author with the same uuid comes up multiple times, so we don't have to through all Person objects everytime.

Antoher thing we can use for our advantage: We know that all UUIDs have a length of 36 chars and we know that the first only the first UUID is the referenceID. So we can safely assume that the rest of the string are the Person ids. So we can use split to get an array of all person ids if we just split at the semicolon, after we removed the refId from it.

This can be adapted for other oneToN relationships as well

@lishangyu9
Copy link
Contributor Author

Thank you @Siedlerchr . We will check and fix it today.

@lishangyu9
Copy link
Contributor Author

Hi @Siedlerchr, we fixed other's OneToN relationships, "Keywords" and "Publishers". We followed the method you have made and created the "buildKeywordList" and "buildPublisherList" functions for mapping the data and reference Id. We appreciate your help.

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label May 28, 2022
Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor changes, but overall looks very good to me!

@Siedlerchr
Copy link
Member

@claell You were using citavi previously, it would be nice if you could testt this PR.
(best is using github cli tools, gh pr checkout 8848 and then ./gradlew run should assemble & run jabref

@claell
Copy link
Contributor

claell commented May 31, 2022

@Siedlerchr when I find the time, I'll try that. I don't have GitHub's cli tools installed. Is there another easy way or should I just use them? Also, I likely cannot do too in depth testing due to constrained time.

@calixtus
Copy link
Member

gh cli is supposed to be the easy way... you can of course also follow the "command line instructions" on the bottom of this page

@calixtus
Copy link
Member

grafik

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it is OK. Some nitpicks, but we can fix them for ourselves later.

The thing I see here, that there was strong test-driven development somehow:

buildPersonList, buildKeywordList, buildPublisherList should all have been package private and tested by CitaviXmlImporter.

However, I think, this project is over. Think about using JUnit and the play / debug button also for methods themselves the next java project.

@calixtus
Copy link
Member

calixtus commented Jun 1, 2022

I just mentioned you are mixing the old io and the new nio file apis. Is there a chance to switch completely to the new nio API? Provides thread safety i.a....

@Siedlerchr
Copy link
Member

I just mentioned you are mixing the old io and the new nio file apis. Is there a chance to switch completely to the new nio API? Provides thread safety i.a....

@calixtus Ehm they are meant to be interoperable, for example Files.newInputStream returns an InputStream. And Inputstream is just the base class in java.io. There is not all in nio.

@Siedlerchr Siedlerchr merged commit 943489e into JabRef:main Jun 1, 2022
Siedlerchr added a commit that referenced this pull request Jun 1, 2022
* upstream/main:
  Add an importer for Citavi backup files (#8848)
  Reviewdoc: Comment on PRs (#8878)
  Squashed 'buildres/csl/csl-styles/' changes from 649aac4..e740261
  Use JDK 15 text blocks to improve injected languages readability (#8874)
  Fix fetcher tests (#8877)
  Fix #8390 by allowing multiple group deletion for Remove groups > Kee… (#8875)
  Add restart warning on SSL configuration change (#8871)
  Update to lucene 9.2 (#8868)
  Fix for removing several groups deletes only one of them (#8801)
  Disable Write XMP Button in General tab of Entry-Editor when action is in progress (#8728)
  Bump jsoup from 1.14.3 to 1.15.1 (#8864)
  Bump unirest-java from 3.13.8 to 3.13.10 (#8869)
  Bump unoloader from 7.3.2 to 7.3.3 (#8863)
  Bump pascalgn/automerge-action from 0.15.2 to 0.15.3 (#8860)
  Bump classgraph from 4.8.146 to 4.8.147 (#8861)
  Bump mockito-core from 4.5.1 to 4.6.0 (#8862)
  Lucence dir checkers should only delete lucence dirs (#8854)
  Update README.md (#8858)
  Update adr.md
  Update adr.md
@Siedlerchr
Copy link
Member

Thanks again for your valuable contribution and your quick follow up! 👍
We just noticed by chance that the pages field contained unicode minus signs instead of the normal ascii ones. So I created a follow up PR which calls the pages formatter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: import-load status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create an importer for Citavi
7 participants