Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix authorsAlpha #11614

Merged
merged 21 commits into from
Aug 16, 2024
Merged

Fix authorsAlpha #11614

merged 21 commits into from
Aug 16, 2024

Conversation

subhramit
Copy link
Member

@subhramit subhramit commented Aug 12, 2024

Fix authorsAlpha

[Auxiliary PR, required for the GSoC '24 CSL4LibreOffice project]

Fix for the function authorsAlpha in BracketedPattern which returns author representation for the purpose of alphanumeric styles.
Adapted for the purpose of DIN 1505-02.
Changes:

  1. Max authors before "exceed limit" is 4.
  2. No appending "+" if exceeded
  3. For a single author, (maximum) first 4 letters of the author name are taken instead of 3.

Investigation ref. #11577 and issue subhramit#17

References:
https://forums.zotero.org/discussion/12239/style-request-lecture-notes-in-informatics-lni-from-the-germany-computer-science-society
https://forums.zotero.org/discussion/82660/format-din-1505-2-alphanumeric-german-standard-superseded-by-iso-690
https://forums.zotero.org/discussion/82631/problem-with-citation-style-din-1505-2-alphanumeric-german-and-word-integration
Template attached to first link: ISSS-word-template.docx

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@subhramit subhramit mentioned this pull request Aug 12, 2024
6 tasks
@koppor
Copy link
Member

koppor commented Aug 12, 2024

The authorsAlpha implementation is aligned with "GI e.V."'s biblatexs style: https://github.com/gi-ev/biblatex-lni

Output:

image

LaTeX-Source: https://github.com/gi-ev/biblatex-lni/blob/main/basic-test-en.tex

If we change something in the behavior, all things need to be aligned.

The "ultimate" truth is the Word template at https://gi.de/service/publikationen/lni. But I remember, that also does not cover all cases.

@koppor
Copy link
Member

koppor commented Aug 12, 2024

The current test cases are:

    static Stream<Arguments> authorsAlpha() {
        return Stream.of(
                Arguments.of("A+", "Alexander Artemenko and others"),
                Arguments.of("A+", "Aachen and others"),
                Arguments.of("AB+", "Aachen and Berlin and others"),
                Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and others"),
                // FIXME: Arguments.of("Aach", "Aachen"),
                Arguments.of("AB", "Aachen and Berlin"),
                Arguments.of("ABC", "Aachen and Berlin and Chemnitz"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf"),
                Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
                Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
                Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));
    }

If the test case do not fail at this PR, you found a simpler implementation.

@koppor
Copy link
Member

koppor commented Aug 12, 2024

The update does not support "and others" any more:

image

@subhramit
Copy link
Member Author

The current test cases are:
Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));
}


If the test case do not fail at this PR, you found a simpler implementation.

ABC+ won't work with this one. I had removed the + for DIn 1505-02.
Also, ABC+ implies three max authors represented, while the PR changes it to 4 (ABCD and no +)

@koppor
Copy link
Member

koppor commented Aug 12, 2024

Examples from the template:

[AB00]	Abel, K.; Bibel, U.: Formatierungsrichtlinien für Tagungsbände. Format-Verlag, Bonn, 2000.
[ABC01]	Abraham, N.; Bibel, U.: Corleone, P.: Formatting Contributions for Proceedings. In (Glück, H.I. ed.): Proc. 7th Int. Conf. on Formatting of Workshop-Proceedings, New York 1999. Noah & Sons, San Francisco, pp. 46-53, 2001.
[An14] 	Anteil an Frauen in der Informatik. Statistics Worldwide, 2014.
[Az09]	Azubi, L. et.al.: Die Fußnote in LNI-Bänden. In (Glück, H. I., ed.): Formatierung 2009. LNI 999, Format-Verlag, Bonn, pp. 135-162, 2009.
[Ez10]	Ezgarani, O.: The Magic Format – Your Way to Pretty Books. Noah & Sons, 2010.
[GI14]	GI, Gesellschaft für Informatik e.V., [www.gi-ev.at](http://www.gi-ev.at/), accessed: 24/12/2014.
[Gl09]	Glück, H. I.: Formatierung leicht gemacht. Formatierungsjournal 11/09, pp. 23-27, 2009.
[Wa14a] 	Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Essenzen der Informatik. Verlag Formvoll, 2014.
[Wa14b] 	Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Ganz neue Essenzen der Informatik im selben Jahr. Format-Verlag, 2014.

@koppor
Copy link
Member

koppor commented Aug 12, 2024

After double check with the LNI guidelines, it is OK to remove the +.

For proper software engineering:

  • Add support for handling "and others" (no "+" but two letters of the first last name)
  • Adapt existing test cases
  • Add LNI test cases to test cases (Fix authorsAlpha #11614 (comment))

@koppor koppor marked this pull request as draft August 12, 2024 20:00
@subhramit
Copy link
Member Author

Sorry, I mis-clicked on edit instead of reply.
@koppor
Could you clarify a bit more on:

  • Add support for handling "and others" (no "+" but two letters of the first last name)

Meaning - if there is "and others" in the title, what should be done differently? Should it be treated as if we ignore "and others"? First last name of whom? First author? All authors? And when? (is the two letters thing related to/valid only in the case of "and others"?)
https://forums.zotero.org/discussion/82631/problem-with-citation-style-din-1505-2-alphanumeric-german-and-word-integration has the following:
image

That is, till three authors, a total of 4 letters would be filled, depending on how many "extra authors" are there after 1.

@koppor
Copy link
Member

koppor commented Aug 12, 2024

I thought, the rules for "exceeding" are clear.

Four letters of the first last name.

Treat and others as exceeding.

@koppor
Copy link
Member

koppor commented Aug 12, 2024

forums.zotero.org/discussion/82631/problem-with-citation-style-din-1505-2-alphanumeric-german-and-word-integration has

A random dude in the net writing something in a comment without trustable sources. - Moreover, his examples lead to different results as shown at #11614 (comment).

I think, the pattern the linuxguy said is supported by JabRef too, but I don't find at https://docs.jabref.org/setup/citationkeypatterns

@subhramit
Copy link
Member Author

Side note: Also required for PR-D of CSL4LibreOffice.

@subhramit
Copy link
Member Author

subhramit commented Aug 12, 2024

After double check with the LNI guidelines, it is OK to remove the +.

For proper software engineering:

  • Add support for handling "and others" (no "+" but two letters of the first last name)

Can you tell me if the following expected outputs are correct/desired?:

Case 1:

Arguments.of("Al", "Alexander Artemenko and others"),
                Arguments.of("Aa", "Aachen and others"),
                Arguments.of("Aa", "Aachen and Berlin and others"),
                Arguments.of("Aa", "Aachen and Berlin and Chemnitz and others"),
                Arguments.of("AB", "Aachen and Berlin"),
                Arguments.of("ABC", "Aachen and Berlin and Chemnitz"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf"),
                Arguments.of("Aa", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
                Arguments.of("Aa", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));

Case 2:

Arguments.of("Al", "Alexander Artemenko and others"),
                Arguments.of("Aa", "Aachen and others"),
                Arguments.of("AaBe", "Aachen and Berlin and others"),
                Arguments.of("AaBeCh", "Aachen and Berlin and Chemnitz and others"),
                Arguments.of("AB", "Aachen and Berlin"),
                Arguments.of("ABC", "Aachen and Berlin and Chemnitz"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf"),
                Arguments.of("AaBeChDü", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
                Arguments.of("AaBeChDü", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));

Case 3:

Arguments.of("Al", "Alexander Artemenko and others"),
                Arguments.of("Aa", "Aachen and others"),
                Arguments.of("AaB", "Aachen and Berlin and others"),
                Arguments.of("AaBC", "Aachen and Berlin and Chemnitz and others"),
                Arguments.of("AB", "Aachen and Berlin"),
                Arguments.of("ABC", "Aachen and Berlin and Chemnitz"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf"),
                Arguments.of("AaBC", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
                Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
                Arguments.of("AaBC", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));

or anything else?

@subhramit
Copy link
Member Author

subhramit commented Aug 12, 2024

I thought, the rules for "exceeding" are clear.

Four letters of the first last name.

Treat and others as exceeding.

In the comment stating requirements, you mentioned two letters of the first last name instead of four (as mentioned here). I would request more clarification with examples for "and others" case with multiple authors (refer the cases above).

@koppor
Copy link
Member

koppor commented Aug 13, 2024

I thought, the rules for "exceeding" are clear.

Four letters of the first last name.

Treat and others as exceeding.

In the comment stating requirements, you mentioned two letters of the first last name instead of four (as mentioned here). I would request more clarification with examples for "and others" case with multiple authors (refer the cases above).

I take #11614 (comment) as ground truth. They use two letters. We follow that.

With "exceeding" I meant, it exceeds maximum number of authors. Thus, "and others" means five authors present. Thus, it is case 1.

@subhramit
Copy link
Member Author

Also, the last two examples from the template have:

[Wa14a] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Essenzen der Informatik. Verlag Formvoll, 2014.
[Wa14b] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Ganz neue Essenzen der Informatik im selben Jahr. Format-Verlag, 2014.

[Wa14a] and [Wa14b] imply there has to be an alphabetical counter that gets auto-incremented if two instances of authors are exactly the same but the papers are different. CSL uses itemDataProvider, which automatically supplies numbers, which are then taken care of by reference marks. We would need a similar system for this - but also that detects difference only in papers with the same authors. Furthermore, the letter has to be appended after the "Year" part, which is not handled by authorsAlpha.

Is this a compulsory requirement for this PR? Or should we just add the other cases to the test case and keep this as an edge case and try to think of it later?

@subhramit
Copy link
Member Author

subhramit commented Aug 13, 2024

Also, the last two examples from the template have:

[Wa14a] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Essenzen der Informatik. Verlag Formvoll, 2014.
[Wa14b] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Ganz neue Essenzen der Informatik im selben Jahr. Format-Verlag, 2014.

[Wa14a] and [Wa14b] imply there has to be an alphabetical counter that gets auto-incremented if two instances of authors are exactly the same but the papers are different. CSL uses itemDataProvider, which automatically supplies numbers, which are then taken care of by reference marks. We would need a similar system for this - but also that detects difference only in papers with the same authors. Furthermore, the letter has to be appended after the "Year" part, which is not handled by authorsAlpha.

Is this a compulsory requirement for this PR? Or should we just add the other cases to the test case and keep this as an edge case and try to think of it later?

This is also slightly contradictory, as it does not exceed the maximum number of authors (=4), yet uses the first author's two characters only instead of the first character from all 4. (At least it is well-established for DNI everywhere that there can be 4 authors maximum represented).

@subhramit
Copy link
Member Author

Also, the last two examples from the template have:

[Wa14a] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Essenzen der Informatik. Verlag Formvoll, 2014.
[Wa14b] Wasser, K.; Feuer, H.; Erde, R.; Licht, H.: Ganz neue Essenzen der Informatik im selben Jahr. Format-Verlag, 2014.

[Wa14a] and [Wa14b] imply there has to be an alphabetical counter that gets auto-incremented if two instances of authors are exactly the same but the papers are different. CSL uses itemDataProvider, which automatically supplies numbers, which are then taken care of by reference marks. We would need a similar system for this - but also that detects difference only in papers with the same authors. Furthermore, the letter has to be appended after the "Year" part, which is not handled by authorsAlpha.

Is this a compulsory requirement for this PR? Or should we just add the other cases to the test case and keep this as an edge case and try to think of it later?

Okay, after some thought, I am skipping these last two cases. Reason: It would also require "future insight" - if the same authors are going to be cited again in future, we have to append an "a" at the end of the first instance right away. Since this will not be possible, the only way is to have targeted editing of the citations as more such entries are cited, which is a hard problem again (auto-refreshment of old citations).

@koppor
Copy link
Member

koppor commented Aug 13, 2024

Is this a compulsory requirement for this PR? Or should we just add the other cases to the test case and keep this as an edge case and try to think of it later?

One by one. Keep focus. Divide and conquer.

"Just" fix authorsAlpha.


JabRef has this functionality internally already. Check the preferences of the citation key generator.

Maybe, it can be reused. Maybe only if a complete bibdatabasecontext is available. Then, the code needs to be refactored.

@subhramit subhramit requested a review from koppor August 13, 2024 15:19
@subhramit
Copy link
Member Author

subhramit commented Aug 13, 2024

@koppor please see if the changes are fine.
Only case I did not handle is:

[An14] 	Anteil an Frauen in der Informatik. Statistics Worldwide, 2014.

(since it did not have any author name in it, I figured by translating, nor commas that would separate a name like GI, Gesellschaft für Informatik e.V., which is handled). This seems to be out of scope for "authorsAlpha".

Is this OK for now?

Also added one-to-one test cases for an auxiliary function I extracted to get last name, it was common for both single author and "and others" case.

@subhramit subhramit marked this pull request as ready for review August 13, 2024 15:27
@koppor
Copy link
Member

koppor commented Aug 13, 2024

The other day, @Siedlerchr too suggested that it is easier for maintainers to review and merge PRs if we branch off upstream/main. I was thinking of cloning directly and adding a branch from main, but I was wondering if contributors can safely do that as it was missing from devdocs (devdocs.jabref.org/contributing.html#understanding-the-basics-of-code-contributions).

I highlight the current text regarding the branches.

image

I think, points 2 and 3 need to be merged. And refined with the two git commands.

If it's just these two commands, I will file a PR adding it to the devdocs. Should we "prefer" new contributors to branch from main as well? Or should it just be mentioned as Choice 1/ Choice 2 in the docs?

There is no choice. It is always upstream/main. I have no clue, what some other choice could be...

@subhramit
Copy link
Member Author

There is no choice. It is always upstream/main. I have no clue, what some other choice could be...

My bad, that was unclear. I meant branching from the fork (origin) vs the main repo (upstream). I also saw Loay mention that builds are generated for branches from upstream.

@subhramit
Copy link
Member Author

subhramit commented Aug 13, 2024

Summary of last two commits (resolution of failing unit tests):

  1. Removed "+" from the end of expected strings in CitationKeyGeneratorTest for the tests involving authorsAlpha.
  2. Adapted expected authorsAlpha output "vdAal" to "Aa" for input "Wil van der Aalst" and so on, as per the convention we are following this PR onwards.
  3. Changed number of expected characters from authorsAlpha output e.g. from "New" to "Ne" for input "Isaac Newton".
  4. Similar changes in BracketedPatternTest.
  5. [Additional] Removed Exceptions that were not being thrown from tests in AbstractCitationKeyPatternsTest and BibEntryTest along the way.

Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
Arguments.of("ABC+", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen and others"));
Arguments.of("Aa", "Aachen and Berlin and Chemnitz and Düsseldorf and others"),
Arguments.of("ABCD", "Aachen and Berlin and Chemnitz and Düsseldorf and Essen"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does exceed 4 authors, why is this not Aa?

Copy link
Member Author

@subhramit subhramit Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our discussion, we would do that if there is "and others". (Refer Case "1"). Furthermore, this has been established ever since we started talking about DIN. I can do only the first author's two letters of last name in case of exceeding authors but note that everywhere I have looked, in case of exceeding authors (and no "and others"), the abbreviation of first four authors are taken.
You can try citing 10.3945/ajcn.111.023457 using citationmachine as well: https://www.citationmachine.net/bibliographies/a446c586-d900-4583-a249-72d7b9bc0b52
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the LNI template: #11614 (comment)

We need to setup a minimal latex+bibtex document with authorsalpha with the test cases and see what that returns.

Comment on lines 871 to 877
static String getLastName(Author author) {
String[] nameParts = author.getNamePrefixAndFamilyName()
.replaceAll("\\s+", " ")
.trim()
.split(" ");
return nameParts[nameParts.length - 1];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you use org.jabref.model.entry.Author#getFamilyName?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@koppor
Copy link
Member

koppor commented Aug 16, 2024

@subhramit It is so great to see that now two persons (you and me) are in the context of authorsalpha.

Can you add a test for institutions?

In BibTeX, they are written as author = {{Gesellschaft für Informatik e.V.}}. In JabRef then {Gesellschaft für Informatik e.V.}` (because the outer braces are removed while reading (and added while writing)).

I am curious whether the thing works there, too.

@koppor
Copy link
Member

koppor commented Aug 16, 2024

It is good enough for now.

What I understood:

  • DIN is outdated
  • DIN is requested in the Zotero forum because of LNI use
  • LNI publishes its guidelines
  • LNI differs from DIN in some cases

What I miss:

Reason for the last point: We exchange information at multiple different places (which makes it scattered), but the core knowledge is not condensed. What if someone looks into that in one year - and if none of us both is available? --> Documentaiton enables asynchronous work.

Follow-up TODOs:

  • Collect knowledge in keygenerators.md. (New PR!)
  • Create .tex and .bib file for authorsalpha and the different cases (New PR)
  • Find usages of authorsalpha in the wild. (Add it to keygenerators.md. (New PR)
  • check institutions - example: author = {{Apache Foundation}}. It is OK if it fails, we just need to know and then think of solutions. For BibLaTeX, there is the feature label = {AF}. (add result to keygenerators.md - New PR). Add test case for that. Maybe commented out or marked with // TODO ...

Comment on lines 871 to 877
static String getLastName(Author author) {
String[] nameParts = author.getNamePrefixAndFamilyName()
.replaceAll("\\s+", " ")
.trim()
.split(" ");
return nameParts[nameParts.length - 1];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code hint - if you encounter a similar case in the future, there is a more intuitive code IMHO:

https://www.w3schools.com/java/ref_string_lastindexof.asp

int pos = string.lastIndexOf(" ");
String result = string.substr(pos +1);

@subhramit
Copy link
Member Author

It is good enough for now.

What I understood:

  • DIN is outdated
  • DIN is requested in the Zotero forum because of LNI use
  • LNI publishes its guidelines
  • LNI differs from DIN in some cases

What I miss:

Reason for the last point: We exchange information at multiple different places (which makes it scattered), but the core knowledge is not condensed. What if someone looks into that in one year - and if none of us both is available? --> Documentaiton enables asynchronous work.

Follow-up TODOs:

  • Collect knowledge in keygenerators.md. (New PR!)
  • Create .tex and .bib file for authorsalpha and the different cases (New PR)
  • Find usages of authorsalpha in the wild. (Add it to keygenerators.md. (New PR)
  • check institutions - example: author = {{Apache Foundation}}. It is OK if it fails, we just need to know and then think of solutions. For BibLaTeX, there is the feature label = {AF}. (add result to keygenerators.md - New PR). Add test case for that. Maybe commented out or marked with // TODO ...

I have handled the case of institutions now.

Comment on lines 856 to 857
if (lastName.startsWith("{") && lastName.endsWith("}")) {
alphaStyle.append(getOrganizationInitials(lastName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite using org.jabref.logic.formatter.bibtexfields.RemoveEnclosingBracesFormatter

  1. format the string using that formatter
  2. if result is not equal -> you found an organization
  3. Pass that string to getOrganizationInitials - the first two lines can be removed then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be better than before 😅.

Follow-ups as discussed.

@koppor koppor enabled auto-merge August 16, 2024 09:14
@koppor koppor added this pull request to the merge queue Aug 16, 2024
Merged via the queue into JabRef:main with commit c3756e2 Aug 16, 2024
21 checks passed
@koppor koppor deleted the fix-authorsAlpha branch August 16, 2024 09:27
@koppor
Copy link
Member

koppor commented Dec 24, 2024

Where is the issue? There was no link in the PR description here - and at the issue tracker, I only found JabRef/user-documentation#453

@subhramit
Copy link
Member Author

Where is the issue? There was no link in the PR description here - and at the issue tracker, I only found JabRef/user-documentation#453

The issue is mentioned in the PR description:

Investigation ref. #11577 and issue subhramit#17

I am/was not aware of any other existing issues around this. We started brainstorming when the requirement for adding alphanumeric style support in CSL came up (in the above pull/issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants