Make/Sync bibliography should remove entries #22

Closed
koppor opened this issue Aug 7, 2024 · 10 comments · Fixed by JabRef/jabref#11712

Comments

@koppor
Collaborator

koppor commented Aug 7, 2024

I played around with Chocolate.bib. I added [3], but removed it later.

When pressing the bibliography-refresh-button, that number needs to go, too.

Maybe all text marks need to be scanned and all numbers readjusted?



@koppor
Collaborator Author

koppor commented Aug 7, 2024

Re-calculation: numbering starts with the first appearance in the text. Thus, if I swap two citations, the numbers are swapped, too.
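
A minimal sketch of this numbering rule, assuming the citation keys can be read from the document in order of appearance. The class and method names are hypothetical and not part of JabRef:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not JabRef code: numbers follow first appearance in the text.
final class AppearanceOrderNumbering {

    /** Assigns 1-based numbers to citation keys in order of their first appearance. */
    static Map<String, Integer> assignNumbers(List<String> keysInDocumentOrder) {
        Map<String, Integer> citationKeyToNumber = new LinkedHashMap<>();
        for (String key : keysInDocumentOrder) {
            // A repeated key keeps the number it received at its first appearance.
            citationKeyToNumber.putIfAbsent(key, citationKeyToNumber.size() + 1);
        }
        return citationKeyToNumber;
    }
}
```

With this rule, swapping two citations in the text swaps their numbers, and removing a citation closes the gap it leaves behind.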

@subhramit
Owner

subhramit commented Aug 7, 2024

Okay, understood. This is a big one.
Once a citation other than the last one is removed from the document, the numbers need to be recalculated and redistributed. (Right now, refresh works on the basis of the highest cited number of a unique entry, as demonstrated in the example.)
This would also involve updating the citation text.
@Siedlerchr this will be a mammoth task with never-ending complexity, because what surrounds the number differs between style formats. Multiply that by 2, as grouped citations are formatted differently from individual ones in many styles (which can't be traced unless hard-coded). Multiply that by infinity if a user, god forbid, uses multiple citation styles in the same document.

@subhramit subhramit added insane and removed insane labels Aug 7, 2024
@koppor
Collaborator Author

koppor commented Aug 9, 2024

I think, it covers numeric citations only. - I saw that you track the numbers at citation inserting (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).

It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.

  1. Clear citationKeyToNumber
  2. for each reference mark:
    a. num = getCitationNumber(referenceMark.getCitationKey())
    b. update text with num

The only "hard" part is 2.b - but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.
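
The two steps can be sketched on an in-memory model, assuming single (non-grouped) citations whose text has the form `[ number ]`. The `Mark` record and the `refresh` method are hypothetical stand-ins for the real reference marks; this does not touch the LibreOffice document or the actual `CSLReferenceMarkManager`:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory model, not JabRef code.
final class NumericCitationRefresh {

    /** Stand-in for a reference mark: the cited key and the visible citation text. */
    record Mark(String citationKey, String text) {}

    // Assumption: the citation text looks like "[ number ]".
    private static final Pattern BRACKETED = Pattern.compile("\\[\\s*(\\d+)\\s*\\]");

    static List<Mark> refresh(List<Mark> marksInDocumentOrder) {
        // Step 1: start from an empty citationKeyToNumber map.
        Map<String, Integer> citationKeyToNumber = new HashMap<>();
        List<Mark> updated = new ArrayList<>();
        // Step 2: walk the marks in document order.
        for (Mark mark : marksInDocumentOrder) {
            // Step 2.a: look up the number, assigning the next free one on first use.
            Integer num = citationKeyToNumber.get(mark.citationKey());
            if (num == null) {
                num = citationKeyToNumber.size() + 1;
                citationKeyToNumber.put(mark.citationKey(), num);
            }
            // Step 2.b: replace only the digits inside the brackets.
            updated.add(new Mark(mark.citationKey(), replaceNumber(mark.text(), num)));
        }
        return updated;
    }

    static String replaceNumber(String text, int num) {
        Matcher matcher = BRACKETED.matcher(text);
        if (!matcher.find()) {
            return text; // Not in the assumed format: leave the text untouched.
        }
        return text.substring(0, matcher.start(1)) + num + text.substring(matcher.end(1));
    }
}
```

Text that does not match the assumed format is returned unchanged, so other citation styles would need their own handling.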

@subhramit
Owner

subhramit commented Aug 9, 2024

I think, it covers numeric citations only. - I saw that you track the numbers at citation inserting (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).

It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.

  1. Clear citationKeyToNumber
  2. for each reference mark:
    a. num = getCitationNumber(referenceMark.getCitationKey())
    b. update text with num

The only "hard" part is 2.b - but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.

Questions:

  1. How would we update the numbers in the entire document, and not just the bibliography section? We could be messing with the document if we just look for brackets.
  2. Format is [1] for some citation styles. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even other formatting like superscript{(1,2)}, some superscript{1,2}, some superscript{[1,2]}, superscript{[1],[2]} and many other kinds. Furthermore, things like superscript are "relatively" easy to deal with before the citation is inserted, as we are dealing with raw HTML 1,2 or [1,..] and so on (that is how we assign citation numbers when inserting), but once it enters the document, what to scan for (and is there even a way to scan, I have come across only functions that can move cursor and replace text), and how to update precisely those? There can be other text with these formattings too.

@subhramit
Owner

subhramit commented Aug 9, 2024

I think, it covers numeric citations only. - I saw that you track the numbers at citation inserting (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).
It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.

  1. Clear citationKeyToNumber
  2. for each reference mark:
    a. num = getCitationNumber(referenceMark.getCitationKey())
    b. update text with num

The only "hard" part is 2.b - but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.

Questions:

  1. How would we update the numbers in the entire document, and not just the bibliography section?
  2. Format is [1] for some citations. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even other formatting like superscript{(1,2)}, some superscript{1,2}, some superscript{[1,2]}, superscript{[1],[2]} and many other kinds. Furthermore, things like superscript are "relatively" easy to deal with before the citation is inserted, as we are dealing with raw HTML 1,2 or [1,..] and so on (that is how we assign citation numbers when inserting), but once it enters the document, what to scan for, and how to update precisely those? There can be other text with these formattings too.
  3. Even if we hypothetically manage to do it for individual citations, grouped citations have a single combined reference mark at the end of the citation text. We could not selectively wrap a mark around each entry of the group, as each grouped citation is output as a single citation string. I tried handling this with regex, and even at the raw stage it failed because of the differences in styles and separators between the numbers. This means that even if we had a way to store the location of every citation using their respective reference marks, we cannot have the exact location for grouped entries [we can't even turn on ctrl+f8 and play around/move those grouped citation marks around in the document!].
    JStyles had a relatively simpler problem statement here, as we define the style and separators ourselves (with few variations, some even hardcoded).

@koppor
Collaborator Author

koppor commented Aug 9, 2024

Questions:

  1. How would we update the numbers in the entire document, and not just the bibliography section?

We have the JabRef_ text markers marking the areas of the citation, don't we? These text markers mark the start and the end of the citation string. Don't they? - If both assumptions are true, one can iterate through all text marks and work on their content.

  2. Format is [1] for some citation styles. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even other formatting like superscript{(1,2)},

All of them have in common that numbers are used for the citation number and other strings for some citation sugar.

Thus, searching for the first match of \d+ matches the first number, the next search the next number etc.
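
A sketch of that idea, assuming the string passed in is only the citation text inside one mark, so every number in it is a citation number. The class and method names are hypothetical:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JabRef code.
final class GroupedCitationRenumbering {

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /** Replaces the n-th number in the citation text with the n-th new number. */
    static String renumber(String citationText, List<Integer> newNumbers) {
        Matcher matcher = NUMBER.matcher(citationText);
        StringBuilder result = new StringBuilder();
        int index = 0;
        // Brackets, separators, and other "citation sugar" are copied through as-is.
        while (index < newNumbers.size() && matcher.find()) {
            matcher.appendReplacement(result, Integer.toString(newNumbers.get(index)));
            index++;
        }
        // Anything after the last replaced number is kept unchanged.
        matcher.appendTail(result);
        return result.toString();
    }
}
```

For instance, `renumber("[1;3]", List.of(1, 2))` yields `[1;2]` regardless of the bracket or separator style. The sketch does not cover collapsed ranges such as `[1-3]`, and it still needs a way to get at exactly the marked text in the document.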

@koppor
Collaborator Author

koppor commented Aug 9, 2024

grouped citations have a single combined reference mark

Does that have all citation keys? If yes: good. If not: we need to modify the CSLReferenceMark class accordingly.

Maybe first implement the refresh for non-grouped only. If that works, one can think about extending it to grouped citations.

@subhramit
Owner

subhramit commented Aug 9, 2024

We have the JabRef_ text markers marking the areas of the citation, don't we? These text markers mark the start and the end of the citation string. Don't they? - If both assumptions are true, one can iterate through all text marks and work on their content.

Reference markers don't give us the location of text in the document. They are just used to annotate text in an invisible way. What we are looking forward to are "anchors" and "page info" information. Implementing them can be a starting point.

All of them have in common that numbers are used for the citation number and other strings for some citation sugar.
Thus, searching for the first match of \d+ matches the first number, the next search the next number etc.

This is assuming we have used numbers in our document only in citation text and nowhere else. Search will not work on the basis of citation text. If we search on the basis of reference markers, we will be able to update the number in the reference mark but not the text, as they don't give us the location of text.

I will not attempt this as of now, maybe after merging PR-D (preferably if any two of us work together on this). @Siedlerchr can attest how difficult it is to even trace and manipulate a newline in a "marked" (even anchored) area of the document. Whenever cursors come into play, they take a lot of experimentation which can be done when we have time.

For future reference - me or anybody who wants to try this
We will have to (A)

  1. anchor the reference mark insertion point
  2. re-distribute numbers if a citation is removed from document (has to be done when update is pressed as first step)
  3. get the anchor locations
  4. use \d+ regex to match, early stop at first match.
  5. change the citation number in the text.
  6. update the reference mark

OR (B)

  1. anchor the citations
  2. Handle redistribution of numbers on deletion of a citation
  3. search using regex and https://wiki.documentfoundation.org/Documentation/SDKGuide/Text_Search_and_Replace
  4. if the match is anchored, manipulate the citation text
  5. update the reference mark

To the reader - more things to ponder on: How to update an existing reference marker text (A.6 or B.5)? We can update text, we can read existing reference marks, but how to change them, as they are not a part of the primary text. We need to find a way to remove the old reference mark and insert a new one.

More reference on anchors: https://stackoverflow.com/questions/69500141/how-do-i-iterate-over-an-entire-document-in-openoffice-libreoffice-with-uno
https://devdocs.jabref.org/code-howtos/openoffice/order-of-appearance.html
https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1text_1_1XTextContent.html#ae82a8b42f6b2578549b68b4483a877d3

@ThiloteE ThiloteE changed the title Refresh bibliography should remove entries Make/Sync bibliography should remove entries Aug 19, 2024
@ThiloteE
Collaborator

ThiloteE commented Aug 29, 2024

Idea:

If we only have the CID available, but no wrapper around the citation, then maybe the following could work:

When we cite the first time, nothing shall happen.
Any subsequent make/sync bibliography, cite, or cite-in-text command (in any sequence in the document) should trigger the following:

  1. Create directly behind old citations (v1) a new citation (v2) with correct order of numbering. (Exception: if a citation is deleted, do not create a new citation.)
  2. Copy the reference mark of all old citations (v1) to and attach to new citations (v2).
  3. Delete the old citations (v1) and delete the reference marks of old citations (v1) too. We know that a citation is a citation, because a reference mark is attached to it. We also know that every second citation is the new version of the citation. By rule of elimination, we can infer that the other citations must be of the old version and are those we can delete.

Now, what we should have are numerically correctly ordered citations, but wrong (old) reference marks.

  1. Delete all reference marks.
  2. Re-create all reference marks.

Now, what we should have are correctly ordered citations and new, correctly ordered reference marks.

