Make/Sync bibliography should remove entries #22

koppor · 2024-08-07T20:46:46Z

I played around with Chocolate.bib. I added [3], but removed it later.

When pressing the bibliography-refresh-button, that number needs to go, too.

Maybe all text marks need to be scanned and all numbers readjusted?

koppor · 2024-08-07T20:47:44Z

Re-calculation: numbering starts with the first appearance in the text. Thus, if I swap two citations, the numbers are swapped, too.
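The "first appearance" rule can be illustrated with a minimal sketch. It assumes the citation keys have already been collected in document order; the class and method names are hypothetical and not part of JabRef.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not JabRef code: numbers citation keys by first appearance.
final class FirstAppearanceNumbering {

    /** Assigns 1-based numbers to citation keys in the order they first appear. */
    static Map<String, Integer> number(List<String> keysInDocumentOrder) {
        Map<String, Integer> citationKeyToNumber = new LinkedHashMap<>();
        for (String key : keysInDocumentOrder) {
            // A key that was already seen keeps its number; a new key gets the next one.
            citationKeyToNumber.putIfAbsent(key, citationKeyToNumber.size() + 1);
        }
        return citationKeyToNumber;
    }
}
```

Under this rule, swapping two citations in the text swaps their numbers, and removing a citation closes the gap it leaves.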

subhramit · 2024-08-07T22:12:45Z

Okay, understood. This is a big one.
Once a citation other than the last one is removed from the document, the numbers need to be recalculated and redistributed. (Right now it works on "refresh on the basis of the highest cited number of a unique entry", as demonstrated in the example.)
This would also involve updating the citation text.
@Siedlerchr this will be a mammoth task with never-ending complexity, because what surrounds the number differs between style formats (multiplied by 2, since grouped citations are formatted differently from individual ones in many styles, which can't be traced unless hard-coded). Multiply that by infinity if a user, god forbid, uses multiple citation styles in the same document.

koppor · 2024-08-09T07:51:54Z

I think it covers numeric citations only. I saw that you track the numbers when inserting citations (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).

It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.

1. Clear citationKeyToNumber
2. For each reference mark:
   a. num = getCitationNumber(referenceMark.getCitationKey())
   b. update text with num

The only "hard" part is 2.b, but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.
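The two steps above can be sketched roughly as follows. This is a simplified model, not the JabRef implementation: `Mark` is a hypothetical stand-in for a reference mark that carries one citation key and its visible text, and the body of `getCitationNumber` is an assumption about how the lookup could work.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified model of the proposed refresh; not the actual JabRef classes.
final class NumericRefresh {

    // Matches "[ number ]" with optional whitespace inside the brackets.
    private static final Pattern BRACKETED_NUMBER = Pattern.compile("\\[\\s*\\d+\\s*\\]");

    /** Hypothetical stand-in for a reference mark and the text it covers. */
    static final class Mark {
        final String citationKey;
        String text;

        Mark(String citationKey, String text) {
            this.citationKey = citationKey;
            this.text = text;
        }
    }

    private final Map<String, Integer> citationKeyToNumber = new LinkedHashMap<>();

    /** Returns the existing number for a key, or assigns the next free one. */
    int getCitationNumber(String citationKey) {
        Integer existing = citationKeyToNumber.get(citationKey);
        if (existing != null) {
            return existing;
        }
        int next = citationKeyToNumber.size() + 1;
        citationKeyToNumber.put(citationKey, next);
        return next;
    }

    void refresh(List<Mark> marksInDocumentOrder) {
        citationKeyToNumber.clear();                            // step 1
        for (Mark mark : marksInDocumentOrder) {                // step 2
            int num = getCitationNumber(mark.citationKey);      // step 2.a
            mark.text = BRACKETED_NUMBER.matcher(mark.text)     // step 2.b
                                        .replaceFirst("[" + num + "]");
        }
    }
}
```

The sketch only covers the `[ number ]` format and single-key marks; other numeric formats and grouped citations would need more than this one pattern.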

subhramit · 2024-08-09T08:04:00Z

> I think it covers numeric citations only. I saw that you track the numbers when inserting citations (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).
>
> It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.
>
> 1. Clear citationKeyToNumber
> 2. For each reference mark:
>    a. num = getCitationNumber(referenceMark.getCitationKey())
>    b. update text with num
>
> The only "hard" part is 2.b, but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.

Questions:

1. How would we update the numbers in the entire document, and not just the bibliography section? We could be messing with the document if we just look for brackets.
2. The format is [1] only for some citation styles. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even have other formatting like superscript{(1,2)}, superscript{1,2}, superscript{[1,2]}, superscript{[1],[2]} and many other kinds. Furthermore, things like superscript are "relatively" easy to deal with before the citation is inserted, as we are dealing with raw HTML such as <sup>1,2</sup> or <sup>[1,..]</sup> (that is how we assign citation numbers when inserting). But once the citation enters the document, what do we scan for (and is there even a way to scan? I have only come across functions that can move the cursor and replace text), and how do we update precisely those numbers? There can be other text with this formatting too.

subhramit · 2024-08-09T08:11:06Z

> I think it covers numeric citations only. I saw that you track the numbers when inserting citations (see org.jabref.logic.openoffice.oocsltext.CSLCitationOOAdapter#updateMultipleCitations).
>
> It is only about updating org.jabref.logic.openoffice.oocsltext.CSLReferenceMarkManager#citationKeyToNumber.
>
> 1. Clear citationKeyToNumber
> 2. For each reference mark:
>    a. num = getCitationNumber(referenceMark.getCitationKey())
>    b. update text with num
>
> The only "hard" part is 2.b, but this can be done using regex, since you know the format is [ number ]. And you need to replace number by num.

Questions:

1. How would we update the numbers in the entire document, and not just the bibliography section?
2. The format is [1] only for some citation styles. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even have other formatting like superscript{(1,2)}, superscript{1,2}, superscript{[1,2]}, superscript{[1],[2]} and many other kinds. Furthermore, things like superscript are "relatively" easy to deal with before the citation is inserted, as we are dealing with raw HTML such as <sup>1,2</sup> or <sup>[1,..]</sup> (that is how we assign citation numbers when inserting). But once the citation enters the document, what do we scan for, and how do we update precisely those numbers? There can be other text with this formatting too.

Even if we hypothetically manage to do it for individual citations, grouped citations have a single combined reference mark at the end of the citation text. We could not selectively wrap a mark around each entry of the group, because each grouped citation is output as a single citation string. (I tried handling this with regex, and even at the raw stage it failed because of the differences in styles and in the separators between the numbers.) This means that even if we had a way to store the location of every citation using their respective reference marks, we could not get the exact location of grouped entries. (We can't even turn on Ctrl+F8 and move those grouped citation marks around in the document!)
JStyles had a relatively simpler problem statement here, as we define the style and separators ourselves (without many variations, some even hardcoded).

subhramit · 2024-08-09T08:19:36Z

Ref. https://github.com/JabRef/jabref/blob/main/src/main/resources/resource/openoffice/default_authoryear.jstyle

koppor · 2024-08-09T11:20:46Z

> Questions:
>
> How would we update the numbers in the entire document, and not just the bibliography section?

We have the JabRef_ text markers marking the areas of the citation, don't we? These text markers mark the start and the end of the citation string, don't they? If both assumptions are true, one can iterate through all text marks and work on their content.

> Format is [1] for some citation styles. Some have [1], [2], [1,2], (1),(2), (1,2), [1;2], (1;2) (surprisingly), some even other formatting like superscript{(1,2)},

All of them have in common that numbers are used for the citation number and other strings for some citation sugar.

Thus, searching for the first match of \d+ matches the first number, the next search the next number, etc.
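A minimal sketch of this "n-th match of \d+ is the n-th citation number" idea, assuming the citation text has already been isolated from the rest of the document and the new numbers are known in order. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JabRef code: swaps numbers while keeping the "citation sugar".
final class CitationTextRenumbering {

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /** Replaces the i-th run of digits in citationText with newNumbers.get(i). */
    static String renumber(String citationText, List<Integer> newNumbers) {
        Matcher matcher = NUMBER.matcher(citationText);
        StringBuilder result = new StringBuilder();
        int index = 0;
        while (index < newNumbers.size() && matcher.find()) {
            // Brackets, parentheses and separators around the digits are copied unchanged.
            matcher.appendReplacement(result, String.valueOf(newNumbers.get(index)));
            index++;
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

For example, `renumber("(2;5)", List.of(1, 3))` yields `(1;3)`. One caveat of this approach: a style that collapses consecutive numbers into a range such as [1-3] no longer has one number per key, so the one-to-one mapping would not hold there.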

koppor · 2024-08-09T11:24:02Z

> grouped citations have a single combined reference mark

Does that have all citation keys? If yes: good. If not: we need to modify the CSLReferenceMark class accordingly.

Maybe first implement the refresh for non-grouped only. If that works, one can think about extending it to grouped citations.

subhramit · 2024-08-09T12:10:08Z

> We have the JabRef_ text markers marking the areas of the citation, don't we? These text markers mark the start and the end of the citation string. Don't they? - If both assumptions are true, one can iterate through all text marks and work on their content.

Reference markers don't give us the location of text in the document. They are just used to annotate text in an invisible way. What we are looking for are "anchors" and "page info" information. Implementing them can be a starting point.

> All of them have in common that numbers are used for the citation number and other strings for some citation sugar.
> Thus, searching for the first match of \d+ matches the first number, the next search the next number etc.

This assumes that numbers are used only in citation text and nowhere else in the document. Search will not work on the basis of citation text. If we search on the basis of reference markers, we will be able to update the number in the reference mark but not the text, as the markers don't give us the location of the text.

I will not attempt this as of now, maybe after merging PR-D (preferably if any two of us work together on this). @Siedlerchr can attest how difficult it is to even trace and manipulate a newline in a "marked" (even anchored) area of the document. Whenever cursors come into play, they take a lot of experimentation, which can be done when we have time.

For future reference (for me or anybody who wants to try this), we will have to do either (A):

1. Anchor the reference mark insertion point.
2. Redistribute numbers if a citation is removed from the document (has to be done as the first step when update is pressed).
3. Get the anchor locations.
4. Use the \d+ regex to match, stopping early at the first match.
5. Change the citation number in the text.
6. Update the reference mark.

OR (B):

1. Anchor the citations.
2. Handle redistribution of numbers on deletion of a citation.
3. Search using regex and https://wiki.documentfoundation.org/Documentation/SDKGuide/Text_Search_and_Replace
4. If the match is anchored, manipulate the citation text.
5. Update the reference mark.
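The text-update part of plan (A), that is, getting the anchor locations, matching \d+ with an early stop, and changing the number, can be modeled on a plain string. This is only a sketch under strong assumptions: the anchors are represented as hypothetical {start, end} character offsets, and no UNO API is involved, so the hard part of obtaining those locations from the document is left out.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-string model of the anchored update; the offsets stand in for real anchors.
final class AnchoredUpdate {

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /**
     * ranges.get(i) is {start, end} (end exclusive) of the i-th citation in document
     * order; newNumbers.get(i) is the number that citation should show afterwards.
     */
    static String update(String document, List<int[]> ranges, List<Integer> newNumbers) {
        StringBuilder text = new StringBuilder(document);
        // Work backwards so that earlier offsets stay valid when a replacement changes length.
        for (int i = ranges.size() - 1; i >= 0; i--) {
            int start = ranges.get(i)[0];
            int end = ranges.get(i)[1];
            Matcher matcher = NUMBER.matcher(text).region(start, end);
            if (matcher.find()) { // early stop: only the first number inside the range
                text.replace(matcher.start(), matcher.end(), String.valueOf(newNumbers.get(i)));
            }
        }
        return text.toString();
    }
}
```

Because the search is limited to the given ranges, other numbers in the document, such as a year in the running text, are left alone.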

To the reader, more things to ponder: how do we update an existing reference marker text (A.6 or B.5)? We can update text and we can read existing reference marks, but how do we change them, given that they are not part of the primary text? We need to find a way to remove the old reference mark and insert a new one.

More reference on anchors: https://stackoverflow.com/questions/69500141/how-do-i-iterate-over-an-entire-document-in-openoffice-libreoffice-with-uno
https://devdocs.jabref.org/code-howtos/openoffice/order-of-appearance.html
https://api.libreoffice.org/docs/idl/ref/interfacecom_1_1sun_1_1star_1_1text_1_1XTextContent.html#ae82a8b42f6b2578549b68b4483a877d3

ThiloteE · 2024-08-29T21:14:21Z

Idea:

If we only have the CID available, but no wrapper around the citation, then maybe the following could work:

We cite the first time: nothing shall happen.
Any other make/sync bibliography, cite or cite-in-text command (in any sequence in the document) should trigger the following:

1. Create a new citation (v2) with the correct numbering directly behind each old citation (v1). (Exception: if a citation is deleted, do not create a new citation.)
2. Copy the reference marks of all old citations (v1) and attach them to the new citations (v2).
3. Delete the old citations (v1) and their reference marks. We know that a citation is a citation because a reference mark is attached to it. We also know that every second citation is the new version of the citation. By elimination, we can infer that the other citations must be the old versions, and those are the ones we can delete.

Now we should have numerically correctly ordered citations, but wrong (old) reference marks.

4. Delete all reference marks.
5. Re-create all reference marks.

Now we should have correctly ordered citations and new, correctly ordered reference marks.


subhramit added insane and removed insane labels Aug 7, 2024

ThiloteE changed the title ~~Refresh bibliography should remove entries~~ Make/Sync bibliography should remove entries Aug 19, 2024

ThiloteE mentioned this issue Aug 19, 2024

Make Settings > Sync bibliography work with CSL #26

Closed

2 tasks

subhramit mentioned this issue Aug 30, 2024

[WIP] CSL citations: Fix sync numbering JabRef/jabref#11688

Closed

6 tasks

subhramit mentioned this issue Sep 6, 2024

CSL4LibreOffice - E [Add re-distribution of numeric CSL citations] JabRef/jabref#11712

Merged

7 tasks

Siedlerchr closed this as completed in JabRef/jabref#11712 Sep 7, 2024

subhramit mentioned this issue Dec 18, 2024

Fix generation of multiple bibliography sections JabRef/jabref#12309

Draft

7 tasks
