Formatting JSON with Cyrillic characters into Bibtex #177

Closed
yaskevich opened this issue Aug 25, 2022 · 6 comments
Labels
category: mapping Parsers and formatters package: plugin-bibtex type: bug Something isn't working

Comments

@yaskevich

I tested these cases via RunKit on the Citation.js main page, although my own code runs in the browser (Vue 3).
I tried different browsers (Firefox and Chrome-based).

I did not expect English and Cyrillic text to be processed differently, but that is what I am facing.

Here is an example (Ukrainian):

const Cite = require('citation-js')
const bibjson =  {"id":"antonenko1997","ISBN":"966-7219-00-3","type":"book","title":"Як ми говоримо","author":[{"given":"Б.Д.","family":"Антоненко-Давидович"}],"issued":{"date-parts":[[1997]]},"edition":"4","publisher":"Українська книга","citation-key":"antonenko1997","publisher-place":"Київ"};
const bibtext  = (new Cite(bibjson)).format('bibtex');

Result:

@book{antonenko1997,
	address = {̈},
	author = {-, ..},
	edition = {4},
	year = {1997},
	publisher = {̈ },
	title = {  },
}

The same happens with records fetched through the Zenodo interface.
Take the example from the documentation:

let example = new Cite('10.5281/zenodo.1005176');
example.format('bibtex');

Result:

@article{Willighagen2018Larsgw,
	author = {Willighagen, Lars and Willighagen, Egon and Badger, The Gitter and {\v C}erm{\' a}k, Petr and Wienke, Johannes},
	year = {2018},
	month = {nov 2},
	publisher = {Zenodo},
	title = {Larsgw/{Citation}.{Js}: V0.4.0-10},
}

Now let's replace the DOI with that of a Ukrainian article:

let example = new Cite('10.5281/zenodo.1048805');
example.format('bibtex');

Result:

@article{undefined2017XviXviii,
	author = {-, .},
	journal = {Zenodo},
	year = {2017},
	month = {nov 14},
	publisher = {Zenodo},
	title = {̈  {Xvi}--{Xviii} .  {\"  } : '-  {\"  }  },
}

Maybe some additional options should be passed? I'd appreciate any suggestions.

@larsgw larsgw added type: bug Something isn't working category: mapping Parsers and formatters package: plugin-bibtex labels Aug 25, 2022
@larsgw
Member

larsgw commented Aug 25, 2022

Thank you, that's a good point; I hadn't thought of that. I'm pretty sure you cannot easily include non-ASCII characters in normal BibTeX, and I haven't implemented escaping for Cyrillic characters yet, so the formatter removes them to ensure the output is at least parseable, even if incorrect. I'll look into a solution.
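
A minimal sketch of the escape-or-drop behavior described above, to show why Polish text survives while Cyrillic text disappears. This is not the plugin's actual code: `ESCAPES` and `escapeOrDrop` are hypothetical names, and the table holds only a handful of characters taken from the outputs quoted in this thread.

```javascript
// Hypothetical, deliberately tiny escape table (a real formatter covers far more).
const ESCAPES = new Map([
  ['\u0142', '\\l{}'],   // ł
  ['\u015B', "{\\' s}"], // ś
  ['\u0107', "{\\' c}"], // ć
  ['\u00F3', "{\\' o}"], // ó
  ['\u0119', '{\\k e}']  // ę
])

// Keep ASCII, escape characters found in the table,
// and drop every other non-ASCII character.
function escapeOrDrop (text) {
  let out = ''
  // Normalize to NFC so accented letters arrive as single code points.
  for (const char of text.normalize('NFC')) {
    if (char.codePointAt(0) < 0x80) {
      out += char
    } else if (ESCAPES.has(char)) {
      out += ESCAPES.get(char)
    }
    // Anything else is silently skipped; this is the step that loses Cyrillic.
  }
  return out
}

console.log(escapeOrDrop('Wroc\u0142aw'))             // Wroc\l{}aw
console.log(escapeOrDrop('\u041A\u0438\u0457\u0432')) // "Київ" becomes an empty string
```

Under these assumptions, a title written entirely in Cyrillic is reduced to its spaces and ASCII punctuation, which matches the shape of the broken output in the report.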

@yaskevich
Author

It seems that the problem is a bit wider.
As far as I understand, the general view is this: vanilla BibTeX disallows everything but ASCII characters, and non-ASCII symbols are escaped in the LaTeX manner.
But this is the age of Unicode, and many applications and services process all data as Unicode without escaping non-ASCII characters. For example, Overleaf (a cloud-based LaTeX editor) uses UTF-8 encoding for all text files.
Zenodo does the same.
For example, the BibTeX of a random Polish publication on Zenodo looks like this:

@book{zarzeczny_grzegorz_2017_3745683,
  author       = {Zarzeczny, Grzegorz and Piekot, Tomasz},
  title        = {Przystępność tekstów urzędowych w internecie},
  publisher    = {{Oficyna Wydawnicza ATUT – Wrocławskie Wydawnictwo Oświatowe}},
  year         = 2017,
  address      = {Wrocław},
  month        = mar,
  doi          = {10.5281/zenodo.3745683},
  url          = {https://doi.org/10.5281/zenodo.3745683}
}

However, if I store that BibTeX as JSON via Citation.js and then convert it back to BibTeX, I get:

@book{zarzeczny_grzegorz_2017_3745683,
	address = {Wroc\l{}aw},
	author = {Zarzeczny, Grzegorz and Piekot, Tomasz},
	year = {2017},
	month = {3},
	publisher = {Oficyna Wydawnicza ATUT -- Wroc\l{}awskie Wydawnictwo O{\' s}wiatowe},
	title = {Przyst{\k e}pno{\' s}{\' c} tekst{\' o}w urz{\k e}dowych w internecie},
}

This escaping is bad for the user experience and is not actually needed in today's environment.

It would be great to have an option to process Unicode text in Citation.js as is (as biber does, I presume). That would solve the problem for Cyrillic and other non-ASCII text as well.

The question is: how deeply is this LaTeX-like conversion buried inside Citation.js? Is it possible to "just drop it" when some flag is provided, or would that ruin all the logic of the tool?

@larsgw
Member

larsgw commented Apr 11, 2023

I think BibLaTeX (with the biber backend) can indeed process Unicode text (though that may require a specific LaTeX engine as well), so that is a good idea for a flag. That wouldn't be much of a problem. Still, I feel like it should not silently drop characters if they cannot be escaped. Maybe a warning message is enough, I don't know.

@yaskevich
Author

Maybe I used the wrong word; I didn't mean dropping the characters. I was asking whether it is possible to omit the non-ASCII escaping step from the pipeline.

@larsgw
Member

larsgw commented Apr 12, 2023

I understand what you meant, I just meant that in addition to adding that flag, I would ideally like to find a default behavior that does not silently drop Cyrillic characters, which is what happens now.
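
One way the two ideas could fit together, as a rough sketch rather than the plugin's implementation: a flag that skips escaping entirely, plus a default that keeps unmappable characters and reports them instead of dropping them. `formatValue`, `KNOWN`, and the warning text are hypothetical; the option name `asciiOnly` mirrors the setting mentioned later in this thread, but its real semantics may differ.

```javascript
// Hypothetical two-entry escape table, only for illustration.
const KNOWN = new Map([
  ['\u0142', '\\l{}'],  // ł
  ['\u00F3', "{\\' o}"] // ó
])

// Returns the formatted value plus a list of warnings.
// asciiOnly = false: pass the text through untouched (UTF-8 output).
// asciiOnly = true:  escape what the table knows; keep and report the rest.
function formatValue (text, { asciiOnly = true } = {}) {
  const warnings = []
  if (!asciiOnly) {
    return { value: text, warnings }
  }

  let value = ''
  for (const char of text.normalize('NFC')) {
    const codePoint = char.codePointAt(0)
    if (codePoint < 0x80) {
      value += char
    } else if (KNOWN.has(char)) {
      value += KNOWN.get(char)
    } else {
      // Keep the character so nothing is lost, and record a warning.
      value += char
      const hex = codePoint.toString(16).toUpperCase().padStart(4, '0')
      warnings.push(`No LaTeX escape for U+${hex}`)
    }
  }
  return { value, warnings }
}

const result = formatValue('\u041A\u0438\u0457\u0432') // "Київ"
console.log(result.value)
console.log(result.warnings)
```

Returning the warnings alongside the value leaves the caller free to log them, surface them in a UI, or ignore them; whether kept characters are acceptable in the output still depends on the BibTeX toolchain that consumes it.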

yaskevich added a commit to yaskevich/elemental that referenced this issue May 13, 2023
@larsgw larsgw closed this as completed in cd018f9 Jan 22, 2024
@larsgw
Member

larsgw commented Jan 22, 2024

In the v0.7.8 release, there's an option to keep all Unicode characters. This might become the default in the future.

const { plugins } = require('@citation-js/core')
const config = plugins.config.get('@bibtex')
config.format.asciiOnly = false
