DOIs prepended with https://doi.org/ without percent-encoding, causing broken URLs #249

Open
tupelo-schneck opened this issue Jan 23, 2025 · 2 comments

@tupelo-schneck

While working on https://citation.doi.org, we found that citeproc-js does not percent-encode DOIs when it renders them as https://doi.org links. Most DOIs are well-behaved and need no percent-encoding, but some do. A particularly tricky example is

10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#

You can paste that into https://citation.doi.org (or just use https://citation.doi.org/format?doi=10.1002%2F%28sici%291099-050x%28199823%2F24%2937%3A3%2F4%3C197%3A%3Aaid-hrm2%3E3.0.co%3B2-%23&style=apa&lang=en-US ) and you'll get

Argenti, P. A. (1998). Introduction to the special issue on employee communications. In Human Resource Management (Vol. 37, Issues 3–4, pp. 197–197). Wiley.
https://doi.org/10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#

where that https://doi.org URL is broken, most obviously because of the # character, which a URL parser treats as the start of a fragment rather than as part of the path. The correct link would be

https://doi.org/10.1002/(sici)1099-050x(199823/24)37:3/4%3C197::aid-hrm2%3E3.0.co;2-%23
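
One way to see the breakage is to run both links through the standard `URL` parser, which is available as a global in browsers and in Node.js. This is only an illustrative sketch: the variable and function names are made up for the example, and nothing here is taken from citeproc-js.

```javascript
// Illustrative only: compare how a URL parser splits the two links.
const doi = "10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#";

// Link as currently emitted: the DOI is appended verbatim.
const broken = new URL("https://doi.org/" + doi);
// Link with <, > and # percent-encoded.
const fixed = new URL(
    "https://doi.org/10.1002/(sici)1099-050x(199823/24)37:3/4%3C197::aid-hrm2%3E3.0.co;2-%23"
);

// Decode the path (minus the leading slash) to see which DOI is requested.
const requestedDoi = url => decodeURIComponent(url.pathname.slice(1));

console.log(requestedDoi(broken) === doi); // false: the trailing # became an empty fragment
console.log(requestedDoi(fixed) === doi);  // true
```

In the unencoded link the final `#` never reaches the path, so the resolver would be asked for a DOI that is one character short.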
@tupelo-schneck

Here's some JS code that demonstrates this issue:

import CSL from "citeproc";

const cslFetchResponse = await fetch("https://raw.githubusercontent.com/citation-style-language/styles/refs/heads/master/apa.csl");
const csl = await cslFetchResponse.text();

const localeFetchResponse = await fetch("https://raw.githubusercontent.com/citation-style-language/locales/refs/heads/master/locales-en-US.xml");
const locale = await localeFetchResponse.text();

const item = {
    "id": "item",
    "publisher": "Wiley",
    "issue": "3-4",
    "DOI": "10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#",
    "page": "197-197",
    "title": "Introduction to the special issue on employee communications",
    "volume": "37",
    "author": [
        {
            "given": "Paul A.",
            "family": "Argenti"
        }
    ],
    "container-title": "Human Resource Management",
    "issued": {
        "date-parts": [
            [
                1998,
                9
            ]
        ]
    }
};

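// Minimal sys object: citeproc-js calls these to look up items and locales.
const sys = {
    retrieveItem: id => item,
    retrieveLocale: id => locale
};

const citeproc = new CSL.Engine(sys, csl);
citeproc.updateItems([ "item" ]);
citeproc.setOutputFormat("text");
// makeBibliography() returns [metadata, entries]; print the first entry.
const bib = citeproc.makeBibliography();
console.log(bib[1][0]);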

tupelo-schneck added a commit to tupelo-schneck/citeproc-js that referenced this issue Jan 23, 2025
@tupelo-schneck

Since DOIs always contain a slash, it's unpleasant to maximally percent-encode them using encodeURIComponent, which would turn every slash into %2F. Code to minimally encode a DOI for appending to https://doi.org/ is

doi.replace(/[\u0000-\u0020"#%<>?[\\\]^`{|}\u007F-\u009F]/g, encodeURIComponent)

I've tried to make a pull request #250 that does this in the necessary places.
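
As a sketch of how that one-liner might be packaged, the helpers below wrap it in a function and apply it to the DOI from this issue. The names `encodeDoiForUrl` and `doiToUrl` are hypothetical and are not claimed to match what the pull request uses.

```javascript
// Hypothetical helpers; names are illustrative, not taken from citeproc-js.

// Percent-encode only the characters that would break a DOI inside a URL path
// (controls, space, and " # % < > ? [ \ ] ^ ` { | }), leaving "/" and others as-is.
function encodeDoiForUrl(doi) {
    return doi.replace(
        /[\u0000-\u0020"#%<>?[\\\]^`{|}\u007F-\u009F]/g,
        ch => encodeURIComponent(ch)
    );
}

function doiToUrl(doi) {
    return "https://doi.org/" + encodeDoiForUrl(doi);
}

const doi = "10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#";
console.log(doiToUrl(doi));
// https://doi.org/10.1002/(sici)1099-050x(199823/24)37:3/4%3C197::aid-hrm2%3E3.0.co;2-%23

// For comparison, encodeURIComponent alone also escapes every slash.
console.log(encodeURIComponent("10.1002/abc")); // 10.1002%2Fabc
```

Because `%` is in the encoded set, the input is assumed to be a raw DOI; passing an already percent-encoded DOI would encode it a second time.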
