Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect OS locale when showing datetime strings #116

Closed
maninak opened this issue Apr 24, 2024 · 10 comments · Fixed by #131
Closed

Respect OS locale when showing datetime strings #116

maninak opened this issue Apr 24, 2024 · 10 comments · Fixed by #131
Assignees
Labels
enhancement New feature extending the app's current capabilities

Comments

@maninak
Copy link
Collaborator

maninak commented Apr 24, 2024

This applies mostly for the patch detail page, but as long as we pass the right param to the one datetime formatting function we call everywhere we'll be fine.

@maninak maninak added the enhancement New feature extending the app's current capabilities label Apr 24, 2024
@maninak maninak moved this to 📋 Up Next in Radicle VS Code Extension Apr 24, 2024
@lorenzleutgeb
Copy link
Contributor

This will be trickier than expected. If I remember correctly, then VSCode sadly doesn't really care about the environment to obtain a locale, but instead asks you to configure it (via some preference). And then it only has one locale for all different aspects of localization, i.e. you cannot use English messages with ISO 8601 compatible notation for times, dates and durations.

If you want to do the extra work to support glibc locales:

The gist is to use LC_TIME if available, and fall back to LC_ALL etc. otherwise. This allows for the greatest flexibility. An example from the GNU gettext docs:

For example, assume you are a Swedish user in Spain, and you want your programs to handle numbers and dates according to Spanish conventions, and only the messages should be in Swedish. Then you could create a locale named sv_ES or sv_ES.UTF-8 by use of the localedef program. But it is simpler, and achieves the same effect, to set the LANG variable to es_ES.UTF-8 and the LC_MESSAGES variable to sv_SE.UTF-8; these two locales come already preinstalled with the operating system.

In my shell:

$ env | grep LC_
LC_TIME=en_DK.UTF-8
LC_MONETARY=de_AT.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_NAME=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_PAPER=de_AT.UTF-8

@maninak
Copy link
Collaborator Author

maninak commented Apr 26, 2024

I could try to read LC_TIME and if that's not successful fallback to using whatever VS Code provides.

@lorenzleutgeb could you maybe also share an example output of env | grep LC_ with LC_TIME not explicitly defined (I assume that could be a thing in some edge cases), please?

@lorenzleutgeb
Copy link
Contributor

lorenzleutgeb commented Apr 28, 2024

I could try to read LC_TIME and if that's not successful fallback to using whatever VS Code provides.

No. Please read the references I sent. They give a clear priority for these environment variables. LC_ALL takes priority over LC_TIME, takes priority over LANG.

You can verify this as follows:

$ unset LC_ALL LC_TIME LANG
$ LC_ALL=fr_FR.UTF-8 LC_TIME=de_DE.UTF-8 LANG=en_DK.UTF-8 date --date=2024-01-01
lun. 01 janv. 2024 00:00:00 CET
$                    LC_TIME=de_DE.UTF-8 LANG=en_DK.UTF-8 date --date=2024-01-01
Mo 1. Jan 00:00:00 CET 2024
$                                        LANG=en_DK.UTF-8 date --date=2024-01-01
2024-01-01T00:00:00 CET

Oh, and then there's locale (see man:locale(1)), which might be helpful for you (if you can execute it somehow):

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_AT.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ unset LC_TIME
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_AT.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ LC_ALL=fr_FR.UTF-8 LC_TIME=de_DE.UTF-8 LANG=en_DK.UTF-8 locale date_fmt
%a %d %b %Y %T %Z
$                    LC_TIME=de_DE.UTF-8 LANG=en_DK.UTF-8 locale date_fmt
%a %-d. %b %H:%M:%S %Z %Y
$                                        LANG=en_DK.UTF-8 locale date_fmt
%Y-%m-%dT%T %Z

Note however that some placeholders (like %a are language dependent).
It might even be an option to just invoke date.

In case you're implementing this yourself, this means that you have to check for all these three environment variables and their corresponding priority. If neither of these three environment variables are defined, only then you should fall back to whatever is configured in VSCode. More examples at https://www.baeldung.com/linux/locale-environment-variables

@lorenzleutgeb could you maybe also share an example output of env | grep LC_ with LC_TIME not explicitly defined

Sure. Then the line is just missing, of course.

$ env | grep LC_
LC_TIME=en_DK.UTF-8
LC_MONETARY=de_AT.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_NAME=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_PAPER=de_AT.UTF-8
$ unset LC_TIME
$ env | grep LC_
LC_MONETARY=de_AT.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_NAME=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_PAPER=de_AT.UTF-8

(I assume that could be a thing in some edge cases), please?

If someone is fine with using one locale consistently for all categories (time, monetary amounts, paper size, ...) which is actually quite likely, than they'll be inclined to just set LC_ALL and nothing else. I'd argue that it is actually less common to see LC_* with different values (like on my machine).

@maninak
Copy link
Collaborator Author

maninak commented Apr 28, 2024

That's such helpful info, thank you! 🙌

Do you think you could hit me with a shell script that returns a UnicodeBCP47LocaleIdentifier? That would help me massively prioritize this issue.

Resources:

@lorenzleutgeb
Copy link
Contributor

lorenzleutgeb commented Apr 28, 2024

Do you think you could hit me with a shell script that returns a UnicodeBCP47LocaleIdentifier? That would help me massively prioritize this issue.

I think that's impossible, since that encoding does not allow to set a particular date/time format directly, see tc39/ecma402#554

Best solution I see is:

function format(date) {
  const glibcLocale = /* read environment to get glibc compatible locale */;
  const bcp47Locale = /* simple function that converts glibc locale to BCP 47 locale */;
  /* think about fallback so that you always have a valid bcp47Locale */
  return (new Intl.DateTimeFormat(bcp47Locale).format(date));
}

If that'd be good enough for you, I can fill in the blanks and send you that script.

@maninak
Copy link
Collaborator Author

maninak commented Apr 28, 2024

Sounds good. I don't need the full formatting function, just the shell script that will be run to source it from the OS (plus any filtering/clean-up) or just a link to any native nodejs API doing the same job, if available.

I'll handle the glue code beyond that. I'll add you with a Co-authored-by: to the commit for attribution. :)

For context, this is my formatting function and for completion this is the place you spotted in the UI and prompted you to report the issue in the first place (its called in more places of course).

@maninak
Copy link
Collaborator Author

maninak commented Apr 29, 2024

ok, good news, those specific POSIX locales seem to be accessible via nodejs global process. See below:

image

For some reason LC_TIME isn't defined there. Investigating further...
Worst case we'll source it via a shell command, but it gets trickier to support that for Linux, MacOS and Windows :/.

@maninak maninak moved this from 📋 Up Next to 🏗 In Progress in Radicle VS Code Extension Apr 29, 2024
@lorenzleutgeb
Copy link
Contributor

lorenzleutgeb commented Apr 29, 2024

Yeah, of course they are in there. I tacitly implied that all along. However it seems that you are comparing process.env with the output of locale, which is conceptually wrong, since locale will "fill in" LC_TIME by defaulting in case it is not defined. You better compare process.env with the output of env before you start hunting ghosts.

I also continued experimentation yesterday, and am not so positive about the result. A conversion from libc locale to BCP47 and using Intl may not be satisfactory...

@lorenzleutgeb
Copy link
Contributor

lorenzleutgeb commented Apr 29, 2024

Very, very rough:

const util = require('node:util');
const exec = util.promisify(require('node:child_process').exec);
/*
full long medium short

d_t_fmt

date      time
long/full long/full date_fmt
medium    medium    d_t_fmt
medium    none      d_fmt
none      medium    t_fmt
*/

async function localeFormat() {
        const { stdout } = await exec("locale d_t_fmt");
        return stdout;
}

function getLibcTime() {
        return process.env["LC_ALL"] || process.env["LC_TIME"] || process.env["LANG"];
}

/**
 * glibc locales follow the format
 *
 *     language[_territory[.codeset]][@modifier]
 *
 * For primitive conversion to BCP47 we discard `codeset` and `modifier`
 * (including their respective prefixes `.` and `@`) and replace
 * `_` by `-`. This is wrong, but works fairly well.
 *
 * See <https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html>
 *
 * The proper (but complicated) way of doing this right is described in
 * <https://wikinew.openoffice.org/wiki/LocaleMapping>
 */
function libcToBCP47(libc) {
        return libc.replace(/(\.|@).*/, "").replace("_", "-");
}

function format(date, locale) {
        return date.toLocaleDateString(locale, {
            weekday: 'short',
            month: 'short',
            year: 'numeric',
            day: 'numeric',
            hour: 'numeric',
            minute: 'numeric',
            timeZoneName: 'shortGeneric',
          });
        //return (new Intl.DateTimeFormat(locale, { dateStyle: 'short' }).format(date));
}

let date = new Date(2024, 1, 1);

console.log(format(date, libcToBCP47("en_DK.UTF-8")));

//console.log(date);
//console.log(libcToBCP47("en_US.UTF-8@lel"));
//console.log(libcToBCP47("en_DK.UTF-8@lel"));
//console.log(format(new Date(), libcToBCP47(getLibcTime())));
//console.log(format(new Date(), "dk"));
//console.log(format(new Date(), "en-DK"));
//console.log(format(new Date(), "de-AT"));
$ node date.js
Thu, 1 Feb 2024, 00.00 CET
$ LC_TIME=en_DK.UTF-8 date --date=2024-01-01
2024-01-01T00:00:00 CET

@maninak
Copy link
Collaborator Author

maninak commented Apr 30, 2024

Results:

🚀 Enhancements

  • patch-detail: show all datestime strings in localized format according to the user's specified preference along the stack starting from the OS configuration (e.g. LC_TIME) until that of VS Code, in that oder of preference (e.g. with "de-AT" "Fr., 26. Apr. 2024, 10:00 MEZ" vs "Fri, Apr 26, 2024, 10:00 AM Serbia Time")
  • patch-detail: show zero-timezone-offset ISO datetimes instead of using Zulu notation which is easier for human brains to parse (e.g. "2024-04-30T15:40:17.661+00:00" vs "2024-04-30T15:40:17.661Z")

@maninak maninak self-assigned this Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature extending the app's current capabilities
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants