
Messages facet #19

Open
jan-hudec opened this issue Jun 2, 2016 · 17 comments

Comments


jan-hudec commented Jun 2, 2016

Define generic interface and implement using gettext .mo catalogs embedding suitable maps or closures in the binary.

This task covers neither template generation nor catalog compilation, just reading (actually, the catalogs may be ‘compiled’ for embedding in the binary; this won't include checking yet, though).

@jan-hudec jan-hudec modified the milestone: New interface - 0.3 Jun 2, 2016
@jan-hudec jan-hudec modified the milestones: 0.4 – Translation, 0.3 Mar 9, 2017

jan-hudec commented Mar 15, 2017

What needs to be done

  • The Messages facet interface.
  • The _! (and, if needed, N_!) macro wrapper(s) for the interface.
  • Procedural macro to take a set of catalogs in (one of) [xliff], [po][gettext] or [mo][gettext] format and generate suitable data for the Domain of the given crate.
  • Language tag matching function(s), possibly even in locale_config, to compare the requested language with each available translation.
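The matching function(s) in the last point could look something like the following minimal sketch. The names and signatures are illustrative only, not the eventual locale_config API: pick the first requested language that some available translation matches, treating a region-specific tag as a variant of its base language.

```rust
// Hypothetical sketch of language tag matching; not the real locale_config
// API. A tag matches if it equals the other, or is a region variant of it
// (so "cs-CZ" matches "cs" and vice versa).
fn tag_matches(requested: &str, available: &str) -> bool {
    let req = requested.to_ascii_lowercase();
    let avail = available.to_ascii_lowercase();
    req == avail
        || req.starts_with(&format!("{}-", avail))
        || avail.starts_with(&format!("{}-", req))
}

/// Return the first available translation matching any requested language,
/// in the user's order of preference.
fn negotiate<'a>(requested: &[&str], available: &'a [&'a str]) -> Option<&'a str> {
    for r in requested {
        for a in available {
            if tag_matches(r, a) {
                return Some(*a);
            }
        }
    }
    None
}
```

A real implementation would follow the RFC 4647 lookup rules rather than plain prefix comparison, but the shape of the interface would be similar.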

Design

The design is still quite open. Since cargo does not have a mechanism for installing resources, the translations should be embedded in the binary. However, unlike crowbook-intl, we want to have the option to load the translations dynamically (and we don't want to combine it with standard fmt), so we will be calling through a Messages facet rather than generating inline match statements.

The translations need to be specific to each crate, so each crate will need to get something like ‘domain’ in gettext. I was considering:

  1. Passing this to the calls of Messages.get and Messages.nget, but that would complicate the use.
  2. Including it in the Messages facet, but then it would require non-standard getter, because normally facets don't have parameters.
  3. Packing it with the string in the N_! (or _! if we end up doing fine with one; I think we will) macro.

And 3. sounds best. We should also be able to do without separate static and dynamic variants, so just _!. The macro will return an object, Message, that will contain:

  • The original string
  • Optional context
  • Domain data
    • Identifier (by which a .mo or other kind of catalog can be looked up at runtime)
    • Built-in translations table or function or something.

This can be completely static. _!("Hello, World!") will translate to something like

Message {
    domain: "cratename",
    context: "",
    text: "Hello, World!",
    translations: &[
        ("cs", "Ahoj, Světe!"),
    ],
}

(or perhaps with a function for the translations or pointer to crate-wide table of all translations or something)

There will be impl Localize for Message, which through Messages.get will just do a lookup in the translations; searching additional .mo or other catalogs will be implemented later.
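To make that concrete, here is a minimal sketch of the expanded static value and the lookup. The translate method stands in for the eventual call through Messages.get, and all names are provisional:

```rust
// Sketch of the static value the _! macro could expand to. Provisional
// names; the real code would go through the Messages facet.
struct Message {
    domain: &'static str,
    context: &'static str,
    text: &'static str,
    translations: &'static [(&'static str, &'static str)],
}

impl Message {
    /// Look the message up in the embedded table, falling back to the
    /// original string when no translation is available.
    fn translate(&self, lang: &str) -> &'static str {
        self.translations
            .iter()
            .find(|(tag, _)| *tag == lang)
            .map(|(_, msg)| *msg)
            .unwrap_or(self.text)
    }
}

// What _!("Hello, World!") could expand to, given a Czech catalog:
static HELLO: Message = Message {
    domain: "cratename",
    context: "",
    text: "Hello, World!",
    translations: &[("cs", "Ahoj, Světe!")],
};
```

Because everything is &'static, the whole table lives in the read-only data segment and no allocation or registration happens at startup.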

A message with plural forms will have to be of a separate type, PluralMessage, that will have to be bound with a number before getting the actual translation. We may use just one macro if we come up with unambiguous syntax the matcher will be able to discern.
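A much simplified sketch of that separate type: a PluralMessage must be bound to a count before the text can be looked up. Real catalogs carry a per-language Plural-Forms expression; here a hard-coded two-form (singular/plural) rule stands in, and all names are illustrative only.

```rust
// Toy sketch of the plural type: binding to a number happens before lookup.
// A real implementation would evaluate the catalog's Plural-Forms rule for
// the selected language instead of this hard-coded n == 1 test.
struct PluralMessage {
    singular: &'static str,
    plural: &'static str,
}

struct BoundMessage {
    msg: &'static PluralMessage,
    n: u64,
}

impl PluralMessage {
    fn bind(&'static self, n: u64) -> BoundMessage {
        BoundMessage { msg: self, n }
    }
}

impl BoundMessage {
    fn text(&self) -> &'static str {
        if self.n == 1 { self.msg.singular } else { self.msg.plural }
    }
}

static FILES: PluralMessage = PluralMessage {
    singular: "one file",
    plural: "{} files",
};
```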

However, there is another problem. Facets currently have only the one LanguageTag, but the Messages facet needs to be able to fall back for some domains and not others. I.e. if the user has locale cs,de,en, the main crate has a cs translation, but some dependency only has de, we should be able to use the cs translation for messages from the main crate, but the de translation for messages from the dependency.
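A tiny illustration of that scenario (the helper name is hypothetical): each domain negotiates independently against the same user preference list, so the main crate and the dependency end up with different languages.

```rust
// Hypothetical helper: pick the first user-preferred language that a given
// domain has a catalog for. Each domain calls this separately.
fn pick<'a>(user: &[&'a str], available: &[&str]) -> Option<&'a str> {
    user.iter().copied().find(|l| available.contains(l))
}
```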

Open questions

  • Syntax of the main macro.
  • Catalog format.
  • Getting fallback locales in the Messages facet. We could:
    • Add an option for facets to be parametric (we should be able to fall back with message granularity, not domain granularity, so parametrising by domain does not cut it).
    • Add an option for facets to get the full list of relevant tags.
    • Add an option for a facet to ask for its fallback.


alexreg commented Mar 16, 2017

Haven't read this in full yet, but an interesting alternative (going back to embedding .mo files more directly) would be to make use of include_bytes! perhaps.


alexreg commented Mar 16, 2017

Actually: is there any particular reason you want/need to embed this info anyway?

@jan-hudec commented:

> Actually: is there any particular reason you want/need to embed this info anyway?

Because cargo install does not have support for data files. And other ways of distribution are simpler too when it's just one binary.

> Haven't read this in full yet, but an interesting alternative (going back to embedding .mo files more directly) would be to make use of include_bytes! perhaps.

  1. We theoretically could, but we'd still need the codegen to make them accessible—the main crate can register, but libraries need references injected in the _!() calls anyway.
  2. It would be less efficient, because the keys would be duplicated and the gettext crate builds a hash instead of using the data in place.
  3. Parsing the .po directly won't be that much work. The code from crowbook-intl seems to miss some escape sequences and corner cases, but a proper parser shouldn't be that much harder.
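On point 3, the escape handling a proper .po parser needs is small; the following is a sketch of just the unescaping step (not a full parser), assuming the usual C-style escapes used in PO string literals:

```rust
// Minimal sketch of PO string unescaping: PO msgid/msgstr literals use
// C-style escapes such as \n, \t and \". Unknown escapes are passed
// through verbatim here; a strict parser might reject them instead.
fn unescape_po(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```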


alexreg commented Mar 17, 2017

I don't really understand the second part of your post, since it misses out a lot of details for one unfamiliar with these libraries and concepts, but I'll trust you on this.

@jan-hudec commented:

Also, the .po is a source that translators edit, but .mo is a compiled object. So while it would save us the trouble of compiling it ourselves, it would make us depend on msgfmt(1) or pocompile(1) for the build.


alexreg commented Mar 17, 2017

I think we're going to depend on a lot of the gettext infrastructure anyway though, aren't we?


alexreg commented Mar 17, 2017

Since this is a long-term goal, might be worth persuading the Cargo devs to allow including resource files with packages...

@jan-hudec commented:

Well, gettext infrastructure will be initially needed for extraction (xgettext) and it will always be needed for managing the catalogues. But I don't want to depend on it for build.

And yes, it would be useful to persuade Cargo devs to add support for data files, but they seem to have huge backlog.


alexreg commented Mar 18, 2017

> Well, gettext infrastructure will be initially needed for extraction (xgettext) and it will always be needed for managing the catalogues. But I don't want to depend on it for build.

Why not simply use the usual gettext build process, though, and have it entirely separate from Cargo? It would be nice if Cargo had pre/post-build scripts, of course. This could be another request.

> And yes, it would be useful to persuade Cargo devs to add support for data files, but they seem to have huge backlog.

I think it's still worth requesting anyway. With a bit of nudging, it may come sooner than you think!

@jan-hudec commented:

Long term, there is really no point in relying on gettext. Embedding the strings is simpler and more efficient, extraction should eventually be done with something that understands Rust syntax (the procedural macros themselves, likely) and the catalogs can be manipulated with whatever the author wants—translate-toolkit tends to be more flexible than gettext and I am sure there are other tools for xliff (which I want to support too in the end).

Short term we will definitely rely on it for the extraction and maintenance of the translations, but I don't think it makes the generation much easier, so I'd prefer doing the in-code embedding straight away.


alexreg commented Mar 18, 2017

Okay, fair enough. What format does translate-toolkit use to store translations though? If that's some sort of resource file, then we should petition Cargo to allow resource files.

@jan-hudec commented:

Translate toolkit is a development tool only. It does not have any runtime component that would store translations anywhere.


alexreg commented Mar 21, 2017

Okay. So the idea is that the file format Translate Toolkit uses would be exported by some tool (from rust-locale) to a rust-locale–specific format?


jan-hudec commented Mar 21, 2017

The idea is that a proc_macro from (subcrate of) rust-locale will expand the original string to a table of translations. During build. No resources needed. The table of strings is usually tiny compared to the binary, so there is not much to be gained by having separate resource files.


alexreg commented Mar 21, 2017

Where does Translate Toolkit come into it then? Surely there needs to be some interoperability of formats somewhere?


jan-hudec commented Mar 22, 2017

There are two standard formats for the catalogs: Uniforum PO and XLIFF. We will read one or both at compile time and generate appropriate substitutions for the _! macro. Later, we will also write one or both.
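The compile-time reading step boils down to collecting msgid/msgstr pairs. A toy sketch, assuming only single-line entries (real PO also has string continuations, escapes, msgctxt and plural forms):

```rust
use std::collections::HashMap;

// Toy sketch of the catalog reading step: collect msgid/msgstr pairs from a
// simplified PO source. Only one-line, unescaped entries are handled; this
// is illustration, not the parser the proc macro would ship.
fn parse_po(src: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    let mut current_id: Option<String> = None;
    for line in src.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("msgid ") {
            current_id = Some(rest.trim_matches('"').to_string());
        } else if let Some(rest) = line.strip_prefix("msgstr ") {
            if let Some(id) = current_id.take() {
                map.insert(id, rest.trim_matches('"').to_string());
            }
        }
    }
    map
}
```

The proc macro would run something like this per catalog and emit the resulting pairs as the static translations table of each Message.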

Between us writing the template and us reading the translations, somebody has to actually translate it, and that involves some maintenance of the catalogs. That's what gettext (the msg* commands) and translate-toolkit are for, and always will be, because there is no point in reimplementing that part in Rust.
