
Messages facet #19

Open
jan-hudec opened this issue Jun 2, 2016 · 17 comments

Comments


jan-hudec commented Jun 2, 2016

Define generic interface and implement using gettext .mo catalogs embedding suitable maps or closures in the binary.

This task covers neither template generation nor catalog compilation, just reading (actually, the catalogs may be ‘compiled’ for embedding in the binary; this won't include checking yet, though).

@jan-hudec jan-hudec modified the milestone: New interface - 0.3 Jun 2, 2016
@jan-hudec jan-hudec modified the milestones: 0.4 – Translation, 0.3 Mar 9, 2017

jan-hudec commented Mar 15, 2017

What needs to be done

  • The Messages facet interface.
  • The _! (and, if needed, N_!) macro wrapper(s) for the interface.
  • Procedural macro to take a set of catalogs in (one of) [xliff], [po][gettext] or [mo][gettext] format and generate suitable data for the Domain of the given crate.
  • Language tag matching function(s), possibly even in locale_config, to compare the requested language with each available translation.
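The matching function(s) in the last point could look something like the following minimal sketch. The names and signatures are illustrative only, not the eventual locale_config API: pick the first requested language that some available translation matches, treating a region-specific tag as a variant of its base language.

```rust
// Hypothetical sketch of language tag matching; not the real locale_config
// API. A tag matches if it equals the other, or is a region variant of it
// (so "cs-CZ" matches "cs" and vice versa).
fn tag_matches(requested: &str, available: &str) -> bool {
    let req = requested.to_ascii_lowercase();
    let avail = available.to_ascii_lowercase();
    req == avail
        || req.starts_with(&format!("{}-", avail))
        || avail.starts_with(&format!("{}-", req))
}

/// Return the first available translation matching any requested language,
/// in the user's order of preference.
fn negotiate<'a>(requested: &[&str], available: &'a [&'a str]) -> Option<&'a str> {
    for r in requested {
        for a in available {
            if tag_matches(r, a) {
                return Some(*a);
            }
        }
    }
    None
}
```

A real implementation would follow the RFC 4647 lookup rules rather than plain prefix comparison, but the shape of the interface would be similar.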

Design

The design is still quite open. Since cargo does not have a mechanism for installing resources, the translations should be embedded in the binary. However, unlike crowbook-intl, we want to have the option to load the translations dynamically (and we don't want to combine it with standard fmt), so we will be calling through a Messages facet rather than generating inline match statements.

The translations need to be specific to each crate, so each crate will need to get something like ‘domain’ in gettext. I was considering:

  1. Passing this to the calls of Messages.get and Messages.nget, but that would complicate the use.
  2. Including it in the Messages facet, but then it would require non-standard getter, because normally facets don't have parameters.
  3. Packing it with the string in the N_! (or _! if we end up doing fine with one; I think we will) macro.

And 3. sounds best. We should also be able to do without separate static and dynamic variants, so just _!. The macro will return an object, Message, that will contain:

  • The original string
  • Optional context
  • Domain data
    • Identifier (by which a .mo or other kind of catalog can be looked up at runtime)
    • Built-in translations table or function or something.

This can be completely static. _!("Hello, World!") will translate to something like

Message {
    domain: "cratename",
    context: "",
    text: "Hello, World!",
    translations: &[
        ("cs", "Ahoj, Světe!"),
    ],
}

(or perhaps with a function for the translations or pointer to crate-wide table of all translations or something)

There will be impl Localize for Message, which through Messages.get will just do a lookup in the translations; searching additional .mo or other catalogs will be implemented later.
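To make that concrete, here is a minimal sketch of the expanded static value and the lookup. The translate method stands in for the eventual call through Messages.get, and all names are provisional:

```rust
// Sketch of the static value the _! macro could expand to. Provisional
// names; the real code would go through the Messages facet.
struct Message {
    domain: &'static str,
    context: &'static str,
    text: &'static str,
    translations: &'static [(&'static str, &'static str)],
}

impl Message {
    /// Look the message up in the embedded table, falling back to the
    /// original string when no translation is available.
    fn translate(&self, lang: &str) -> &'static str {
        self.translations
            .iter()
            .find(|(tag, _)| *tag == lang)
            .map(|(_, msg)| *msg)
            .unwrap_or(self.text)
    }
}

// What _!("Hello, World!") could expand to, given a Czech catalog:
static HELLO: Message = Message {
    domain: "cratename",
    context: "",
    text: "Hello, World!",
    translations: &[("cs", "Ahoj, Světe!")],
};
```

Because everything is &'static, the whole table lives in the read-only data segment and no allocation or registration happens at startup.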

A message with plural forms will have to be of a separate type, PluralMessage, that will have to be bound with a number before getting the actual translation. We may use just one macro if we come up with unambiguous syntax the matcher will be able to discern.
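A much simplified sketch of that separate type: a PluralMessage must be bound to a count before the text can be looked up. Real catalogs carry a per-language Plural-Forms expression; here a hard-coded two-form (singular/plural) rule stands in, and all names are illustrative only.

```rust
// Toy sketch of the plural type: binding to a number happens before lookup.
// A real implementation would evaluate the catalog's Plural-Forms rule for
// the selected language instead of this hard-coded n == 1 test.
struct PluralMessage {
    singular: &'static str,
    plural: &'static str,
}

struct BoundMessage {
    msg: &'static PluralMessage,
    n: u64,
}

impl PluralMessage {
    fn bind(&'static self, n: u64) -> BoundMessage {
        BoundMessage { msg: self, n }
    }
}

impl BoundMessage {
    fn text(&self) -> &'static str {
        if self.n == 1 { self.msg.singular } else { self.msg.plural }
    }
}

static FILES: PluralMessage = PluralMessage {
    singular: "one file",
    plural: "{} files",
};
```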

However, there is another problem. Facets currently have only the one LanguageTag, but the Messages facet needs to be able to fall back for some domains and not others. I.e. if the user has locale cs,de,en, the main crate has a cs translation, but some dependency only has de, we should be able to use the cs translation for messages from the main crate, but the de translation for messages from the dependency.
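A tiny illustration of that scenario (the helper name is hypothetical): each domain negotiates independently against the same user preference list, so the main crate and the dependency end up with different languages.

```rust
// Hypothetical helper: pick the first user-preferred language that a given
// domain has a catalog for. Each domain calls this separately.
fn pick<'a>(user: &[&'a str], available: &[&str]) -> Option<&'a str> {
    user.iter().copied().find(|l| available.contains(l))
}
```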

Open questions

  • Syntax of the main macro.
  • Catalog format.
  • Getting fallback locales in the Messages facet. We could:
    • Add an option for facets to be parametric (we should be able to fall back with message granularity, not domain granularity, so parametrising by domain does not cut it).
    • Add an option for facets to get the full list of relevant tags.
    • Add an option for a facet to ask for its fallback.


alexreg commented Mar 16, 2017

Haven't read this in full yet, but an interesting alternative (going back to embedding .mo files more directly) would be to make use of include_bytes! perhaps.


alexreg commented Mar 16, 2017

Actually: is there any particular reason you want/need to embed this info anyway?

@jan-hudec commented:

> Actually: is there any particular reason you want/need to embed this info anyway?

Because cargo install does not have support for data files. And other ways of distribution are simpler too when it's just one binary.

> Haven't read this in full yet, but an interesting alternative (going back to embedding .mo files more directly) would be to make use of include_bytes! perhaps.

  1. We theoretically could, but we'd still need the codegen to make them accessible—the main crate can register, but libraries need references injected in the _!() calls anyway.
  2. It would be less efficient, because the keys would be duplicated and the gettext crate builds a hash instead of using the data in place.
  3. Parsing the .po directly won't be that much work. The code from crowbook-intl seems to miss some escape sequences and corner cases, but a proper parser shouldn't be that much harder.
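On point 3, the escape handling a proper .po parser needs is small; the following is a sketch of just the unescaping step (not a full parser), assuming the usual C-style escapes used in PO string literals:

```rust
// Minimal sketch of PO string unescaping: PO msgid/msgstr literals use
// C-style escapes such as \n, \t and \". Unknown escapes are passed
// through verbatim here; a strict parser might reject them instead.
fn unescape_po(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```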


alexreg commented Mar 17, 2017

I don't really understand the second part of your post, since it misses out a lot of details for one unfamiliar with these libraries and concepts, but I'll trust you on this.

@jan-hudec commented:

Also, the .po is a source that translators edit, but .mo is a compiled object. So while it would save us the trouble of compiling it ourselves, it would make us depend on msgfmt(1) or pocompile(1) for the build.


alexreg commented Mar 17, 2017

I think we're going to depend on a lot of the gettext infrastructure anyway though, aren't we?


alexreg commented Mar 17, 2017

Since this is a long-term goal, might be worth persuading the Cargo devs to allow including resource files with packages...

@jan-hudec commented:

Well, gettext infrastructure will be initially needed for extraction (xgettext) and it will always be needed for managing the catalogues. But I don't want to depend on it for build.

And yes, it would be useful to persuade Cargo devs to add support for data files, but they seem to have huge backlog.


alexreg commented Mar 18, 2017

> Well, gettext infrastructure will be initially needed for extraction (xgettext) and it will always be needed for managing the catalogues. But I don't want to depend on it for build.

Why not simply use the usual gettext build process, though, and have it entirely separate from Cargo? It would be nice if Cargo had pre/post-build scripts, of course. This could be another request.

> And yes, it would be useful to persuade Cargo devs to add support for data files, but they seem to have huge backlog.

I think it's still worth requesting anyway. With a bit of nudging, it may come sooner than you think!

@jan-hudec commented:

Long term, there is really no point in relying on gettext. Embedding the strings is simpler and more efficient, extraction should eventually be done with something that understands Rust syntax (the procedural macros themselves, likely) and the catalogs can be manipulated with whatever the author wants—translate-toolkit tends to be more flexible than gettext and I am sure there are other tools for xliff (which I want to support too in the end).

Short term we will definitely rely on it for the extraction and maintenance of the translations, but I don't think it makes the generation much easier, so I'd prefer doing the in-code embedding straight away.


alexreg commented Mar 18, 2017

Okay, fair enough. What format does translate-toolkit use to store translations though? If that's some sort of resource file, then we should petition Cargo to allow resource files.

@jan-hudec commented:

Translate toolkit is a development tool only. It does not have any runtime component that would store translations anywhere.


alexreg commented Mar 21, 2017

Okay. So the idea is that the file format Translate Toolkit uses would be exported by some tool (from rust-locale) to a rust-locale–specific format?


jan-hudec commented Mar 21, 2017

The idea is that a proc_macro from (subcrate of) rust-locale will expand the original string to a table of translations. During build. No resources needed. The table of strings is usually tiny compared to the binary, so there is not much to be gained by having separate resource files.


alexreg commented Mar 21, 2017

Where does Translate Toolkit come into it then? Surely there needs to be some interoperability of formats somewhere?


jan-hudec commented Mar 22, 2017

There are two standard formats for the catalogs: Uniforum PO and XLIFF. We will read one or both at compile time and generate appropriate substitutions for the _! macro. Later, we will also write one or both.
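The compile-time reading step boils down to collecting msgid/msgstr pairs. A toy sketch, assuming only single-line entries (real PO also has string continuations, escapes, msgctxt and plural forms):

```rust
use std::collections::HashMap;

// Toy sketch of the catalog reading step: collect msgid/msgstr pairs from a
// simplified PO source. Only one-line, unescaped entries are handled; this
// is illustration, not the parser the proc macro would ship.
fn parse_po(src: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    let mut current_id: Option<String> = None;
    for line in src.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("msgid ") {
            current_id = Some(rest.trim_matches('"').to_string());
        } else if let Some(rest) = line.strip_prefix("msgstr ") {
            if let Some(id) = current_id.take() {
                map.insert(id, rest.trim_matches('"').to_string());
            }
        }
    }
    map
}
```

The proc macro would run something like this per catalog and emit the resulting pairs as the static translations table of each Message.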

Between us writing the template and us reading the translations, somebody has to actually translate it, and that involves some maintenance of the catalogs. That's what gettext (the msg* commands) and translate-toolkit are for, and always will be, because there is no point in reimplementing that part in Rust.
