DITA-OT plug-in to create, auto-translate and re-merge XLIFF files, generating translated documentation in a targeted foreign language.

jason-fox/fox.jason.translate.xliff

DITA-OT Translate Plug-in

DITA-OT Translate Plug-in is a DITA-OT Plug-in to create, auto-translate and re-merge XLIFF files, generating translated documentation in a targeted foreign language. It can create and consume files using either XLIFF 1.2 or XLIFF 2.1 format.

This plug-in consists of three DITA-OT transforms:

  • The xliff-create transform creates XLIFF and skeleton files from the *.dita files.
  • The xliff-translate transform populates the <target> texts using an automatic translation service.
  • The xliff-dita transform recreates the DITA project using the translated texts.

▶️ Video from DITA-OT Day 2019

Install

The DITA-OT Translate Plug-in has been tested against DITA-OT 4.x. It is recommended that you upgrade to the latest version.

Installing DITA-OT

The DITA-OT Translate Plug-in is a plug-in for the DITA Open Toolkit.

  • Full installation instructions for downloading DITA-OT can be found on the DITA-OT website (dita-ot.org).

    1. Download the dita-ot-4.2.zip package from the project website at dita-ot.org/download
    2. Extract the contents of the package to the directory where you want to install DITA-OT.
    3. Optional: Add the absolute path for the bin directory to the PATH system variable.

    This defines the necessary environment variable to run the dita command from the command line.

curl -LO https://github.com/dita-ot/dita-ot/releases/download/4.2/dita-ot-4.2.zip
unzip -q dita-ot-4.2.zip
rm dita-ot-4.2.zip
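
The optional third step can be sketched as follows. This is a minimal example that assumes the archive was extracted in the current directory and produced a dita-ot-4.2 folder; the DITA_HOME variable name is only illustrative and should point at your own installation.

```shell
# Assumption: the zip was extracted in the current directory as ./dita-ot-4.2
DITA_HOME="${DITA_HOME:-$PWD/dita-ot-4.2}"

# Prepend the bin directory so that `dita` resolves from any working directory
export PATH="$DITA_HOME/bin:$PATH"

# Show which directory the shell will now search first
echo "dita will be looked up in: $DITA_HOME/bin"
```

To make the change permanent, the export line would normally go into your shell start-up file.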

Installing the Plug-in

  • Run the plug-in installation commands:
dita install https://github.com/doctales/org.doctales.xmltask/archive/master.zip

The dita command line tool requires no additional configuration.


Signing up for an Automatic Translation Service

Several publicly available cloud services offer automatic translation. They typically provide a try-before-you-buy option with sample access to the service at no cost. Upgrading to a paid version will be necessary when transforming larger documents.

IBM Cloud Services

The IBM Language Translator allows you to translate text programmatically from one language into another.

Introduction: Getting Started

Create an instance of the service:

  1. Go to the Language Translator page in the IBM Cloud Catalog.
  2. Sign up for a free IBM Cloud account or log in.
  3. Click Create.

Copy the credentials to authenticate to your service instance:

  1. From the IBM Cloud dashboard, click on your Language Translator service instance to go to the Language Translator service dashboard page.
  2. On the Manage page, click Show to view your credentials.
  3. Copy the API Key and URL values.
  4. Within the plug-in alter the file cfg/configuration.properties to hold your API Key and URL.

By default, the Frankfurt translation service URL is used: https://gateway-fra.watsonplatform.net/language-translator/api/v3/translate. Amend this when using an instance in another region.


Microsoft Azure

Microsoft Translator provides multi-language support for translation, transliteration, language detection, and dictionaries.

Introduction: Overview

Create an instance of the service:

  1. Go to Try Cognitive Services
  2. Select the Translator Text APIs tab.
  3. Under Translator Text, select the Get API Key button.
  4. Agree to the terms and select your locale from the drop-down menu.
  5. Sign in by using your Microsoft, Facebook, LinkedIn, or GitHub account.

You can sign up for a free Microsoft account at the Microsoft account portal. To get started, click Sign in with Microsoft and then, when asked to sign in, click Create one. Follow the steps to create and verify your new Microsoft account.

After you sign in to Try Cognitive Services, your free trial begins. The displayed webpage lists all the Azure Cognitive Services for which you currently have trial subscriptions. Two subscription keys are listed beside the service. You can use either key in your applications.

Copy the credentials to authenticate to your service instance:

  1. Copy each of the API Key and Endpoint values.
  2. Within the plug-in alter the file cfg/configuration.properties to hold your API Key and URL.

By default, the global translation service URL is used: https://api.cognitive.microsofttranslator.com/translate. Amend this when using a regional instance.


Yandex Translate

The API provides access to the Yandex online machine translation service. It supports more than 90 languages and can translate separate words or complete texts.

Introduction: Overview

To sign up for the service:

  1. Review the user agreement and rules for formatting translation results.
  2. Get a free API key.
  3. Read the documentation, where you will find instructions on enabling the API and detailed descriptions of its features.

After you sign in to your account, select API Keys and create a new key as necessary. The latest endpoint can be found in the documentation:

https://translate.yandex.net/api/v1.5/tr/translate

Copy the credentials to authenticate to your service instance:

  1. Copy each of the API Key and Endpoint values.
  2. Within the plug-in alter the file cfg/configuration.properties to hold your API Key and URL.

DeepL API

The DeepL API is accessible with a DeepL Pro subscription (DeepL API plan) only. The API is an interface that allows other computer programs to send texts to the DeepL servers and receive high-quality translations.

Introduction: Overview

To sign up for the service:

  1. Open a DeepL API developers account. Note that not all accounts offer access to the DeepL API. It is essential that the account type includes REST API access.
  2. Fill out the application details and add a credit card. No payments are required for the first 30 days. You can cancel the card and still maintain free access for the trial period.
  3. Read the documentation, where you will find instructions on enabling the API and detailed descriptions of its features.

After you sign in to your account, select API Keys and create a new key as necessary. The latest endpoint can be found in the documentation:

https://api.deepl.com/v2/translate

Copy the credentials to authenticate to your service instance:

  1. Copy each of the API Key and Endpoint values.
  2. Within the plug-in alter the file cfg/configuration.properties to hold your API Key and URL.
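
As an alternative to editing cfg/configuration.properties, the credentials for any of the services above can be supplied on the command line through the translate.service, translate.apikey and translate.url parameters listed in the Parameter Reference. The sketch below wraps that call in a helper; the function name, the DITA override variable and the DEEPL_API_KEY variable are assumptions made for illustration and are not part of the plug-in.

```shell
# Hypothetical helper (not part of the plug-in): passes credentials as parameters.
# Set DITA=echo to print the arguments instead of running DITA-OT.
translate_with() {
  service="$1"     # bing, deepl, watson or yandex
  api_key="$2"     # API key copied from the service dashboard
  url="$3"         # endpoint copied from the service dashboard
  xliff_file="$4"  # XLIFF file to translate in place
  "${DITA:-dita}" -f xliff-translate -i "$xliff_file" \
    --translate.service="$service" \
    --translate.apikey="$api_key" \
    --translate.url="$url"
}

# Example, assuming the key was exported as DEEPL_API_KEY (an illustrative name):
# translate_with deepl "$DEEPL_API_KEY" https://api.deepl.com/v2/translate translate.xlf
```

Keeping the key in an environment variable avoids committing it to the plug-in's configuration file.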

Usage

XLIFF 1.2 Invocation from the command line

  1. To create an XLIFF 1.2 file and its associated skeletons, run:
PATH-TO-DITA-OT/bin/dita -f xliff-create -i document.ditamap  -o out  --xliff.version=1

Result

A translate.xlf file will appear in the out directory along with a series of skeleton files.

<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <file datatype="xml" original="/document.ditamap" source-language="en" target-language="es">
    <header xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:dita="http://www.dita-ot.org">
      <skl>
        <external-file href="./skl/document.ditamap.skl" />
      </skl>
    </header>
    <body xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:dita="http://www.dita-ot.org">
        <trans-unit xmlns="" xmlns:dita="dita-ot.org" approved="no" id="42094" xml:space="preserve">
          <source xml:lang="en">
            Loves or pursues or desires to obtain pain of itself, because it
            is pain, but occasionally circumstances occur in which toil and
            pain can procure him some great pleasure. To take a trivial
            example,  <x ctype="x-dita-b" id="d3e14">which of us ever undertakes
            laborious physical exercise,</x> except to obtain some advantage from it?
            But who has any right to find fault with a man who chooses to enjoy a pleasure
            that has no annoying consequences, or one who avoids a pain that produces no
            resultant pleasure?
          </source>
          <target xml:lang="la"/>
        </trans-unit>
        ... etc
      </body>
   </file>
...etc

Note: if the translate.cachefile parameter is used, unchanged text with previously approved translations will be copied over to the <target> elements.

  1. To populate an existing XLIFF 1.2 file with auto-translated text, run:
PATH-TO-DITA-OT/bin/dita -f xliff-translate \
    -i translate.xlf --translate.service=[bing|deepl|watson|yandex] \
    --translate.apikey=<api-key> \
    --xliff.version=1

Result

The XLIFF 1.2 File is auto-translated in place, with translated text as shown:

Note: only <trans-unit> elements which are approved="no" will be auto-translated.

<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <file datatype="xml" original="/document.ditamap" source-language="en" target-language="es">
    <header xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:dita="http://www.dita-ot.org">
      <skl>
        <external-file href="./skl/document.ditamap.skl" />
      </skl>
    </header>
    <body xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:dita="http://www.dita-ot.org">
        <trans-unit xmlns="" xmlns:dita="dita-ot.org" approved="no" id="42094" xml:space="preserve">
          <source xml:lang="en">
            Loves or pursues or desires to obtain pain of itself, because it
            is pain, but occasionally circumstances occur in which toil and
            pain can procure him some great pleasure. To take a trivial
            example, <x ctype="x-dita-b" id="d3e14">which of us ever undertakes
            laborious physical exercise,</x> except to obtain some advantage from it?
            But who has any right to find fault with a man who chooses to enjoy a pleasure
            that has no annoying consequences, or one who avoids a pain that produces no
            resultant pleasure?
          </source>
          <target xml:lang="la">
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
            eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
            enim ad minim veniam, <x ctype="x-dita-b" id="d3e14">quis nostrud exercitation
            ullamco laboris,</x> nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
            in reprehenderit in voluptate velit esse cillum dolore eu fugiat
            nulla pariatur. Excepteur sint occaecat cupidatat non proident,
            sunt in culpa qui officia deserunt mollit anim id est laborum.
          </target>
        </trans-unit>
        ...etc
      </body>
   </file>
...etc
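
Because only units with approved="no" are auto-translated, it can be useful to check how many remain before and after a run. The following is a rough, text-based sketch; the function name is hypothetical and the count relies on the attribute being written exactly as approved="no", as in the example above.

```shell
# Hypothetical helper: count the <trans-unit> elements still awaiting approval.
count_unapproved() {
  # grep -o prints every match on its own line, so wc -l counts occurrences
  # rather than matching lines; tr strips the padding some wc versions add
  grep -o 'approved="no"' "$1" | wc -l | tr -d ' '
}

# Example: count_unapproved translate.xlf
```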

XLIFF 2.1 Invocation from the command line

  1. To create an XLIFF 2.1 file and its associated skeletons, run:
PATH-TO-DITA-OT/bin/dita -f xliff-create -i document.ditamap  -o out  --xliff.version=2

Result

A translate.xlf file will appear in the out directory along with a series of skeleton files.

<?xml version="1.0" encoding="UTF-8"?>
<xliff srcLang="en" trgLang="la" version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <file id="2" original="/topic.dita">
    <skeleton href="./skl/topic.dita.skl"></skeleton>
    <unit fs:fs="p" id="9962" xmlns:fs="urn:oasis:names:tc:xliff:fs:2.0">
      <originalData>
        <data id="sd4e14">&lt;b&gt;</data>
        <data id="ed4e14">&lt;/b&gt;</data>
      </originalData>
      <segment state="initial">
        <source xml:lang="en" xml:space="preserve">Loves or pursues or desires to obtain pain of
            itself, because it is pain, but occasionally circumstances occur in which toil and pain
            can procure him some  great pleasure. To take a trivial example, <pc dataRefEnd="ed4e14"
            dataRefStart="sd4e14" fs:fs="b" id="d4e14">which of us ever undertakes laborious physical
            exercise,</pc>except to obtain some advantage from it? But who has any right to find fault
            with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids
            a pain that produces no resultant pleasure?
          </source>
          <target xml:lang="la"></target>
      </segment>
    </unit>
    ...etc
  </file>
  ...etc

  1. To populate an existing XLIFF 2.1 file with auto-translated text, run:
PATH-TO-DITA-OT/bin/dita -f xliff-translate \
    -i translate.xlf --translate.service=[bing|deepl|watson|yandex] \
    --translate.apikey=<api-key> \
    --xliff.version=2

Result

The XLIFF 2.1 File is auto-translated in place, with translated text as shown:

Note: any <segment> elements which are state="final" will not be re-translated.

<?xml version="1.0" encoding="UTF-8"?>
<xliff srcLang="en" trgLang="la" version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <file id="2" original="/topic.dita">
    <skeleton href="./skl/topic.dita.skl"></skeleton>
    <unit fs:fs="p" id="9962" xmlns:fs="urn:oasis:names:tc:xliff:fs:2.0">
      <originalData>
        <data id="sd4e14">&lt;b&gt;</data>
        <data id="ed4e14">&lt;/b&gt;</data>
      </originalData>
      <segment state="translated">
        <source xml:lang="en" xml:space="preserve">Loves or pursues or desires to obtain pain of
            itself, because it is pain, but occasionally circumstances occur in which toil and pain
            can procure him some  great pleasure. To take a trivial example, <pc dataRefEnd="ed4e14"
            dataRefStart="sd4e14" fs:fs="b" id="d4e14">which of us ever undertakes laborious physical
            exercise</pc>except to obtain some advantage from it? But who has any right to find fault with
            a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain
            that produces no resultant pleasure?
        </source>
        <target xml:lang="la">
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
            eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
            enim ad minim veniam, <pc dataRefEnd="ed4e14" dataRefStart="sd4e14" fs:fs="b" id="d4e14">
            quis nostrud exercitation ullamco laboris,</pc> nisi ut aliquip ex ea commodo consequat.
            Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat
            nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
            deserunt mollit anim id est laborum.
        </target>
      </segment>
    </unit>
    ...etc
  </file>
  ...etc
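
For XLIFF 2.1 files the equivalent check is a summary of segment states. The sketch below is text-based rather than XML-aware and assumes each <segment> start tag sits on a single line, as in the example above; the function name is hypothetical.

```shell
# Hypothetical helper: print one "state=count" line per segment state found.
segment_states() {
  grep -o '<segment[^>]*state="[^"]*"' "$1" \
    | sed 's/.*state="\([^"]*\)"/\1/' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'
}

# Example: segment_states translate.xlf
```

Segments reported as final are the ones the xliff-translate transform will leave untouched.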

Populating Skeletons from the command line

  1. To recreate the *.dita files from an XLIFF file and its associated skeletons, run:
PATH-TO-DITA-OT/bin/dita -f xliff-dita -i translate.xlf -o out --xliff.version=[1|2]

Result

The translated *.dita files are generated into the out directory.
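
Taken together, the three transforms form a round trip. The sketch below chains them under several assumptions that are not stated in this README: that dita is on the PATH, that the XLIFF file is picked up from out/translate.xlf, and that the translated topics are written to a separate out/translated directory. The function name and the DITA override variable are hypothetical.

```shell
# Hypothetical wrapper around the three transforms.
# Set DITA=echo to print each invocation instead of running DITA-OT.
run_round_trip() {
  map="$1"; service="$2"; api_key="$3"; version="${4:-1}"
  dita="${DITA:-dita}"

  # Step 1: extract translatable text into out/translate.xlf plus skeleton files
  "$dita" -f xliff-create -i "$map" -o out --xliff.version="$version" || return 1

  # Step 2: fill the <target> elements in place
  "$dita" -f xliff-translate -i out/translate.xlf \
    --translate.service="$service" --translate.apikey="$api_key" \
    --xliff.version="$version" || return 1

  # Step 3: rebuild the translated *.dita files (output directory is an assumption)
  "$dita" -f xliff-dita -i out/translate.xlf -o out/translated --xliff.version="$version"
}

# Example: run_round_trip document.ditamap deepl "$DEEPL_API_KEY" 2
```

Passing the same version number to every step keeps the XLIFF format consistent across the round trip.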

Note

Any machine translation is by definition imperfect. A typical translation workflow would send the generated XLIFF files to a translation agency (also known as a "localisation service provider") and receive back verified translations integrated into the XLIFF. For XLIFF 1.2, each <trans-unit> should be marked approved="yes" once its <target> element has been verified. Similarly, for XLIFF 2.1, each <segment> should be marked state="final".
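
Approval is normally recorded by the translation agency's tooling, but a single unit can also be marked by hand. The following line-oriented sketch flips one XLIFF 1.2 unit to approved="yes" and writes the result to standard output. It assumes the id and approved attributes share one line, as in the examples above; an XML-aware tool would be more robust, and the function name is hypothetical.

```shell
# Hypothetical helper: mark one <trans-unit> as approved, leaving the rest untouched.
approve_unit() {
  unit_id="$1"; xliff_file="$2"
  # Only lines holding the start tag with the requested id are rewritten
  sed "/<trans-unit[^>]*id=\"$unit_id\"/ s/approved=\"no\"/approved=\"yes\"/" "$xliff_file"
}

# Example: approve_unit 42094 translate.xlf > translate.approved.xlf
```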

Parameter Reference

  • translate.from - Source language to use. Defaults to the value in configuration.properties
  • translate.to - Target language. Defaults to the value in configuration.properties
  • translate.cachefile - Specifies the (absolute) location of a previously translated XLIFF file to be used as a cache. If an id matches a previously translated text snippet in the cache file, the text will be copied over and the snippet marked as approved.
  • translate.service - Decides which translation service to use:
    • bing - Connects to the Microsoft Azure Translation service
    • custom - Sends the translation request to an arbitrary URL using POST. Use this to connect to proxies for Google Cloud Translate
    • deepl - Connects to the DeepL API Translation service
    • dummy - Avoids accessing a translation service and copies the source text to the target language directly without amendment.
    • watson - Connects to the IBM Cloud Translation service
    • yandex - Connects to the Yandex Translation service
  • translate.authentication.url - URL for creating an OAuth token if needed for a service. Defaults to the value in configuration.properties
  • translate.apikey - API Key for the Translation service. Defaults to the value in configuration.properties
  • translate.region - Subscription region for a Microsoft multi-service text API subscription
  • translate.url - URL for a Translation service. Defaults to the value in configuration.properties
  • xliff.version - Decides which XLIFF format to use. Defaults to the value in configuration.properties:
    • 1 - XLIFF 1.2 format
    • 2 - XLIFF 2.1 format
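
Since translate.cachefile expects an absolute location, a relative path needs to be resolved before it is passed in. The sketch below shows one way to do that when creating a new XLIFF file; the function name and the DITA override variable are hypothetical, and the cache file is assumed to exist already.

```shell
# Hypothetical helper: run xliff-create against a previously translated XLIFF file.
# Set DITA=echo to print the arguments instead of running DITA-OT.
create_with_cache() {
  map="$1"; cache="$2"
  # Resolve the directory part to an absolute path, then re-append the file name
  cache_abs="$(cd "$(dirname "$cache")" && pwd)/$(basename "$cache")"
  "${DITA:-dita}" -f xliff-create -i "$map" -o out \
    --translate.cachefile="$cache_abs"
}

# Example: create_with_cache document.ditamap previous/translate.xlf
```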

License

Apache 2.0 © 2019 - 2024 Jason Fox

The Program includes the following additional software components which were obtained under license: