Data schema for ICU4X #174

Closed
sffc opened this issue Jul 10, 2020 · 2 comments · Fixed by #200
Assignees
Labels: A-design (Area: Architecture or design), C-data-infra (Component: provider, datagen, fallback, adapters), question (Unresolved questions; type unclear)

Comments

Member

sffc commented Jul 10, 2020

I've so far left the question of the ICU4X data schema a bit undefined. I wanted to discuss this in more detail now.

I was thinking of something along the lines of /category/key/[payload/]locale, followed by a metadata block that includes the hunk. Potential new names that are more domain-specific: /category/subcategory/[flavor/]locale.

Why feature-first and not locale-first?

  • Different features can have different locales.
  • Keeping the locale near the leaf of the tree means that you don't have to re-crawl the tree in order to perform fallbacks.

For example:

{
  "plurals": {
    "cardinal_v1": {
      "en": {
        "data": {
          "one": "..."
        }
      },
      "ru": {
        "data": {
          "one": "...",
          "few": "...",
          // ...
        }
      }
    },
    "ordinal_v1": {
      // ...
    }
  },
  "numbers": {
    "currency_symbols_v1": {
      "EUR": {
        "sr": {
          // sr implicitly inherits from und
          "data": { /* ... */ }
        },
        "sr_Latn": {
          // sr_Latn should inherit from und, not sr
          "fallback": "und",
          "data": { /* ... */ }
        },
        "und": {
          "data": { /* ... */ }
        }
      },
      "GBP": { /* ... */ }
    }
  }
}
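
To make the fallback comments in the example concrete, here is a minimal Rust sketch of how a reader of this schema might resolve a locale. It is only an illustration under stated assumptions: the `LocaleEntry` type and the `fallback_chain` and `lookup` functions are hypothetical names, the implicit parent is assumed to be "drop the last `_subtag`, then `und`", and fallback is assumed to apply per field. None of this is settled by the proposal above.

```rust
use std::collections::HashMap;

/// One locale entry under a key such as `numbers/currency_symbols_v1/EUR`.
/// The field names mirror the "fallback" and "data" members in the example;
/// the Rust types themselves are hypothetical.
pub struct LocaleEntry {
    /// Explicit parent locale, overriding the implicit one.
    pub fallback: Option<String>,
    /// Payload for this locale (a flat string map, for illustration only).
    pub data: HashMap<String, String>,
}

/// Implicit parent (assumption): drop the last `_subtag`;
/// a bare language falls back to `und`, and `und` has no parent.
fn implicit_parent(locale: &str) -> Option<String> {
    if locale == "und" {
        return None;
    }
    match locale.rfind('_') {
        Some(idx) => Some(locale[..idx].to_string()),
        None => Some("und".to_string()),
    }
}

/// Returns the locales to consult, in order, starting with `locale`.
/// An explicit "fallback" on an entry wins over the implicit parent.
pub fn fallback_chain(entries: &HashMap<String, LocaleEntry>, locale: &str) -> Vec<String> {
    let mut chain = vec![locale.to_string()];
    let mut current = locale.to_string();
    loop {
        let explicit = entries.get(&current).and_then(|e| e.fallback.clone());
        let next = match explicit.or_else(|| implicit_parent(&current)) {
            Some(n) => n,
            None => break,
        };
        if chain.contains(&next) {
            break; // guard against cycles in explicit fallbacks
        }
        chain.push(next.clone());
        current = next;
    }
    chain
}

/// Looks up `field` for `locale`, walking the chain until an entry has it.
/// All candidate locales are siblings in one map, so no re-crawl is needed.
pub fn lookup<'a>(
    entries: &'a HashMap<String, LocaleEntry>,
    locale: &str,
    field: &str,
) -> Option<&'a str> {
    fallback_chain(entries, locale)
        .iter()
        .filter_map(|loc| entries.get(loc))
        .find_map(|entry| entry.data.get(field).map(String::as_str))
}
```

With the three EUR entries from the example loaded into the map, `fallback_chain(&entries, "sr_Latn")` would visit `sr_Latn` and then `und`, skipping `sr`, while `sr` itself would visit `sr` and then `und`.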

This structure can optionally be broken down onto the filesystem in order to get more manageable file sizes. The filesystem structure can also be used to serve this as static data from an HTTP server.

📦data
 ┣ 📂numbers
 ┃ ┗ 📂currency_symbols_v1
 ┃   ┣ 📂GBP
 ┃   ┃ ┣ 📜sr.json
 ┃   ┃ ┣ 📜sr_Latn.json
 ┃   ┃ ┗ 📜und.json
 ┃   ┗ 📂USD
 ┃     ┗ 📜und.json
 ┗ 📂plurals
   ┣ 📂cardinal_v1
   ┃ ┣ 📜en.json
   ┃ ┗ 📜ru.json
   ┗ 📂ordinal_v1
     ┣ 📜en.json
     ┗ 📜ru.json
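
As a rough sketch of how the /category/key/[payload/]locale path could map onto this layout, the hypothetical helper below builds the relative path of one data file. The function name, the `data` root directory, and the `.json` extension are assumptions taken from the tree above, not a defined API. It joins with `/` so the same string could be used as a URL path on a static HTTP server.

```rust
/// Builds the relative path of one data file in the layout sketched above.
/// `payload` is the optional segment (e.g. a currency code such as "GBP").
/// Hypothetical helper: the "data" root and ".json" suffix are assumptions.
pub fn data_file_path(
    category: &str,
    key: &str,
    payload: Option<&str>,
    locale: &str,
) -> String {
    // The locale is the leaf, stored as a file name.
    let file = format!("{}.json", locale);
    let mut parts = vec!["data", category, key];
    if let Some(p) = payload {
        parts.push(p);
    }
    parts.push(&file);
    // Always join with '/', so the result also works as a URL path.
    parts.join("/")
}
```

For example, `data_file_path("numbers", "currency_symbols_v1", Some("GBP"), "sr_Latn")` yields `data/numbers/currency_symbols_v1/GBP/sr_Latn.json`.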

To generate this schema, we will need a tool that transforms from CLDR (either XML or JSON). Ideally that tool would be written in Rust, as it could depend on ICU4X, e.g. to process data into a more convenient form.
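
To give a feel for one small step such a tool might perform, here is a hedged Rust sketch that renames CLDR JSON plural rule keys (which use the `pluralRule-count-` prefix, as in `pluralRule-count-one`) to the bare category names used in the example above. The function name and the flat input shape are hypothetical; a real tool would parse the CLDR files and would probably also pre-process the rule strings.

```rust
use std::collections::BTreeMap;

/// Prefix of plural rule keys in CLDR JSON plural data.
const CLDR_RULE_PREFIX: &str = "pluralRule-count-";

/// Converts one locale's CLDR-style (key, rule) pairs into a map keyed by
/// the bare plural category ("one", "few", ...). Pairs whose key lacks the
/// prefix are skipped. Hypothetical helper, for illustration only.
pub fn convert_plural_rules(cldr_rules: &[(&str, &str)]) -> BTreeMap<String, String> {
    cldr_rules
        .iter()
        .filter_map(|(key, rule)| {
            key.strip_prefix(CLDR_RULE_PREFIX)
                .map(|category| (category.to_string(), rule.to_string()))
        })
        .collect()
}
```

The resulting map corresponds to the "data" block of a single locale under `plurals/cardinal_v1`.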

@zbraniecki @nciric

sffc added the question, C-data-infra, A-design, and discuss (Discuss at a future ICU4X-SC meeting) labels Jul 10, 2020
sffc self-assigned this Jul 16, 2020
Member Author

sffc commented Jul 17, 2020

We discussed this on 2020-07-17 but did not reach a final conclusion. We will continue async. CC @mihnita

sffc removed the discuss (Discuss at a future ICU4X-SC meeting) label Jul 23, 2020
sffc linked a pull request Aug 19, 2020 that will close this issue
Member Author

sffc commented Aug 19, 2020

The layout I proposed in the OP is being implemented in #200. I filed #211 to follow up with an alternative data provider using a language pack filesystem layout. @mihnita @nciric

sffc closed this as completed in #200 Aug 28, 2020