kristianmandrup/json-schema-to-es-mapping

JSON Schema to ElasticSearch mappings

Convert JSON schema to ElasticSearch mappings

A mapping type has:

Meta-fields

Meta-fields are used to customize how a document’s associated metadata is treated. Examples of meta-fields include the document’s _index, _type, _id, and _source fields.

Fields or properties

A mapping type contains a list of fields or properties pertinent to the document.

Field datatypes

Each field has a data type which can be:

  • a simple type like text, keyword, date, long, double, boolean or ip
  • a type which supports the hierarchical nature of JSON such as object or nested
  • a specialised type like geo_point, geo_shape, or completion

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be indexed as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
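
As a rough illustration of the fields parameter, the sketch below turns a plain field mapping into a text field with a keyword sub-field. The helper name toMultiField is hypothetical and is not part of json-schema-to-es-mapping; it only shows the shape of a multi-field mapping.

```javascript
// Hypothetical helper (not part of json-schema-to-es-mapping):
// index a field as full text, with a keyword sub-field for sorting/aggregations.
function toMultiField(mapping, subFieldName = "keyword") {
  return {
    ...mapping,
    type: "text",
    fields: {
      [subFieldName]: { type: "keyword" }
    }
  };
}

const nameMapping = toMultiField({ type: "keyword" });
console.log(JSON.stringify(nameMapping, null, 2));
```

With such a mapping the sub-field would be addressed as name.keyword in queries, assuming the field itself is called name.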

Quick start

  • npm: npm install json-schema-to-es-mapping -S
  • yarn: yarn add json-schema-to-es-mapping

The easiest way to get started is to use buildMappingsFor to create a mappings object for a named index given a JSON schema.

const mappings = buildMappingsFor("people", schema);

Example:

const schema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  $id: "http://example.com/person.schema.json",
  title: "Person",
  description: "A person",
  type: "object",
  properties: {
    name: {
      description: "Name of the person",
      type: "string"
    },
    age: {
      description: "Age of person",
      type: "number"
    }
  },
  required: ["name"]
};

const { buildMappingsFor } = require("json-schema-to-es-mapping");
const mappings = buildMappingsFor("people", schema);
console.log({ mappings });

This will by default give the following mappings result:

{
  "mappings": {
    "people": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "age": {
          "type": "float"
        }
      }
    }
  }
}

The function buildMappingsFor uses the build function to return the properties map and simply wraps them with a mappings object for the named index.
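
The wrapping step can be pictured roughly as follows. This is a sketch only: wrapMappings is a hypothetical name, and the library's actual source may differ.

```javascript
// Sketch only: wraps a properties map in a mappings object for a named index.
// `wrapMappings` is a hypothetical name, not the library's own function.
function wrapMappings(indexName, properties) {
  return {
    mappings: {
      [indexName]: { properties }
    }
  };
}

const wrapped = wrapMappings("people", {
  name: { type: "keyword" },
  age: { type: "float" }
});
console.log(JSON.stringify(wrapped, null, 2));
```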

Supported mappings

The Elastic Search core data types are supported, except for binary. Entries marked (soon) are still in progress:

  • string
  • numeric
  • boolean
  • date
  • object
  • ranges (numeric, date) (soon)
  • geo_point (soon)
  • ip (soon)

Note: The most feature complete version can currently be found in the to-ts branch. This branch is almost complete. It has unit test coverage of most of the functionality, includes initial support for complex schema types (such as anyOf a list of types) and the code has been converted to TypeScript.

Please help with the finishing touches so it can be released if you want or need these extra mappings and other features.

Numeric

You can assist the numeric type mapper by supplying a numType for the field entry, such as numType: "double"

See ES number reference for list of valid numTypes (except for scaled_float)

Ranges

  • Numeric
  • Date

Numeric ranges

To make a numeric field entry be mapped to an ES numeric range:

  • Set range: true
  • Set a minimum range value, either minimum or exclusiveMinimum
  • Set a maximum range value, either maximum or exclusiveMaximum

If you leave out the range: true it will be resolved as a number, using the min and max values and the multipleOf (precision). These properties will in combination be used to determine the exact numeric type (byte, short, ... double) to be used in the Elastic Search numeric type mapping.
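
To make the idea concrete, here is a minimal sketch of how min, max and multipleOf could be combined to pick a numeric type or a range type. The function resolveNumeric and its exact rules are assumptions made for illustration; the library's real resolution logic may differ.

```javascript
// Hypothetical sketch; the library's real resolution logic may differ.
// Value limits of the whole-number Elasticsearch types, smallest first.
const INTEGER_TYPES = [
  { type: "byte", min: -128, max: 127 },
  { type: "short", min: -32768, max: 32767 },
  { type: "integer", min: -2147483648, max: 2147483647 }
];

const lowerBound = entry =>
  entry.minimum !== undefined ? entry.minimum : entry.exclusiveMinimum;
const upperBound = entry =>
  entry.maximum !== undefined ? entry.maximum : entry.exclusiveMaximum;

function resolveNumeric(entry) {
  const min = lowerBound(entry);
  const max = upperBound(entry);
  const bounded = min !== undefined && max !== undefined;
  // A whole-number multipleOf is treated as "this field holds integers".
  const isWhole = Number.isInteger(entry.multipleOf);

  if (entry.range === true && bounded) {
    return { type: isWhole ? "integer_range" : "float_range" };
  }
  if (!isWhole) return { type: "float" };
  // Pick the smallest integer type that can hold both bounds, else long.
  const fit = INTEGER_TYPES.find(t => min >= t.min && max <= t.max);
  return { type: fit ? fit.type : "long" };
}

console.log(resolveNumeric({ minimum: 0, maximum: 120, multipleOf: 1 }));
console.log(resolveNumeric({ range: true, minimum: 0, maximum: 100, multipleOf: 1 }));
```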

Date ranges

To make an entry detect as a date range, the same applies as for a number range but the entry must also resolve to a date type (see types/util.js function isDate(obj) for details)

Recent feature additions

Now also resolves:

  • Array items that are themselves object types
  • References to object definitions (ie. $ref)
  • Parent-child mapping

Limitations and coming features

Support for Geo location mapping will likely be included in the near future.

Please let me know of any other features you'd like to see included for a more feature-complete library!

Initial work to support these features has been started in the dev branch and should land soon (0.4.0).

Fine grained control

For more fine-grained control, use the build function directly.

const { build } = require("json-schema-to-es-mapping");
const { properties, results } = build(schema);
console.log({ properties, results });

Will output the following Elastic Search Mapping schema:

{
  "name": {
    "type": "keyword"
  },
  "age": {
    "type": "float"
  }
}

The results will in this (simple) case give the same results as the mappings:

{
  name: { type: "keyword" },
  age: { type: "float" }
}

Event driven approach

You can use the event-driven approach with onResult and the other callback handlers to generate a more context-specific mapping for Elastic Search, given your requirements.

const received = [];
const onResult = result => {
  console.log("received", result);
  received.push(result);
};

// potentially use to call resolve callback of Promise
const onComplete = fullResult => {
  console.log("ES mapping done :)", {
    fullResult, // "internal" results
    received // list built by onResult
  });
};

// potentially use to call reject callback of Promise
const onError = errMsg => {
  console.error("ES mapping error", errMsg);
  throw errMsg;
};

// potentially use to call reject callback of Promise
const onThrow = err => {
  throw err;
};
const config = { onResult, onComplete, onError, onThrow };

The onResult handler will populate the received array with the following:

[
  { parentName: "Person", key: "name", resultKey: "name", type: "keyword" },
  {
    parentName: "Person",
    key: "age",
    resultKey: "age",
    type: "float"
  }
];

You will also get notified on:

  • successful completion of JSON schema mapping via onComplete callback
  • aborted due to processing error via onError callback
  • aborted due to throwing exception via onThrow callback

The Event driven approach is entirely optional, but can be used for a more "stream like" approach. This approach works well with async promises (ie. reject and resolve callbacks).

On each result received you can then issue a command to the Elastic Search server (f.ex via the REST interface) to add a new mapping that reflects the result received.

Put mapping

PUT person/_mapping/_doc
{
  "properties": {
    "age": {
      "type": "float"
    }
  }
}
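
A result received in onResult could be turned into such a request along these lines. The helper toPutMappingRequest is hypothetical and only builds a description of the request; sending it to the server is left to whichever HTTP client you use.

```javascript
// Hypothetical helper: describes the REST call for one received result.
// Bookkeeping keys are stripped so only mapping parameters remain in the body.
function toPutMappingRequest(index, result, docType = "_doc") {
  const { resultKey, parentName, key, typeName, ...mapping } = result;
  return {
    method: "PUT",
    path: `${index}/_mapping/${docType}`,
    body: { properties: { [resultKey]: mapping } }
  };
}

const request = toPutMappingRequest("person", {
  parentName: "Person",
  key: "age",
  resultKey: "age",
  type: "float"
});
console.log(request.method, request.path, JSON.stringify(request.body));
```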

Alternatively only submit the ES index mappings after onComplete is triggered, to make sure the full JSON schema could be processed, so that you don't end up with partial schema mappings.
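
One way to wait for completion is to wrap the callbacks in a Promise, as sketched below. The build function is passed in as an argument so the example runs without the library; it is assumed to invoke the callbacks on the config object as described above. fakeBuild is a stand-in used for demonstration only.

```javascript
// Sketch: resolves once onComplete fires, rejects on onError or onThrow.
function buildAsync(build, schema, config = {}) {
  return new Promise((resolve, reject) => {
    const received = [];
    build(schema, {
      ...config,
      onResult: result => received.push(result),
      onComplete: fullResult => resolve({ fullResult, received }),
      onError: errMsg => reject(new Error(String(errMsg))),
      onThrow: err => reject(err)
    });
  });
}

// Stand-in for the real build function, for demonstration only.
const fakeBuild = (schema, config) => {
  const properties = {};
  for (const key of Object.keys(schema.properties)) {
    properties[key] = { type: "keyword" };
    config.onResult({ key, resultKey: key, type: "keyword" });
  }
  config.onComplete({ properties });
};

const pending = buildAsync(fakeBuild, {
  properties: { name: { type: "string" } }
});
pending.then(({ received }) => console.log("safe to submit", received));
```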

Nested schemas

For a nested schema of the form:

{
  $schema: "http://json-schema.org/draft-07/schema#",
  $id: "http://example.com/person.schema.json",
  title: "Person",
  description: "A person",
  type: "object",
  properties: {
    name: {
      description: "Name of the person",
      type: "string"
    },
    dog: {
      type: "object",
      typeName: "Animal",
      properties: {
        name: {
          description: "Name of the dog",
          type: "string",
          required: true
        },
        age: {
          description: "Age of dog",
          type: "number"
        }
      }
    }
  },
  required: ["name"]
};

buildMappingsFor will in this case generate an Elastic Search mapping as follows:

mappings: {
  people: {
    properties: {
      name: {
        type: "keyword"
      },
      dog: {
        properties: {
          name: {
            type: "keyword"
          },
          age: {
            type: "float"
          }
        }
      }
    }
  }
}

Note that the dog object results in a nested mapping (see ElasticSearch resources below)

The results will in this case give:

{
  name: { type: 'keyword' },
  dog_name: { type: 'keyword' },
  dog_age: { type: 'float' },
  dog: {
    name: { type: 'keyword' },
    age: { type: 'float' }
  }
}

Notice how the dog properties are provided both in flat and nested form. Depending on your requirements, you might want to store the Elastic Search data in a more flat form than in your general application domain model.

Customizing the result

You can pass a custom function shouldSetResult(converter) which controls under which converter conditions the result should be set. You can also pass:

  • a custom name separator nameSeparator
  • a resultKey(converter) function, to customize how result keys (names) are generated
  • a nestedKey(converter) function, to customize how nested result keys (names) are generated

Example:

const config = {
  shouldSetResult: converter => {
    return converter.type !== "object";
  },
  nameSeparator: "__" // example: dog__age
};

This configuration will result in results discarding the nested form, thus only retaining flattened field mappings.

{
  name: { type: 'keyword' },
  dog__name: { type: 'keyword' },
  dog__age: { type: 'float' },
}
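
If you only have a nested properties map at hand, a similar flattened form can be produced in a post-processing step. The helper below is a hypothetical sketch and not the library's own flattening, which is controlled through shouldSetResult, nameSeparator, resultKey and nestedKey.

```javascript
// Hypothetical post-processing helper: flattens nested properties into
// separator-joined keys, dropping the nested form.
function flattenProperties(properties, separator = "_", prefix = "") {
  const flat = {};
  for (const [key, mapping] of Object.entries(properties)) {
    const name = prefix ? `${prefix}${separator}${key}` : key;
    if (mapping.properties) {
      // Recurse into nested objects, carrying the joined name as prefix.
      Object.assign(flat, flattenProperties(mapping.properties, separator, name));
    } else {
      flat[name] = mapping;
    }
  }
  return flat;
}

const flat = flattenProperties(
  {
    name: { type: "keyword" },
    dog: { properties: { name: { type: "keyword" }, age: { type: "float" } } }
  },
  "__"
);
console.log(flat);
```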

If you add an onResult handler to receive results, it will look as follows:

{
  results: [
    {
      parentName: 'Person',
      key: 'name',
      resultKey: 'name',
      type: 'keyword'
    },
    {
      parentName: 'dog',
      key: 'name',
      resultKey: 'dog__name',
      type: 'keyword'
    },
    {
      parentName: 'dog',
      key: 'age',
      resultKey: 'dog__age',
      type: 'float'
    },
    {
      parentName: 'Person',
      typeName: 'Animal',
      key: 'dog',
      resultKey: 'dog',
      properties: {
        name: { type: 'keyword' },
        age: { type: 'float' }
      }
    }
  ]
}

Note the typeName in the result for the dog fields (more on this later)

Default configuration

The default configuration is as follows.

{
  _meta_: {
    types: {
      string: "keyword",
      number: "float",
      object: "object",
      array: "nested",
      boolean: "boolean",
      date: "date"
    }
  },
  fields: {
    name: {
      type: "keyword"
    },
    content: {
      type: "text"
    },
    text: {
      type: "text"
    },
    title: {
      type: "text"
    },
    caption: {
      type: "text"
    },
    label: {
      type: "text"
    },
    tag: {
      type: "keyword",
      index: "not_analyzed"
    }
  }
}

Note that some or all of these might benefit from being defined as multi fields, that are indexed and analyzed both as text and keyword.

You can pass in a custom configuration object (last argument) to override or extend it ;)

Note that for convenience, we pass in some typical field mappings based on names. Please customize this further to your needs.

Customization

  • Type mappers
  • Rules

Type mappers

You can pass in custom Type mapper factories if you want to override how specific types are mapped.

Internally this is managed in the SchemaEntry constructor in entry.js:

this.defaults = {
  types: {
    string: toString,
    number: toNumber,
    boolean: toBoolean,
    array: toArray,
    object: toObject,
    date: toDate,
    dateRange: toDateRange,
    numericRange: toNumericRange
  },
  typeOrder: [
    "string",
    "dateRange",
    "numericRange",
    "number",
    "boolean",
    "array",
    "object",
    "date"
  ]
};

this.types = {
  ...this.defaults.types,
  ...(config.types || {})
};
this.typeOrder = config.typeOrder || this.defaults.typeOrder;

To override, simply pass in a custom types object and/or a custom typeOrder array of the precedence order they should be resolved in.
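
The precedence idea can be sketched as follows: walk the typeOrder list and use the first mapper that returns a result. The mappers and the resolveEntry function here are hypothetical stand-ins, assuming (as in the toObject example in this README) that a mapper returns a falsy value when the entry is not of its type.

```javascript
// Hypothetical mappers: each returns a mapping, or false when the schema
// entry is not of its type.
const mapString = entry => entry.type === "string" && { type: "keyword" };
const mapNumber = entry => entry.type === "number" && { type: "float" };
const mapBoolean = entry => entry.type === "boolean" && { type: "boolean" };

// Try the mappers in precedence order; the first truthy result wins.
function resolveEntry(entry, types, typeOrder) {
  for (const typeName of typeOrder) {
    const mapper = types[typeName];
    const result = mapper && mapper(entry);
    if (result) return result;
  }
  throw new Error(`No type mapper matched entry of type ${entry.type}`);
}

const defaults = { string: mapString, number: mapNumber, boolean: mapBoolean };
// A custom mapper overrides the default for the same key.
const custom = { number: entry => entry.type === "number" && { type: "double" } };
const types = { ...defaults, ...custom };
const typeOrder = ["string", "number", "boolean"];

console.log(resolveEntry({ type: "number" }, types, typeOrder)); // { type: 'double' }
```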

Custom Type mapper example (object)

Create a toObject file locally in your project that contains your overrides:

const { types } = require("json-schema-to-es-mapping");
const { MappingObject, toObject, util } = types;

class MyMappingObject extends MappingObject {
  // ...override

  createMappingResult() {
    return this.hasProperties
      ? this.buildObjectValueMapping()
      : this.defaultObjectValueMapping;
  }

  buildObjectValueMapping() {
    const { buildProperties } = this.config;
    return buildProperties(this.objectValue, this.mappingConfig);
  }
}

module.exports = function toObject(obj) {
  return util.isObject(obj) && new MyMappingObject(obj).convert();
};

Import the toObject function and pass it as the object entry of the types object in the config passed to the build function.

// custom implementation
const { build } = require("json-schema-to-es-mapping");
const toObject = require("./toObject");

const myConfig = {
  types: {
    object: toObject
  }
};

// will now use the custom toObject for mapping JSON schema object to ES object
build(schema, myConfig);

Depending on your requirements, you can post-process the generated mapping to better suit your specific needs and strategies for handling nested/complex data relationships.

Elastic search types

Core:

  • String (text, keyword)
  • Numeric (long, integer, short, byte, double, float, half_float, scaled_float)
  • Date (date)
  • Boolean (boolean)
  • Binary (binary)
  • Range (integer_range, float_range, long_range, double_range, date_range)

Type mappings

The default type mappings are as follows:

  • boolean -> boolean
  • object -> object
  • array -> nested
  • string -> keyword
  • number -> float
  • date -> date

For an array, the mapper uses the type of the first array item, provided it is a basic type and all array items share the same type.

{
  "type": "array",
  "items":{
    "type": "integer"
  }
}

If the array item types are not "uniform", it will throw an error.

For the following array JSON schema entry the mapper will currently set the mapping type to string (by default). Please use the customization options outlined to define a more appropriate mapping strategy if needed.

{
 "type": "array",
 "items" : [{
    "type": "string"
    // ...
  },
  {
    "type": "string"
    // ...
  },
 ]
}
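
The uniformity check described above could look roughly like this. resolveArrayItemType is a hypothetical function written for illustration, not the library's source; it accepts both the single-schema and the list form of items.

```javascript
// Hypothetical sketch of the uniformity check, not the library's source.
function resolveArrayItemType(entry) {
  // `items` may be one schema or a list of schemas; normalize to a list.
  const items = Array.isArray(entry.items) ? entry.items : [entry.items];
  const itemTypes = items.filter(Boolean).map(item => item.type);
  if (itemTypes.length === 0) return undefined;

  const [first] = itemTypes;
  if (!itemTypes.every(type => type === first)) {
    throw new Error(`Array items must share one type, got: ${itemTypes.join(", ")}`);
  }
  return first;
}

console.log(resolveArrayItemType({ type: "array", items: { type: "integer" } }));
console.log(
  resolveArrayItemType({ type: "array", items: [{ type: "string" }, { type: "string" }] })
);
```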

You can override the default type mappings by passing a types entry with type mappings in the _meta_ entry of the config:

const config = {
  _meta_: {
    types: {
      number: "long", // use "long" for numbers
      string: "text" // use "text" for strings
    }
  }
};

Rules

You can pass an extra configuration object with specific rules for ES mapping properties that will be merged into the resulting mapping.

const config = {
  _meta_: {
    types: {
      number: "long", // use "long" for numbers
      string: "text" // use "text" for strings
    }
  },
  fields: {
    created: {
      // add extra indexing field meta data for Elastic search
      format: "strict_date_optional_time||epoch_millis"
      // ...
    },
    firstName: {
      type: "keyword" // make sure firstName will be a keyword field (exact match) in ES mapping
    }
  }
};

const { build } = require("json-schema-to-es-mapping");
const mapping = build(schema, config);
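
Conceptually, the merge of field rules into generated mappings can be pictured as below. applyFieldRules is a hypothetical helper that assumes a simple shallow merge in which rule values win over generated ones; the library's actual merge behaviour may differ.

```javascript
// Hypothetical sketch: shallow-merge per-field rules over generated mappings.
function applyFieldRules(properties, fields = {}) {
  const merged = {};
  for (const [key, mapping] of Object.entries(properties)) {
    merged[key] = { ...mapping, ...(fields[key] || {}) };
  }
  return merged;
}

const generated = {
  created: { type: "date" },
  firstName: { type: "text" }
};
const rules = {
  created: { format: "strict_date_optional_time||epoch_millis" },
  firstName: { type: "keyword" }
};
console.log(applyFieldRules(generated, rules));
```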

Also note that you can pass in many of the functions used internally, so that the internal mechanics themselves can easily be customized as needed or used as building blocks.

Elastic Search nested objects and data

Advanced customization

To override the default mappings for certain fields, you can pass in a fields mapping entry in the config object as follows:

const config = {
  fields: {
    timestamp: {
      type: "date",
      format: "dateOptionalTime"
    }
    // ... more custom field mappings
  }
};

For more scalable customization, pass an entryFor function which returns custom mappings depending on the entry being processed. The function receives an object with the following properties:

  • key
  • resultKey (ie. potentially nested key name)
  • parentName name of parent entry if nested property
  • schemaValue (entry from JSON schema being mapped)

You could f.ex use this to provide custom mappings for specific types of date fields.

const config = {
  entryFor: ({ key }) => {
    if (key === "date" || key === "timestamp") {
      return {
        type: "date",
        format: "dateOptionalTime"
      };
    }
  }
};

resolve type maps

You can use resolve-type-maps to define mappings to be used across your application in various schema-like contexts:

  • GraphQL schema
  • Data storage (tables, collections etc)
  • Validation
  • Forms
  • Data Display
  • Indexing (including Elastic Search)
  • Mocks and fake data

const fieldMap = {
  name: {
    matches: ['title', 'caption', 'label'],
    elastic: {
      type: 'text',
    }
  },
  tag: {
    matches: ['tags'],
    elastic: {
      type: 'keyword',
    }
  },
  text: {
    matches: ['description', 'content'],
    elastic: {
      type: 'text',
    }
  },
  date: {
    matches: ['date', 'timestamp'],
    elastic: {
      type: 'date',
      format: 'dateOptionalTime'
    }
  }
}

const typeMap = {
  Person: {
    matches: ['User'],
    fields: {
      dog: {
        // ...
        elastic: {
          type: 'nested',
          // ...
        }
      },
      // ...
    }
  }
}

Then pass an entryFor function in the config object to resolve the entry to be used for the ES mapping entry.

import { createTypeMapResolver } from "resolve-type-maps";

const map = {
  typeMap,
  fieldMap
};

const resolverConfig = {};
const functions = {
  resolveResult: obj => obj.elastic
};

const resolver = createTypeMapResolver(
  { map, functions },
  resolverConfig
);

const config = {
  entryFor: ({ key, parentName, typeName }) => {
    // ensure capitalized and camelized name
    // (classify is assumed to be a string helper you provide or import)
    const type = classify(typeName || parentName);
    const name = key;
    return resolver.resolve({ type, name });
  }
};

Note that for typeName to be set, either set a className or typeName property on the object entry in the JSON schema (see dog example above) or alternatively provide a lookup function typeNameFor(name) on the config object passed in.

For inner workings, see TypeMapResolver.ts

The above configuration should look up the elastic mapping entry to use, based on the type/field combination in the typeMap first, and then fall back to the field name only in the fieldMap if not found. On a match, it resolves by returning the entry named elastic from the matching object.

{
  Person: {
    matches: [/User/],
    fields: {
      dog: {
        // ...
        elastic: {
          type: 'nested',
          // ...
        }
      },
    }
  }
}

It should match a schema (or nested schema entry) named Person or User on the typeMap entry Person. For the nested dog entry it should then match on the entry dog under fields and return the entry for elastic, ie:

{
  type: "nested"
}

If no match is made in the typeMap, it follows a similar strategy and looks up a match in the fieldMap (as per the map entry passed in when creating the resolver), matching only on the field name.
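
The lookup order described here can be sketched in plain JavaScript. This is an illustration of the strategy only, with hypothetical function names; see resolve-type-maps for the real implementation, which may match names differently.

```javascript
// Hypothetical sketch of the lookup order, not the resolve-type-maps source.
// A name matches an entry by its key, or via strings/regexes in `matches`.
function matchesName(name, entryName, matches = []) {
  return (
    name === entryName ||
    matches.some(m => (m instanceof RegExp ? m.test(name) : m === name))
  );
}

function findEntry(map, name) {
  const found = Object.entries(map).find(([entryName, entry]) =>
    matchesName(name, entryName, entry.matches)
  );
  return found && found[1];
}

function resolveElastic({ type, name }, { typeMap = {}, fieldMap = {} }) {
  // 1. Try the type/field combination in the typeMap.
  const typeEntry = findEntry(typeMap, type);
  const typeField = typeEntry && findEntry(typeEntry.fields || {}, name);
  if (typeField && typeField.elastic) return typeField.elastic;
  // 2. Fall back to the field name only in the fieldMap.
  const field = findEntry(fieldMap, name);
  return field && field.elastic;
}

const maps = {
  typeMap: {
    Person: { matches: [/User/], fields: { dog: { elastic: { type: "nested" } } } }
  },
  fieldMap: {
    text: { matches: ["description", "content"], elastic: { type: "text" } }
  }
};
console.log(resolveElastic({ type: "User", name: "dog" }, maps));
console.log(resolveElastic({ type: "User", name: "content" }, maps));
```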

ElasticSearch mapping resources

Testing

Uses jest for unit testing.

Currently not well tested. Please help add more test coverage :)

TODO

1.0.0

  • Convert project to TypeScript
  • Add unit tests for ~80% test coverage
  • Improve mappings for:
    • Date range

Author

2019 Kristian Mandrup (CTO@Tecla5)

License

MIT
