Matthew Boynes edited this page Apr 19, 2020 · 3 revisions

Note: This document is written for Elasticsearch 7.x and might not be compatible with earlier versions.

Default Mapping

When you send a document to Elasticsearch, Elasticsearch needs to know how to index the data in that document. As noted in ES Indexing 101, a text field is treated differently from a date field, a numeric field, etc. One string field may be body copy for an article and another may be a cryptographic hash. By default, Elasticsearch tries to infer a type for any new field it sees, a behavior known as dynamic mapping. For instance, if a new field contains numeric data, Elasticsearch maps it to a numeric type (long for integers, float for decimals). If it contains a string that looks like a date in a format Elasticsearch recognizes, the field is mapped as a date.

Let's look at this in action. To index a document, you send it as JSON using the HTTP verb PUT, and you specify the index, the _doc endpoint, and a document ID in the URL (to have Elasticsearch generate the ID for you, use POST and omit the ID):

Request: PUT /futurama-characters/_doc/2

{
  "name": "Hubert J. Farnsworth",
  "title": "Professor",
  "birthday": "2841-04-09"
}

Response

{
  "_index": "futurama-characters",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
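
The same request can be built from a script. The following is a minimal Python sketch using only the standard library. It assumes a node listening at http://localhost:9200 (an assumption, not something this page specifies) and the Elasticsearch 7.x _doc endpoint; the helper name build_index_request is made up for illustration. It only builds the request, so nothing is sent until you pass the result to urllib.request.urlopen.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local, unsecured Elasticsearch 7.x node.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that indexes `document`.

    Hypothetical helper: the URL follows the 7.x pattern
    /<index>/_doc/<id>, and the body is the document as JSON.
    """
    body = json.dumps(document).encode("utf-8")
    url = f"{ES_URL}/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "futurama-characters",
    2,
    {
        "name": "Hubert J. Farnsworth",
        "title": "Professor",
        "birthday": "2841-04-09",
    },
)
print(request.get_method(), request.full_url)
```

To actually send it, call urllib.request.urlopen(request) and parse the JSON response body.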

Now if we query for the mapping, we'll see what Elasticsearch did with it:

Request: GET /futurama-characters/_mapping

Response

{
  "futurama-characters": {
    "mappings": {
      "properties": {
        "birthday": {
          "type": "date"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Elasticsearch correctly interpreted birthday as a date, and mapped name and title as strings: each is a text field with a keyword sub-field, so the same value is available both analyzed and as an exact value. While Elasticsearch does as good a job as it can at building its own mappings for new indices and new fields, it's easy to run into its limitations. Elasticsearch is forced to decide on a field's type based on the first sample it sees, which isn't always representative: a field may hold a date in one document and an arbitrary string in another. If a field is mapped as a date and a later document contains a value there that can't be parsed as a date, Elasticsearch will fail to index that document. In addition, you may want to structure your schema in a way that Elasticsearch can't ascertain on its own. For these reasons and many others, you can customize the mapping of your Elasticsearch index.
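
The "first sample wins" problem can be illustrated with a toy model. The Python sketch below is not how Elasticsearch is implemented. It is a deliberately simplified stand-in that guesses a type from the first value it sees for a field and then rejects later values that don't fit. It recognizes only one date pattern (yyyy-MM-dd), whereas real date detection accepts more formats, and it ignores numeric coercion. The names guess_field_type and ToyIndex are invented for this example.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a mapping type for a scalar JSON value (toy approximation)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value in this toy model: {value!r}")


class ToyIndex:
    """Locks in a field's type on first sight, like a dynamic mapping would."""

    def __init__(self):
        self.mapping = {}
        self.docs = []

    def index(self, doc):
        updated = dict(self.mapping)  # only commit if the whole doc is valid
        for field, value in doc.items():
            guessed = guess_field_type(value)
            mapped = updated.setdefault(field, guessed)
            if mapped == "text" and isinstance(value, str):
                continue  # any string is acceptable in a text field
            if mapped != guessed:
                raise ValueError(
                    f"field {field!r} is mapped as {mapped}, got a {guessed} value"
                )
        self.mapping = updated
        self.docs.append(doc)


index = ToyIndex()
index.index({"name": "Hubert J. Farnsworth", "birthday": "2841-04-09"})
print(index.mapping)

try:
    # Hypothetical second document whose birthday is not a date.
    index.index({"name": "Hubert J. Farnsworth", "birthday": "unknown"})
except ValueError as error:
    print("rejected:", error)
```

Had the documents arrived in the opposite order, birthday would have been mapped as text and both documents would have been accepted, but date queries on that field would no longer work. An explicit mapping removes that dependence on arrival order.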

Custom Mapping

Here's an example request to add a custom mapping. First, we'll clear out the existing index and mapping:

Request: DELETE /futurama-characters

Response:

{
  "acknowledged": true
}

Request: PUT /futurama-characters

{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "name": {
        "type": "text"
      },
      "title": {
        "type": "text"
      },
      "species": {
        "type": "text"
      },
      "serial_number": {
        "type": "keyword"
      }
    }
  }
}

In our custom mapping, we restricted the birthday field to a single date format (yyyy-MM-dd), we added two new fields (species and serial_number), and for one of those fields (serial_number), we chose not to analyze it by mapping it as a keyword. Note that name and title are now plain text fields, without the keyword sub-field the default mapping gave them. As discussed in ES Indexing 101, there will be times we choose not to analyze strings, and a serial number is a good example: when you search for a serial number, you will probably only ever want to search by an exact value.
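
To make the analyzed versus not-analyzed distinction concrete, here is a hedged Python sketch of two search request bodies that could be sent to /futurama-characters/_search with the mapping above: a term query for the keyword field and a match query for a text field. The helper names and the serial number value "SN-0001" are hypothetical; the page does not give a real serial number.

```python
import json


def exact_match_query(field, value):
    """Body for a term query: the stored value must equal `value` exactly.

    Suited to keyword fields such as serial_number.
    """
    return {"query": {"term": {field: {"value": value}}}}


def full_text_query(field, text):
    """Body for a match query: `text` is analyzed before matching.

    Suited to text fields such as name, title, or species.
    """
    return {"query": {"match": {field: {"query": text}}}}


# Hypothetical serial number, used only for illustration.
print(json.dumps(exact_match_query("serial_number", "SN-0001"), indent=2))
print(json.dumps(full_text_query("name", "farnsworth"), indent=2))
```

With the default analyzer, the match query for "farnsworth" would be expected to find "Hubert J. Farnsworth", because the name is split into lowercased terms at index time. The term query only matches a serial_number that is stored as exactly "SN-0001".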

Field Types

When setting the field's type in the mapping, you have lots of options. See Mapping Types in the official Elasticsearch documentation for a full reference. Here are some important things to keep in mind:

  • Multi-fields add a lot of flexibility; for instance, you can index the same field both analyzed (as text, for searching) and not analyzed (as a keyword, for aggregations and sorting).
  • For numbers, pick the smallest type that is large enough for your use case, but be sure to leave room for growth, and use a scaled_float instead of a float when possible.
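
Both points can be sketched in a few lines of Python. The mapping below is an illustration under stated assumptions, not part of the example index: height_m is a hypothetical field, and the scaling factor of 100 (two decimal places) is an arbitrary choice. A scaled_float is stored as a long after multiplying by the scaling factor and rounding, which the scaled_value helper imitates.

```python
import json


def build_mapping():
    """Index-creation body with a multi-field and a scaled_float (sketch)."""
    return {
        "mappings": {
            "properties": {
                # Analyzed for full-text search, plus an exact-value
                # sub-field addressable as "name.keyword".
                "name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                },
                # Hypothetical numeric field kept to two decimal places.
                "height_m": {"type": "scaled_float", "scaling_factor": 100},
            }
        }
    }


def sort_by_name_body():
    """Search body that sorts on the non-analyzed sub-field."""
    return {"sort": [{"name.keyword": "asc"}]}


def scaled_value(value, scaling_factor):
    """Imitate what a scaled_float stores: value * factor, rounded to an integer.

    Python's round() is used here; results for exact .5 ties may differ
    from Elasticsearch, so treat this as an approximation.
    """
    return round(value * scaling_factor)


print(json.dumps(build_mapping(), indent=2))
print(json.dumps(sort_by_name_body()))
print(scaled_value(1.83, 100))
```

Searching still targets name, while sorting and aggregations target name.keyword. For the numeric field, any precision beyond the scaling factor is discarded, which is the trade-off for storing a compact integer.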

Further Reading