Change Elastic Index Mappings

Converting or changing the type of a column in a database is easy. Unfortunately, the type of a field in an Elastic mapping cannot be changed. Once documents are indexed, the mapping of an existing field is fixed and the data has to be reindexed.

Dynamic Mapping Breaks the Grafana Data Source

Elastic can be used as a data source for Grafana. But if the index has no field of date type to use as the Time field name, the data source cannot be configured properly.

In my case, unfortunately, dynamic mapping set the type of the timestamp field to text because the time format was not recognized (it requires yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ, but the data has no "T" delimiter).

The official documentation explains:

Except for supported mapping parameters, you can’t change the mapping or field type of an existing field. Changing an existing field could invalidate data that’s already indexed.
If you need to change the mapping of a field in other indices, create a new index with the correct mapping and reindex your data into that index.

Elasticsearch does not provide the functionality to change types for existing fields. A workaround is to import the data to a new index with correct types:

  1. create a new index
  2. put the updated mapping to the new index (manually create a mapping in which the fields have the types you need, before sending the first data)
  3. use the reindex API to copy data from the old index to the newly created one
  4. (optionally) create an alias with the old index name pointing to the new index

Construct Explicit Mapping Request

First we should know the old index mapping and find out what is wrong:

$ curl "$ES_URL/$INDEX/_mapping/field/timestamp?pretty"
{
  "testrun": {
    "mappings": {
      "timestamp": {
        "full_name": "timestamp",
        "mapping": {
          "timestamp": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

We can clearly see that the timestamp field has the text type. This confirms the problem, since Grafana requires a date type. So we can construct a payload.json with an explicit mapping:

{
  "properties": {
    "timestamp": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss.SSSSSSZZZZZ"
    }
  }
}
payload.json

Here we use a custom date format because our raw data is in a non-standard format such as 2017-11-30 11:59:19.717648+00:00.

Then we will create a new index and put an explicit mapping into it.

$ curl -X PUT $ES_URL/$INDEX_NEW
$ curl -X PUT $ES_URL/$INDEX_NEW/_mapping -H "Content-Type: application/json" -d @payload.json
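The two requests above can also be combined, because the create-index API accepts a mappings object in its body. The sketch below assumes Elasticsearch 7 or later (typeless mappings) and uses a hypothetical file name, create-index.json. The curl calls are left as comments because they need a live cluster.

```shell
# Write the create-index body. The quoted 'EOF' keeps the text literal,
# so nothing inside the JSON is expanded by the shell.
cat > create-index.json <<'EOF'
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSSSSZZZZZ"
      }
    }
  }
}
EOF

# Assumed usage against a live cluster (ES_URL and INDEX_NEW as in the text):
# curl -X PUT "$ES_URL/$INDEX_NEW" -H "Content-Type: application/json" -d @create-index.json
# Check the result the same way the old mapping was inspected:
# curl "$ES_URL/$INDEX_NEW/_mapping/field/timestamp?pretty"
```

Either way, the mapping must be in place before the first document arrives, otherwise dynamic mapping will guess the type again.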

Reindex from Old Index to New

Reindex API

Again we can create a payload.json, with $INDEX and $INDEX_NEW replaced by the actual index names:

{
  "source": {
    "index": "$INDEX"
  },
  "dest": {
    "index": "$INDEX_NEW"
  }
}
payload.json
$ curl -X POST "$ES_URL/_reindex?pretty" -H 'Content-Type: application/json' -d @payload.json
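Because curl sends payload.json verbatim, shell variables written inside the file are not expanded. One way to avoid editing the file by hand is to generate it with a here-document, as in this minimal sketch. The default names are assumptions: testrun is the index seen in the mapping output earlier, and testrun-v2 is a made-up name for the new index.

```shell
# Fall back to example names when the variables are not already set.
INDEX="${INDEX:-testrun}"
INDEX_NEW="${INDEX_NEW:-testrun-v2}"

# The unquoted EOF lets the shell substitute $INDEX and $INDEX_NEW.
cat > payload.json <<EOF
{
  "source": {
    "index": "$INDEX"
  },
  "dest": {
    "index": "$INDEX_NEW"
  }
}
EOF

# Inspect the generated request body before sending it.
cat payload.json
```

The generated file can then be posted to the _reindex endpoint exactly as above.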

By default the request blocks until reindexing has finished, which can take from minutes to hours depending on the quantity of data. Grafana only needs the new mapping, though, so by now it should recognize the field and show:

Index OK. Time field name OK.

And the data source is successfully configured.

Reindex Asynchronously

Reindexing a huge index can take hours to days, and the HTTP request will time out long before that. To avoid this, run the reindex as a background task. From the official documentation:

If the request contains wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at _tasks/<task_id>

$ curl -X POST "$ES_URL/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d @payload.json

This returns a task, represented by its ID:

{ "task": "Sb_1wXmiTWWY2uFEczPhIQ:868863502" }

This task ID can be used to query the status of the process:

$ export TASK_ID="Sb_1wXmiTWWY2uFEczPhIQ:868863502"
$ curl "$ES_URL/_tasks/$TASK_ID?pretty"
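Instead of re-running the query by hand, the check can be wrapped in a small polling loop. This is only a sketch: is_completed and wait_for_task are hypothetical helper names, and it relies on the task response carrying a top-level "completed" flag, which it detects with a plain text match rather than a JSON parser.

```shell
# Succeeds when the task JSON read from stdin reports "completed": true.
# The pattern tolerates the extra spaces added by ?pretty.
is_completed() {
  grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'
}

# Poll GET _tasks/<id> until the task reports completion.
# Arguments: task id, seconds between polls (default 30).
# Note: the loop also keeps retrying when curl itself fails.
wait_for_task() {
  task_id="$1"
  interval="${2:-30}"
  until curl -s "$ES_URL/_tasks/$task_id" | is_completed; do
    sleep "$interval"
  done
}

# Assumed usage against a live cluster:
# wait_for_task "$TASK_ID" 60
```

Once the loop returns, it is still worth running the query once more and reading the full response for failures.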

Any warning or error will be included in the query result. Once you have confirmed that the process is done and the task logs are no longer needed, the task should be deleted. Tasks are stored in an internal index, .tasks, and that is where the task document should be deleted from.

$ curl -X DELETE "$ES_URL/.tasks/task/$TASK_ID"

How About the Old Index?

There is no index renaming operation in Elastic. But with the experience we've gained, it is easy enough to rename one by deleting, creating and reindexing.

There is also the option of creating an alias for the index.
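An alias can only take over the old name once the old index is gone, since an alias and an index cannot share a name. The _aliases API can do both in one request through its remove_index and add actions. A minimal sketch, in which the function name alias_payload and the file alias.json are made up for illustration:

```shell
# Print an _aliases request body that deletes the old index and
# points an alias with the old name at the new index.
# Arguments: old index name, new index name.
alias_payload() {
  old="$1"
  new="$2"
  cat <<EOF
{
  "actions": [
    { "remove_index": { "index": "$old" } },
    { "add": { "index": "$new", "alias": "$old" } }
  ]
}
EOF
}

# Assumed usage, only after the reindex has finished and been checked,
# because remove_index deletes the old data for good:
# alias_payload "$INDEX" "$INDEX_NEW" > alias.json
# curl -X POST "$ES_URL/_aliases" -H "Content-Type: application/json" -d @alias.json
```

With the alias in place, clients that still query the old index name are served from the new index.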