Elasticsearch, updating your mapping

When I work on an Elasticsearch-based project, I have to go through several distinct steps.

Once I have done most of the work of making sure we have all the raw events we need, I have to start dealing with the various teams and their reporting tools, which usually require a fine-grained mapping so they can properly sort, search and aggregate the information we have in the Elasticsearch cluster.

A very common issue while working on this task is running into errors because fields are mapped as generic text, or aggregations that fail because fielddata is not enabled:

Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

First of all, DON’T enable fielddata. Think about it twice and say “NO”. There is always a better way, and fielddata costs a lot of heap memory. Read this for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
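One of the better ways I usually reach for is a keyword sub-field: the field stays searchable as text, and sorting and aggregations go against your_field_name.keyword instead. Below is a minimal sketch of such a mapping fragment built in Python; the helper name text_with_keyword is mine, and the ignore_above value of 256 is only an assumption you should tune for your data.

```python
import json


def text_with_keyword(ignore_above=256):
    """Field mapping that is full-text searchable and has a keyword
    sub-field that can be sorted and aggregated without fielddata."""
    return {
        "type": "text",
        "fields": {
            # Queried as "<field>.keyword" in sorts and aggregations.
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


# "your_field_name" is the placeholder used in the error message above.
properties = {"your_field_name": text_with_keyword()}
print(json.dumps({"properties": properties}, indent=2))
```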

Retrieve the current mapping.

This is the first step: you have to understand what is wrong and how you intend to fix it. Use

curl -XGET "http://host:9200/indexname/_mapping"

to retrieve the current mapping of your index.
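With a large mapping it is hard to spot the offending fields by eye. The sketch below walks the "properties" block of a mapping and lists every field of a given type, which is how I would look for everything still mapped as text. The sample mapping and its field names are invented for illustration; in practice you would pass in the "properties" block from the _mapping response.

```python
def find_fields_of_type(properties, wanted_type, prefix=""):
    """Yield dotted field names whose mapping type equals wanted_type."""
    for name, definition in properties.items():
        path = prefix + name
        if definition.get("type") == wanted_type:
            yield path
        # Object and nested fields carry their own "properties" block.
        if "properties" in definition:
            yield from find_fields_of_type(
                definition["properties"], wanted_type, path + "."
            )


# Hypothetical sample, shaped like the "properties" part of a mapping.
sample_properties = {
    "message": {"type": "text"},
    "timestamp": {"type": "date"},
    "client": {
        "properties": {
            "ip": {"type": "text"},    # should probably be "ip"
            "port": {"type": "long"},
        }
    },
}

print(sorted(find_fields_of_type(sample_properties, "text")))
# prints ['client.ip', 'message']
```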

Create or update the mapping for your index.

First of all, have a look at this URL: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

It will help to clarify which types you can use / specify in your mapping. You will most likely need to add ip, date and long fields, especially if you’re going to deal with nested data.

Generally speaking, the template you’re going to apply will look like the one below:

{
  "order": 1,
  "settings": {
    ... this is where you specify any settings ...
  },
  "index_patterns": ["yourpattern"],
  "mappings": {
    ... this can be exactly what you get from the _mapping API of the index ...
  }
}

You may want to do that because some fields are mapped with the wrong type (e.g. numbers are currently mapped as text, or IP addresses are mapped as text).
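The idea of starting from the existing mapping and correcting a few types can be sketched as below. This is only a sketch under some assumptions: the _mapping response has the typeless layout where "properties" sits directly under "mappings" (older versions nest it under a type name), only top-level fields are overridden, and the function name build_template and the sample field names are mine.

```python
import copy
import json


def build_template(mapping_response, index_name, index_pattern, overrides,
                   settings=None, order=1):
    """Build a template body from a _mapping response, replacing the
    definition of the fields listed in overrides."""
    # Deep copy so the original response is left untouched.
    mappings = copy.deepcopy(mapping_response[index_name]["mappings"])
    for field, definition in overrides.items():
        mappings["properties"][field] = definition
    return {
        "order": order,
        "settings": settings or {},
        "index_patterns": [index_pattern],
        "mappings": mappings,
    }


# Hypothetical _mapping response with two wrongly typed fields.
mapping_response = {
    "yourpattern-1-2019.01.15": {
        "mappings": {
            "properties": {
                "client_ip": {"type": "text"},
                "bytes": {"type": "text"},
                "message": {"type": "text"},
            }
        }
    }
}

template = build_template(
    mapping_response,
    "yourpattern-1-2019.01.15",
    "yourpattern-*",
    {"client_ip": {"type": "ip"}, "bytes": {"type": "long"}},
)
print(json.dumps(template, indent=2))
```

The printed JSON is what would go into the template file passed to the _template API.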

Once you are done, you can just invoke the _template API and set/update:

curl -H 'Content-Type: application/json' http://localhost:9200/_template/yourpattern -d@/path/to/yourpattern.template.json

the template that will be used for any index created from now on. So if you specify

"index_patterns": ["yourpattern-*"]

any new index matching that pattern will inherit the mapping (e.g. yourpattern-2019.01.15).
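To get a feeling for which index names a pattern will catch, here is a small illustration of the wildcard semantics, where * stands for any run of characters. It is my own approximation for checking names offline, not the code Elasticsearch runs.

```python
import re


def matches_index_pattern(pattern, index_name):
    """Return True if index_name matches pattern, where * matches any
    (possibly empty) run of characters and everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


print(matches_index_pattern("yourpattern-*", "yourpattern-2019.01.15"))  # prints True
print(matches_index_pattern("yourpattern-*", "otherpattern-2019.01.15"))  # prints False
# Without the wildcard, only the exact name matches.
print(matches_index_pattern("yourpattern", "yourpattern-2019.01.15"))  # prints False
```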

Update an existing index

The above template won’t affect indices that already exist, so you might need to “migrate” the existing data so that it is mapped according to the new template. You can do that by telling Elasticsearch to copy the data into a new index with the _reindex API.

The example below assumes your existing data is stored in the index yourpattern-1-2019.01.15. Before proceeding, make sure nothing is writing to your index (it is a good idea to put a version number in the index name, so you can route the traffic to a new index just by telling the clients, like Logstash, to use a different number). Then you can run the curl command below:

curl -XPOST "http://127.0.0.1:9200/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "yourpattern-1-2019.01.15"
  },
  "dest": {
    "index": "yourpattern-1.1-2019.01.15"
  }
}'
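If you have many daily indices to migrate, you probably don’t want to write that body by hand each time. A minimal sketch, assuming the prefix-version-date naming used in this example; the helper names versioned_index and reindex_body are mine.

```python
import json


def versioned_index(prefix, version, date_suffix):
    """Compose an index name such as yourpattern-1.1-2019.01.15."""
    return f"{prefix}-{version}-{date_suffix}"


def reindex_body(source_index, dest_index):
    """Build the request body for the _reindex API."""
    if source_index == dest_index:
        raise ValueError("source and destination index must differ")
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


body = reindex_body(
    versioned_index("yourpattern", "1", "2019.01.15"),
    versioned_index("yourpattern", "1.1", "2019.01.15"),
)
print(json.dumps(body, indent=2))
```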

By specifying ?wait_for_completion=false, Elasticsearch returns a task ID that can be used to monitor the process with the _tasks API:

curl -XGET "http://127.0.0.1:9200/_tasks/MM-HD_-cRXeOF6QESzidSQ:1155556"
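The task response includes a status block with a total and running counters, so you can estimate how far the reindex has got by adding up created, updated and deleted. The sketch below does that on a trimmed sample; the numbers are made up, and the response shape is as I remember it from the documentation, so check it against what your cluster actually returns.

```python
def reindex_progress(task_response):
    """Return (completed, fraction_done) from a _tasks API response."""
    status = task_response["task"]["status"]
    done = (
        status.get("created", 0)
        + status.get("updated", 0)
        + status.get("deleted", 0)
    )
    total = status.get("total", 0)
    # Avoid dividing by zero before the task has counted its documents.
    fraction = done / total if total else 0.0
    return task_response.get("completed", False), fraction


# Trimmed, hypothetical response with invented counters.
sample_response = {
    "completed": False,
    "task": {
        "action": "indices:data/write/reindex",
        "status": {"total": 1000, "created": 250, "updated": 0, "deleted": 0},
    },
}

print(reindex_progress(sample_response))  # prints (False, 0.25)
```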

The Reindex API is an incredibly powerful tool. I recommend reading this for more details: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

Once you are done, you can safely remove the previous index with the wrongly mapped data.
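Before deleting anything, I like to compare the document counts of the old and the new index. This is an extra check of mine, not something the Reindex API requires: a minimal sketch that compares the "count" value of two _count API responses (the sample numbers are invented).

```python
def same_document_count(source_count_response, dest_count_response):
    """Compare the "count" field of two _count API responses."""
    return source_count_response["count"] == dest_count_response["count"]


# Hypothetical responses from /yourpattern-1-2019.01.15/_count
# and /yourpattern-1.1-2019.01.15/_count.
old_index = {"count": 6154}
new_index = {"count": 6154}

if same_document_count(old_index, new_index):
    print("counts match, the old index can go")
else:
    print("counts differ, keep the old index for now")
```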
