Tag: elasticsearch

Elasticsearch, updating your mapping

Elasticsearch, updating your mapping

A very common issue, while working with elasticsearch and refining the data model of a project, is to face some error because the fields are generic text or fail to aggregate because fielddata is not enabled:

Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

First of all, DO NOT enable fielddata, think about that twice and say “NO”. There is always a better way, they cost a lot. Read this for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

Retrieve the current mapping.

This is the first step, you have to understand what is wrong and how you intend to fix it. use

curl -XGET "http://host:9200/indexname/_mapping"

to retrieve the current mapping of your index.

Create or update the mapping for your index.

First of all, have a look at this URL: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

It will help to clarify which types you can use / specify in your mapping. It is more likely that you will need to add ip, dates and long, especially if you’re going to deal with nested data.

By generally speaking the template you’re going to apply will look like below

{
  "order": 1,
  "settings": {
    ... this is where you specify any settings ...
  },
  "index_patterns": ["yourpattern"]
  "mappings": {
    .. this can be exactly what you get from the _mapping API of the index ...

You may want to do that because some fields are wrongly mapped with a wrong type (e.g. number are currently mapped as text or IP addresses are mapped as text).

Once you have done, you can just invoke the _template API and set/update:

curl -H 'Content-Type: application/json' http://localhost:9200/_template/yourpattern -d@/path/to/yourpattern.template.json

the template that will be used for the next index that will be created, so if you specify

"index_patterns": ["yourpattern-*"]

any new index matching that pattern will inherit the mapping (e.g. yourpattern-2019.01.15)

Update an existing index

The above template won’t affect the already create index, so you might need to “migrate” the existing data to be mapped according to the new template. You can do that by telling elasticsearch to create a new index with the _reindex API.

Below example assume your existing data are stored with index yourpattern-1-2019.01.15. Before proceeding, make sure nothing is writing your index (it could be a good idea to set a version number to the index so you can route the traffic to a new index by just telling the clients, like logstash, to use a different number), then you can run the below curl

curl -XPOST "http://127.0.0.1:9200/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "yourpattern-1-2019.01.15"
  },
  "dest": {
    "index": "yourpattern-1.1-2019.01.15"
  }
}'

By specifying ?wait_for_completion=false elasticsearch will return a token that can be used to monitor the process with the _tasks API

curl -XGET "http://127.0.0.1:19200/_tasks/MM-HD_-cRXeOF6QESzidSQ:1155556"

The Reindex API is an incredibly powerful tool, I recommend to read this for more details: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

Once you have done, you can safely remove the previous index with the wrongly mapped data.

Advertisement
Elasticsearch, some note about shards and replicas

Elasticsearch, some note about shards and replicas

An elasticsearch index is composed of one or more shards. Shards can be primary or replica, primary shards are the first involved in a write, replica are created or updated after the primary.

Shards.

Shards are how elasticsearch scales horizontally and distributes load among the nodes of a cluster. A shard is a lucene index, it can hold up to Integer.MAX_VALUE-128 documents (2,147,483,519). You must tune your index to make sure you won’t have shards greater than 20/30GB (or you will have out of memory issues very soon). Changing the number of shards requires to build/re-build a new index, you must set this while creating the index (or through a proper mapping configuration),

see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html

{ 
  "settings":
   {
    "index.number_of_shards": 10
   }
}

See template API for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html

Replicas.

They are important to scale reads and lower the pressure (replica are usually involved in the reads as primary target) and make you’re index resilient to failure. A replica can become a primary shard if needed.

The replicas can be set on-the-fly

curl -X PUT "localhost:9200/indexname/_settings" -H 'Content-Type: application/json' -d' {     "index" : {         "number_of_replicas" : 2     } } '