Elasticsearch Basics

What is a normalizer in Elasticsearch

Normalizers are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters. Only the filters that work on a per-character basis are allowed. For instance a lowercasing filter would be allowed, but not a stemming filter, which needs to look at the keyword as a whole.
The filters that can currently be used in a normalizer are as follows:
arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, uppercase.

Difference between analyzers and normalizers

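An analyzer is applied to text fields and may split a value into many tokens. A normalizer does similar per-character processing, but it is applied to keyword fields and always produces exactly one token for the whole value.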
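To make the difference concrete, here is a rough Python emulation of the two behaviours. It is only an illustration, not Elasticsearch's implementation: the function names normalize_keyword and analyze_text are hypothetical, the "analyzer" is assumed to be a plain whitespace tokenizer plus lowercasing, and the asciifolding step is approximated with Unicode decomposition, which covers accented letters but not everything the real filter handles.

```python
import unicodedata


def normalize_keyword(value):
    """Emulate a normalizer (lowercase + approximate asciifolding).

    The whole input is treated as a single token; nothing is split.
    """
    lowered = value.lower()
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [folded]


def analyze_text(value):
    """Emulate a simple analyzer: whitespace tokenizer + lowercase filter."""
    return [token.lower() for token in value.split()]


print(normalize_keyword("Café Del MAR"))  # ['cafe del mar']
print(analyze_text("Café Del MAR"))       # ['café', 'del', 'mar']
```

The normalizer output is one token for the entire value, which is why it fits keyword fields used for exact matching, sorting and aggregations.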

Use cases for Elasticsearch normalizers

- Case-insensitive sorting, case-insensitive exact matching and similar operations on keyword fields.
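The effect on sorting and exact matching can be sketched in plain Python. This is an emulation under stated assumptions, not Elasticsearch code: custom_lower stands in for a lowercase-only normalizer, and the sample names are made up for illustration.

```python
def custom_lower(value):
    """Stand-in for a lowercase-only normalizer applied to a keyword value."""
    return value.lower()


# Hypothetical sample values for a keyword field.
user_names = ["bob", "Alice", "CHARLIE", "alice"]

# Without normalization, uppercase letters sort before lowercase ones.
print(sorted(user_names))
# ['Alice', 'CHARLIE', 'alice', 'bob']

# Sorting on the normalized form ignores case.
print(sorted(user_names, key=custom_lower))
# ['Alice', 'alice', 'bob', 'CHARLIE']


def exact_match(query, values):
    """Compare normalized forms on both sides, as a normalized field would."""
    wanted = custom_lower(query)
    return [v for v in values if custom_lower(v) == wanted]


print(exact_match("ALICE", user_names))
# ['Alice', 'alice']
```

The key point is that the same normalization is applied to the stored value and to the search input, so differently cased spellings end up comparing equal.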

Custom normalizers

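Older Elasticsearch releases do not ship with any built-in normalizers, so the only way to get one is to define a custom normalizer in the index settings. More recent releases include a built-in lowercase normalizer, but any other form of normalization still requires a custom definition.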

Case-insensitive normalizer

curl -X PUT "localhost:9200/index_name" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "normalizer": {
        "custom_lower": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "user_id":   { "type": "integer" },
        "user_name": { "type": "keyword", "normalizer": "custom_lower" },
        "user_age":  { "type": "integer" },
        "created": {
          "type":   "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'
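
Once the index exists, the normalizer can be checked with a few follow-up requests. The sketch below builds them with Python's standard library only. It assumes the same local node (localhost:9200) and index name as the curl command above; the sample document values, the helper names build_request and send, and the refresh=true parameter are illustrative choices, not part of the original example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: same node as the curl example
INDEX = "index_name"


def build_request(method, path, body):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    return urllib.request.Request(
        url=ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Index a hypothetical sample document with a mixed-case user_name.
index_doc = build_request("PUT", "/" + INDEX + "/_doc/1?refresh=true", {
    "user_id": 1,
    "user_name": "John Doe",
    "user_age": 30,
    "created": "2024-01-15T10:00:00Z",
})

# Ask the analyze API what custom_lower produces for a given input.
analyze = build_request("POST", "/" + INDEX + "/_analyze", {
    "normalizer": "custom_lower",
    "text": "John DOE",
})

# Term query with different casing, plus a sort on the normalized field.
search = build_request("POST", "/" + INDEX + "/_search", {
    "query": {"term": {"user_name": "JOHN doe"}},
    "sort": [{"user_name": "asc"}],
})
```

Against a running cluster, calling send(index_doc), send(analyze) and send(search) in that order should show a single lowercased token from the analyze call and the sample document as a hit for the term query, because the normalizer is applied both when the keyword is indexed and when it is searched with a term-level query.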
