Elasticsearch Basics
What is a normalizer in Elasticsearch
Normalizers are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters. Only the filters that work on a per-character basis are allowed. For instance, a lowercasing filter would be allowed, but not a stemming filter, which needs to look at the keyword as a whole.
The following filters can currently be used in a normalizer:
arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, uppercase.
Difference between analyzers and normalizers
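A normalizer works like an analyzer, except that it is applied to fields of the keyword data type, whereas analyzers are applied to text fields. Because a keyword value is never tokenized, the normalizer transforms the whole value as one token.
Use cases for Elasticsearch normalizers
- Case-insensitive sorting, case-insensitive exact matching, etc.
Custom normalizers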
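Elasticsearch does not ship with built-in normalizers so far, so the only way to get one is to build a custom one.
Case-insensitive normalizer
The following request creates an index whose user_name keyword field is lowercased by a custom normalizer named custom_lower: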
# Create index_name with a custom lowercase normalizer on the user_name keyword field.
# Note: the "_doc" level under "mappings" is Elasticsearch 6.x syntax; on 7.x and later,
# place "properties" directly under "mappings".
curl -X PUT "localhost:9200/index_name" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "normalizer": {
        "custom_lower": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "user_id": { "type": "integer" },
        "user_name": { "type": "keyword", "normalizer": "custom_lower" },
        "user_age": { "type": "integer" },
        "created": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'
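To check that the normalizer behaves as intended, one option is to run a sample value through the index's _analyze API and then issue a term query using different casing. The sketch below is only an illustration: it assumes the index above already exists on localhost:9200, and the ES_URL variable, the helper function names (analyze_body, term_query_body, check_normalizer, search_user) and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch is reachable here and index_name was created as shown above.
ES_URL="${ES_URL:-localhost:9200}"

# Build an _analyze request body that runs a value through the custom_lower normalizer.
# The value is interpolated as-is, so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{"normalizer": "custom_lower", "text": "%s"}' "$1"
}

# Build a term query body against the user_name keyword field.
term_query_body() {
  printf '{"query": {"term": {"user_name": "%s"}}}' "$1"
}

# Ask Elasticsearch how the normalizer transforms a value.
check_normalizer() {
  curl -s -X POST "$ES_URL/index_name/_analyze" \
    -H 'Content-Type: application/json' -d "$(analyze_body "$1")"
}

# Search user_name; the field's normalizer is also applied to the term at query time.
search_user() {
  curl -s -X POST "$ES_URL/index_name/_search" \
    -H 'Content-Type: application/json' -d "$(term_query_body "$1")"
}
```

With these helpers, check_normalizer "John SMITH" should come back with a single lowercased token, and search_user "JOHN" should match a document indexed with user_name "John", because a keyword normalizer is applied both at index time and to term-level queries.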