
Elasticsearch Basics

What is an analyzer in Elasticsearch?

Before text is indexed, it is processed according to our requirements and split into terms. This process is called analysis, and it is performed by analyzers, which carry out the following steps:

- Split the piece of text into individual terms or tokens.
- Standardize the individual terms so they become more searchable.
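
To see what an analyzer actually produces, the _analyze API can be run against any piece of text. Below is a minimal sketch assuming the official elasticsearch Python client and a cluster on localhost:9200 (the exact keyword arguments differ slightly between the 7.x and 8.x client releases):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

# Run the default (standard) analyzer on a sample sentence.
resp = es.indices.analyze(analyzer="standard", text="The QUICK Brown-Foxes jumped!")

# Each token is an individual, lowercased term produced by the analysis step.
print([t["token"] for t in resp["tokens"]])
# -> ['the', 'quick', 'brown', 'foxes', 'jumped']
```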
What are exact-value fields and full-text fields?

- Exact values are values that don’t make sense if they are split.
For example, there is no point in splitting and tokenising a user’s email address or a date field, because it is always better to search for the intact date or email address.
- Full-text values are mainly human-generated textual content, such as an article in a blog or a comment in a forum. From full-text values we expect results that make sense to humans.
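
In a mapping, this distinction shows up as keyword (or date) fields versus text fields: only text fields go through an analyzer at index time. A sketch, assuming the same Python client and a hypothetical blog_posts index (older 7.x clients take these as a single body dict instead):

```python
# Hypothetical index: exact-value fields use keyword/date, full text uses text.
es.indices.create(
    index="blog_posts",
    mappings={
        "properties": {
            "author_email": {"type": "keyword"},  # exact value: indexed as a single, intact term
            "published_on": {"type": "date"},     # exact value: parsed as a date, never analyzed
            "body": {"type": "text"},             # full text: run through the standard analyzer
        }
    },
)
```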

PREDEFINED ANALYZERS

Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration:
Standard Analyzer :-
- The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm.
- It removes most punctuation, lowercases terms, and supports removing stop words. The standard analyzer is the default analyzer which is used if none is specified.
Simple Analyzer :-
The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
Whitespace Analyzer :-
The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
Stop Analyzer :-
The stop analyzer is like the simple analyzer, but also supports removal of stop words.
Keyword Analyzer :-
The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
Pattern Analyzer :-
The pattern analyzer uses a regular expression to split the text into terms. It supports lower-casing and stop words.
Language Analyzers :-
Elasticsearch provides many language-specific analyzers like english or french.
Fingerprint Analyzer :-
The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
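
The differences between these built-in analyzers are easiest to see by running them over the same input. A small comparison sketch, with the same client assumptions as above:

```python
sample = "The 2 QUICK Brown-Foxes jumped!"

for name in ["standard", "simple", "whitespace", "keyword"]:
    resp = es.indices.analyze(analyzer=name, text=sample)
    print(name, [t["token"] for t in resp["tokens"]])

# standard   -> lowercased word tokens, punctuation removed, the digit "2" kept
# simple     -> lowercased tokens split on every non-letter, so "2" is dropped
# whitespace -> split on whitespace only, original casing and punctuation kept
# keyword    -> the whole input returned unchanged as a single term
```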

CUSTOM ANALYZERS

When the built-in analyzers do not fulfill your needs, you can create a custom analyzer which uses the appropriate combination of:
- zero or more character filters.
- a tokenizer.
- zero or more token filters.
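
These building blocks can be tried out on the fly with the _analyze API before baking them into an index. A sketch combining a built-in character filter, tokenizer and token filters, with the same client assumptions as above:

```python
# Ad-hoc analysis chain: strip HTML, tokenize on word boundaries,
# then lowercase and fold accented characters to plain ASCII.
resp = es.indices.analyze(
    char_filter=["html_strip"],
    tokenizer="standard",
    filter=["lowercase", "asciifolding"],
    text="<p>Déjà vu, again!</p>",
)
print([t["token"] for t in resp["tokens"]])
# -> ['deja', 'vu', 'again']
```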

Configuration

The custom analyzer accepts the following parameters:
- tokenizer :- A built-in or customised tokenizer. (Required)
- char_filter :- An optional array of built-in or customised character filters.
- filter :- An optional array of built-in or customised token filters.
- position_increment_gap :- When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value to ensure that a phrase query doesn’t match two terms from different array elements. Defaults to 100.
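
Putting those parameters together, a named custom analyzer is defined under the index's analysis settings and then referenced from a text field's mapping. A sketch with hypothetical index and analyzer names, same client assumptions as above:

```python
es.indices.create(
    index="articles",  # hypothetical index name
    settings={
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {               # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["html_strip"],    # zero or more character filters
                    "tokenizer": "standard",          # exactly one tokenizer (required)
                    "filter": ["lowercase", "stop"],  # zero or more token filters
                }
            }
        }
    },
    mappings={
        "properties": {
            "body": {"type": "text", "analyzer": "my_custom_analyzer"}
        }
    },
)
```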
