Introduction to Elasticsearch

What is Elasticsearch

Elasticsearch is an open-source, RESTful, distributed search and analytics engine.
Elasticsearch is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.
Elasticsearch is written in Java. The core of the 6.x series used in this post is released as open source under the terms of the Apache License 2.0.

Advantages of Elasticsearch

- Elasticsearch is document-oriented. It stores complex real-world entities as structured JSON documents and indexes all fields by default, which keeps searches on any field fast.

- Elasticsearch offers many text-analysis and search features, such as customizable tokenization (splitting text into words), customizable stemming, faceted search, and more.

- Elasticsearch is schema-free: it accepts JSON documents, tries to detect the data structure, indexes the data, and makes it searchable.

- Elasticsearch is API-driven, i.e. actions can be performed through a simple RESTful API.

- Elasticsearch can execute complex queries very quickly. It also caches the results of many structured queries that are commonly used as filters on the result set, so each of them is executed only once. Any later request that contains a cached filter reads the result from the cache, which saves the time of parsing and executing the query again.
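As a rough illustration of the document-oriented, REST-driven model described above, the sketch below indexes one JSON document and then runs a query that combines a full-text match with a filter clause. The index name `articles`, the field names, the `es` helper, and the `ES_URL` variable are hypothetical. It assumes an Elasticsearch 6.x node listening on localhost:9200; the `_doc` type in the path is a 6.x convention.

```shell
# Assumed base URL of a running Elasticsearch 6.x node; override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="articles"   # hypothetical index name

# Hypothetical helper: es METHOD PATH [JSON_BODY]
es() {
  if [ -n "${3:-}" ]; then
    curl -s --max-time 10 -X "$1" "$ES_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -s --max-time 10 -X "$1" "$ES_URL$2"
  fi || echo "request to $ES_URL$2 failed (is Elasticsearch running?)"
  echo
}

# A JSON document; no schema is declared up front, the field types are detected.
DOC='{"title": "Introduction to Elasticsearch", "status": "published", "views": 42}'

# Index the document under ID 1 and wait until it is visible to searches.
es PUT "/$INDEX/_doc/1?refresh=wait_for" "$DOC"

# Full-text match on "title" plus a non-scoring filter on "status".
# Clauses in filter context are the ones Elasticsearch is able to cache.
QUERY='{"query": {"bool": {"must": {"match": {"title": "elasticsearch"}}, "filter": {"term": {"status": "published"}}}}}'
es GET "/$INDEX/_search?pretty" "$QUERY"
```

Putting the `status` condition in the `filter` clause rather than in `must` keeps it out of relevance scoring, which is what makes it a candidate for caching.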

Limitations of Elasticsearch

- Not real-time but near real-time (searches are eventually consistent): the data you index becomes searchable only after a short delay. A process known as refresh runs every 1 second by default and makes newly indexed data searchable.

- It doesn't support SQL-like joins.

- It doesn't support transactions or rollbacks; Elasticsearch is not an ACID-compliant system.

- Updates are expensive. An update to an existing document marks the old document as deleted and indexes the new version as a new document.
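The near real-time behaviour in the first limitation above can be observed with a few requests. This is a minimal sketch, assuming an Elasticsearch 6.x node on localhost:9200; the index name `nrt_demo`, the `es` helper, and the 30-second interval are illustrative choices, not recommendations.

```shell
# Assumed base URL of a running Elasticsearch 6.x node; override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="nrt_demo"   # hypothetical index name

# Hypothetical helper: es METHOD PATH [JSON_BODY]
es() {
  if [ -n "${3:-}" ]; then
    curl -s --max-time 10 -X "$1" "$ES_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -s --max-time 10 -X "$1" "$ES_URL$2"
  fi || echo "request to $ES_URL$2 failed (is Elasticsearch running?)"
  echo
}

# Index a document without forcing a refresh.
es PUT "/$INDEX/_doc/1" '{"message": "hello"}'

# Fetching by ID is real-time, so the document is returned right away...
es GET "/$INDEX/_doc/1?pretty"

# ...but a search issued immediately may still report zero hits,
# because the periodic refresh may not have run yet.
es GET "/$INDEX/_search?q=message:hello&pretty"

# Force a refresh of this index, then search again.
es POST "/$INDEX/_refresh"
es GET "/$INDEX/_search?q=message:hello&pretty"

# The refresh interval is a dynamic index setting; a longer interval
# trades search freshness for cheaper indexing (e.g. during bulk loads).
SETTINGS='{"index": {"refresh_interval": "30s"}}'
es PUT "/$INDEX/_settings" "$SETTINGS"
```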

Elasticsearch concepts

Node − It refers to a single running instance of Elasticsearch. Every node is identified by a unique name. If no name is provided explicitly, the node is given a name derived from a random UUID at startup.

Cluster − It is a collection of one or more nodes. A cluster provides collective indexing and search capabilities across all of its nodes for the entire data set.

Index − It is a collection of documents, possibly of different types, together with their properties.

Document − It is a collection of fields defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier, called the UID.

Shard − Indexes are horizontally subdivided into shards. Each shard holds documents with all of their properties, but contains fewer JSON objects than the whole index. This horizontal split makes each shard an independent, fully featured subset of the index that can be stored on any node. Shards of the same index can reside on the same node or on different nodes of the cluster. The number of shards determines the degree of parallelism for search and indexing operations, and shards allow the cluster to grow horizontally. The number of shards per index can be specified when the index is created; in the 6.x release used here the default is 5. Once the index is created, the number of primary shards cannot be changed in place; to change it, the data needs to be re-indexed. A primary shard is an original horizontal part of an index, and primary shards are then replicated into replica shards.

Replicas − Elasticsearch allows users to create replicas of their indexes and shards. Replication not only increases the availability of data in case of failure, but also improves search performance, because searches can be carried out in parallel on the replicas.
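The shard and replica concepts above can be tried out on a running node. The sketch below is one possible sequence, assuming an Elasticsearch 6.x node on localhost:9200; the index names, the shard counts, and the `es` helper are hypothetical.

```shell
# Assumed base URL of a running Elasticsearch 6.x node; override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="shard_demo"         # hypothetical index name
NEW_INDEX="${INDEX}_v2"    # hypothetical target for re-indexing

# Hypothetical helper: es METHOD PATH [JSON_BODY]
es() {
  if [ -n "${3:-}" ]; then
    curl -s --max-time 10 -X "$1" "$ES_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -s --max-time 10 -X "$1" "$ES_URL$2"
  fi || echo "request to $ES_URL$2 failed (is Elasticsearch running?)"
  echo
}

# Create an index with 3 primary shards, each with 1 replica.
es PUT "/$INDEX" '{"settings": {"number_of_shards": 3, "number_of_replicas": 1}}'

# List the shards (p = primary, r = replica). On a single node the replicas
# stay UNASSIGNED: a replica is never placed on the same node as its primary.
es GET "/_cat/shards/$INDEX?v"

# The replica count is dynamic and can be changed on a live index.
es PUT "/$INDEX/_settings" '{"index": {"number_of_replicas": 0}}'

# The primary shard count is fixed at creation. To change it, create a new
# index with the desired count and copy the documents across.
es PUT "/$NEW_INDEX" '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'
REINDEX_BODY=$(printf '{"source": {"index": "%s"}, "dest": {"index": "%s"}}' "$INDEX" "$NEW_INDEX")
es POST "/_reindex" "$REINDEX_BODY"
```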

Elasticsearch Installation

Install Java

		
sudo yum install java-1.8.0-openjdk.x86_64

The RPM for Elasticsearch v6.4.3 can be downloaded from the website and installed as follows:

		
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.3.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.3.rpm.sha512

Compare the SHA of the downloaded RPM and the published checksum

		
shasum -a 512 -c elasticsearch-6.4.3.rpm.sha512 
This should output elasticsearch-6.4.3.rpm: OK.

Install Elasticsearch

		
sudo rpm --install elasticsearch-6.4.3.rpm
This results in Elasticsearch being installed in /usr/share/elasticsearch/ with its configuration files placed in /etc/elasticsearch and its init script added in /etc/init.d/elasticsearch.

Configuring Elasticsearch

elasticsearch.yml is the file that configures the Elasticsearch server settings.

		
sudo vim /etc/elasticsearch/elasticsearch.yml
Remove the # character from the beginning of the node.name and cluster.name lines and change their values as required.
		
node.name: node_1
cluster.name: cluster_1
- Another crucial setting is the role of the server, which can be either master-eligible or not.
The setting that determines this is called node.master.
If you have only one Elasticsearch node, leave this option commented out so that it keeps its default value of true, i.e. the sole node is also the master.
Alternatively, if the node should never act as master, remove the # character at the beginning of the node.master line (or add the line if it is not present) and change the value to false.
- node.data is another crucial configuration option, which determines whether a node stores data.
In most cases this option should be left at its default value of true, but there are two cases in which you might not want to store data on a node.
One is when the node is a dedicated master. The other is when a node is used only for fetching data from other nodes and aggregating the results.
In the latter case the node acts as a "search load balancer". If you have only one Elasticsearch node, leave this setting commented out so that it keeps the default value of true. Otherwise, to disable storing data locally, uncomment the following line and change the value to false.
		
node.data: false
- Two other important options are index.number_of_shards and index.number_of_replicas.
The first determines how many shards the index will be split into.
The second defines the number of replicas that will be distributed across the cluster.
Having more shards can improve indexing performance, while having more replicas can make searching faster. For learning purposes it is better to start with only one shard and no replicas. Note that since Elasticsearch 5.0 these are index-level settings: they can no longer be placed in elasticsearch.yml (a 6.x node refuses to start if they are), and must instead be supplied per index, at index creation or through an index template, with the following values:
		
index.number_of_shards: 1
index.number_of_replicas: 0
Once you have made all the changes, save and exit the file. Now you can start Elasticsearch with the command:
		
sudo service elasticsearch start
By now, Elasticsearch should be running on port 9200. You can test it with curl, the command-line tool for transferring data with URLs, and a simple GET request like this:
		
curl -X GET 'http://localhost:9200'
You should see a response similar to the following (the cluster_uuid will differ):
		
{
  "name" : "node_1",
  "cluster_name" : "cluster_1",
  "cluster_uuid" : "1C-D5o43RaK_o-8Ijv_WDw",
  "version" : {
    "number" : "6.4.3",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "fe40335",
    "build_date" : "2018-10-30T23:17:19.084789Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
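Once the node responds, index-level defaults such as the shard and replica counts can be applied through the REST API, since Elasticsearch 5.0 and later do not read them from elasticsearch.yml. One way to do this is an index template, sketched below for a 6.x node on localhost:9200. The template name `single_node_defaults`, the `es` helper, and the catch-all pattern are assumptions made for illustration.

```shell
# Assumed base URL of a running Elasticsearch 6.x node; override via ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: es METHOD PATH [JSON_BODY]
es() {
  if [ -n "${3:-}" ]; then
    curl -s --max-time 10 -X "$1" "$ES_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -s --max-time 10 -X "$1" "$ES_URL$2"
  fi || echo "request to $ES_URL$2 failed (is Elasticsearch running?)"
  echo
}

# Template applied to every index created from now on ("*" matches all names;
# narrow the pattern if only some indices should get these settings).
TEMPLATE_BODY='{"index_patterns": ["*"], "order": 0, "settings": {"number_of_shards": 1, "number_of_replicas": 0}}'
es PUT "/_template/single_node_defaults" "$TEMPLATE_BODY"

# Check overall cluster state. With zero replicas a single-node cluster can
# report "green"; indices that expect replicas would leave it "yellow".
es GET "/_cluster/health?pretty"
```

Templates only affect indices created after the template is stored; existing indices keep their settings.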
