
Text vs Keyword

Two Elasticsearch data types often cause trouble for people new to the system: Keyword and Text. Both are string types, but Elasticsearch interprets them differently, so each supports different operations.

 

How is the field type determined?

In general, the type of a field is determined by the template. A template is a set of instructions for creating indexes in Elasticsearch, including field mappings. If no template clearly specifies the type of a given string field, Elasticsearch by default maps it dynamically as both types: a Text field with a Keyword sub-field. However, working that way is not recommended, because planning which type each field gets can save disk space.
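As a rough sketch of what planning the types might look like, the snippet below builds the body of an index template with explicit mappings. The index pattern and the "country" field are assumptions made for illustration; only "address" appears in this article. In practice such a body would be sent to Elasticsearch with a PUT request to the _index_template endpoint; here it is only printed.

```python
import json

# Hypothetical index template body: the pattern and field names are illustrative only.
template_body = {
    "index_patterns": ["my-index*"],
    "template": {
        "mappings": {
            "properties": {
                # Full-text searchable: the value is analyzed into words.
                "address": {"type": "text"},
                # Exact-value field: stored as one unanalyzed term.
                "country": {"type": "keyword"},
            }
        }
    },
}

# For comparison, what dynamic mapping produces for a string field when no
# type is given: a text field plus a "keyword" sub-field, so the value is
# indexed twice.
dynamic_default = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

print(json.dumps(template_body, indent=2))
```

Declaring a single type per field avoids indexing every string twice, which is where the disk space saving comes from.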

 

Inverted index

To understand why this matters, look at the inverted index in Elasticsearch[1]. It is the specific way in which the system stores data, and thanks to it Elasticsearch is very fast at returning large numbers of documents, even over a long time range.

The inverted index is similar to the index found at the end of some books: a list of words with the pages on which each word appears. Elasticsearch does exactly the same thing: for each word, it records which documents contain it.
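The book-index analogy can be sketched in a few lines of Python. This is a toy model rather than how Lucene actually stores its data, and the documents and their ids are made up for illustration.

```python
from collections import defaultdict

# Hypothetical documents: id -> text.
documents = {
    "doc-1": "Wiejska 20, Warsaw",
    "doc-2": "Marszalkowska 1, Warsaw",
}

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().replace(",", " ").split():
            index[word].add(doc_id)
    return index

inverted_index = build_inverted_index(documents)
print(sorted(inverted_index["warsaw"]))   # ['doc-1', 'doc-2']
print(sorted(inverted_index["wiejska"]))  # ['doc-1']
```

Looking up a word is a single dictionary access, regardless of how many documents were indexed, which is the intuition behind the speed of the real structure.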

Example

For example, let’s look at a document with the id “my-document”. It has an “address” field with the value “Wiejska 20, Warsaw”.
curl -X POST "localhost:9200/my-index/_doc/my-document" -H 'Content-Type: application/json' -d'
{
  "address" : "Wiejska 20, Warsaw"
}'

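The two types store this value differently in the inverted index. As a Keyword, the whole value is kept as a single term, “Wiejska 20, Warsaw”. As Text, the value is analyzed and split into separate terms, which with the default analyzer are “wiejska”, “20” and “warsaw”.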
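A minimal sketch of the terms each type would produce for this value is shown below. The function names are hypothetical, and the regular expression is only an approximation of the default analyzer, which is sufficient for this simple input.

```python
import re

def keyword_terms(value):
    """Keyword: the whole value is stored as a single, unmodified term."""
    return [value]

def text_terms(value):
    """Text: rough stand-in for the default analyzer (split into words, lowercase)."""
    return [token.lower() for token in re.findall(r"\w+", value)]

address = "Wiejska 20, Warsaw"
print(keyword_terms(address))  # ['Wiejska 20, Warsaw']
print(text_terms(address))     # ['wiejska', '20', 'warsaw']
```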

 

Difference in query

When Elasticsearch receives a query, it first checks the inverted index for matches. If it finds any, it returns the documents that match the query. Therefore, if we query Elasticsearch for the value “Warsaw”, a document whose field has the Keyword type may not be returned, because the indexed value is literally “Wiejska 20, Warsaw”. The opposite is true for the Text type: because the field content has been analyzed, Elasticsearch can find the individual words in the inverted index and return the document.
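The lookup behaviour described above can be simulated with two toy inverted indexes, one per field type. This is a simplified sketch under the assumption that the query text is analyzed the same way as the Text field; the helper names are hypothetical and no real Elasticsearch API is involved.

```python
import re

def analyze(value):
    """Rough stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]

doc_id, address = "my-document", "Wiejska 20, Warsaw"

# One toy inverted index per field type: term -> set of document ids.
keyword_index = {address: {doc_id}}
text_index = {term: {doc_id} for term in analyze(address)}

def search_keyword(query):
    # The query is looked up as-is, so only the exact full value matches.
    return keyword_index.get(query, set())

def search_text(query):
    # The query is analyzed like the field, then each term is looked up.
    hits = set()
    for term in analyze(query):
        hits |= text_index.get(term, set())
    return hits

print(search_keyword("Warsaw"))              # set()
print(search_text("Warsaw"))                 # {'my-document'}
print(search_keyword("Wiejska 20, Warsaw"))  # {'my-document'}
```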

Of course, Elasticsearch offers many different kinds of search queries, and depending on which one is used, you may get different results. That, however, is a separate matter from the differences between the field types themselves.

 

Summary

The differences between the two types are significant. We generally use the Keyword type for constant values in the data that we do not want to analyze, e.g. a country name or an application name. We use the Text type when we need the power of full-text search, e.g. in a message field that contains the original message from the system.

 

[1] In truth, the inverted index is a feature of the Apache Lucene engine on which Elasticsearch is built. This simplification was used to make the problem easier to understand.