Wednesday, December 1, 2021

Elasticsearch Fuzzy Search

Have you ever wondered how search engine tools such as Google can accurately predict your search queries as you type? Or correct typos in your search queries? No, it’s not human-like reasoning.

This functionality is possible because of a concept called fuzzy logic, fuzziness, or fuzzy search.

What is Fuzziness?

Fuzziness, or fuzzy logic, is a form of mathematical logic in which the truth of a statement can be any number between 0 and 1, where 1 represents absolute truth and 0 represents absolute falsehood.

Unlike Boolean logic, which has only two distinct values (0 and 1), fuzzy logic accepts a continuous range of degrees of truth.

In simple terms, fuzziness describes how clear a value can be. Take, for example, a typo. How do you know it’s a typo? You evaluate the existing letters and determine what the word was trying to describe.

In fuzzy logic, we can express this as a value between 0 and 1. If the typo is helli, it is more likely to mean hello or hell than “human” or “tomato.”
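
To make the 0-to-1 idea concrete, here is a minimal Python sketch that scores how close the typo is to each candidate word. It uses the standard library's difflib.SequenceMatcher, which is only a stand-in for illustration and is not the measure Elasticsearch uses; the similarity helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


typo = "helli"
candidates = ["hello", "hell", "human", "tomato"]

# Rank the candidates from most to least similar to the typo.
ranked = sorted(candidates, key=lambda word: similarity(typo, word), reverse=True)
for word in ranked:
    print(f"{word}: {similarity(typo, word):.2f}")
```

The words hello and hell score close to 1, while human and tomato score close to 0, which matches the intuition described above.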

Fuzzy search in Elasticsearch

A fuzzy search finds values that are similar to the search term, even when they do not match it exactly.

Elasticsearch implements fuzziness using the Levenshtein edit distance algorithm.

The edit distance is the minimum number of single-character changes, such as insertions, deletions, replacements, or transpositions of adjacent characters, needed to turn the initial word into a target word.

How Elasticsearch Fuzzy Search Works

The idea is simple to understand. You take two words and count the fewest single-character changes needed to turn one into the other.

Each insertion, deletion, or replacement of a character increments the distance between the words by one. The final count is the edit distance between the two words.

NOTE: The above does not describe the mathematical implementation of the algorithm. Consider the link
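
For readers who want to see the distance computed, the following is a minimal Python sketch of the classic dynamic-programming Levenshtein distance. It is an illustration only, not Elasticsearch's internal implementation, and it does not count a transposition of adjacent characters as a single edit; the levenshtein function name is an assumption made for this example.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn source into target."""
    # Row 0: turning the empty prefix of source into the first j
    # characters of target takes j insertions.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of source into "" takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("helli", "hello"))     # 1 (replace i with o)
print(levenshtein("helli", "hell"))      # 1 (delete i)
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and the previous row.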

Elasticsearch Fuzzy Query

A fuzzy query is not very different from a regular Elasticsearch query. To use it, wrap the field you want to search in a fuzzy clause and provide the value to look for.

You can also set the maximum Levenshtein distance for the query with the fuzziness parameter.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "fuzzy": {
      "category": {
        "value": "Men's",
        "fuzziness": 2
      }
    }
  }
}

In the example above, we run a fuzzy query for the term "Men's" and set a custom fuzziness value.

If you lower the edit distance, Elasticsearch allows fewer edits per term, so some values may no longer match.

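For example, the following query with an edit distance of 0 accepts only exact term matches. Because the fuzzy query does not analyze its input, the capitalized "Men's" is compared as-is against the indexed terms, which are typically lowercased for text fields, so the query is likely to return no results.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "fuzzy": {
      "category": {
        "value": "Men's",
        "fuzziness": 0
      }
    }
  }
}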

Elasticsearch Fuzziness parameter

As mentioned, the fuzziness parameter in the query sets the maximum Levenshtein edit distance or the number of edits.

We can specify the fuzziness values as: 0, 1, 2, or AUTO.

When you manually set the edit distance for your fuzzy queries, you may miss some results. Elasticsearch provides an AUTO value that lets it determine the edit distance for you.

For example:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "fuzzy": {
      "category": {
        "value": "Men's",
        "fuzziness": "AUTO"
      }
    }
  }
}

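If you set the value to AUTO, Elasticsearch chooses the edit distance based on the length of the search term. You can also specify low and high length thresholds for AUTO as:

AUTO:[low],[high]

Terms shorter than low must match exactly, terms from low up to high - 1 characters allow one edit, and longer terms allow two edits. Without explicit thresholds, AUTO uses 3 and 6.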
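
The length rule behind AUTO can be sketched in a few lines of Python. This is an illustration of the documented thresholds rather than Elasticsearch code; auto_fuzziness is a hypothetical helper, and it assumes the default thresholds of 3 and 6.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edit distance that AUTO:low,high allows for a term of this length."""
    length = len(term)
    if length < low:
        return 0  # short terms must match exactly
    if length < high:
        return 1  # medium-length terms may differ by one edit
    return 2      # long terms may differ by two edits


for term in ["ox", "shoe", "Men's", "clothing"]:
    print(term, auto_fuzziness(term))
```

With the defaults, a five-character term such as "Men's" is allowed one edit, while an eight-character term such as "clothing" is allowed two.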

To understand how fuzziness works in Elasticsearch, check the documentation.

Fuzzy Multi-Match Query

You can also use the Elasticsearch multi_match query with fuzziness, as shown in the example query below:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "Shoes",
      "fields": ["category", "customer_first_name"],
      "fuzziness": "AUTO"
    }
  }
}

Elasticsearch Fuzzy Search Parameters

The field and value parameters are required when using the fuzzy query in Elasticsearch. Other parameters such as fuzziness are optional but can play an essential role in the query.

Other parameters include:

max_expansions – Controls the maximum number of variations created. It is set to 50 by default. Elasticsearch discourages specifying a high value for max_expansions as it may lead to poor performance.

transpositions – Determines whether the edit distance counts a transposition of two adjacent characters (ab to ba) as a single edit. This value is set to true by default.

prefix_length – Sets the number of initial characters to leave unaltered during expansion. This value is set to 0 by default.

rewrite – Sets the method used to rewrite the query. The rewrite parameter documentation lists constant_score as the default value. Other methods include:

  1. constant_score_boolean
  2. scoring_boolean
  3. top_terms_boost_N
  4. top_terms_N
  5. top_terms_blended_freqs_N

NOTE: Avoid changing the rewrite method unless you are sure of what you are doing.
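
To show how the optional parameters fit into one request body, here is a small Python sketch that assembles a fuzzy query as a dictionary and prints it as JSON. The build_fuzzy_query helper is hypothetical and not part of any Elasticsearch client; it only builds the body, and how you send it (Kibana console, an HTTP client, or an official client library) is left open.

```python
import json
from typing import Any, Dict, Optional, Union


def build_fuzzy_query(
    field: str,
    value: str,
    fuzziness: Union[int, str] = "AUTO",
    max_expansions: Optional[int] = None,
    prefix_length: Optional[int] = None,
    transpositions: Optional[bool] = None,
    rewrite: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a fuzzy query body; hypothetical helper for illustration."""
    options: Dict[str, Any] = {"value": value, "fuzziness": fuzziness}
    optional = {
        "max_expansions": max_expansions,
        "prefix_length": prefix_length,
        "transpositions": transpositions,
        "rewrite": rewrite,
    }
    # Leave out anything not set so Elasticsearch applies its own defaults.
    options.update({k: v for k, v in optional.items() if v is not None})
    return {"query": {"fuzzy": {field: options}}}


body = build_fuzzy_query(
    "category", "Men's", fuzziness="AUTO", max_expansions=20, transpositions=False
)
print(json.dumps(body, indent=2))
```

Omitting unset parameters keeps the request short and avoids accidentally overriding defaults such as the rewrite method.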

Conclusion

Elasticsearch is a powerful tool on its own. As shown in this tutorial, features such as fuzzy queries make its search far more forgiving of typos and misspellings.

There is more to the Elasticsearch fuzzy query than discussed in this guide. Please consult the documentation to learn more.

Thank you for reading!
