TL;DR

  • Scenario: The business needs both exact matching (price, ID, time) and tolerant matching (prefix, fuzzy, typos).
  • Conclusion: Use term-level queries for structured, exact conditions, then combine them in a bool query with must/filter/should/must_not.
  • Output: End-to-end DSL examples, from index creation and data writing through term/terms/range/exists/prefix/regexp/fuzzy/ids/bool.

Version Matrix

Item | Description
Elasticsearch 7.x | Verified in a 7.x environment per the DSL in this article
Elasticsearch 8.x | Query DSL syntax compatible
IK Analyzer Plugin | Examples depend on the ik_max_word tokenizer
Dev Tools / Kibana Console | All examples executed in the Dev Tools console

Initial Index

Create a new book index:

PUT /book
{
  "settings": {},
  "mappings" : {
    "properties" : {
      "description" : {
        "type" : "text",
        "analyzer" : "ik_max_word"
      },
      "name" : {
        "type" : "text",
        "analyzer" : "ik_max_word"
      },
      "price" : {
        "type" : "float"
      },
      "timestamp" : {
        "type" : "date",
        "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

Write Data

PUT /book/_doc/1
{
  "name": "lucene",
  "description": "Lucene Core is a Java library providing powerful indexing and search features...",
  "price":100.45,
  "timestamp":"2020-08-21 19:11:35"
}

PUT /book/_doc/2
{
  "name": "solr",
  "description": "Solr is highly scalable, providing fully fault tolerant distributed indexing...",
  "price":320.45,
  "timestamp":"2020-07-21 17:11:35"
}

PUT /book/_doc/3
{
  "name": "Hadoop",
  "description": "The Apache Hadoop software library is a framework...",
  "price":620.45,
  "timestamp":"2020-08-22 19:18:35"
}

PUT /book/_doc/4
{
  "name": "ElasticSearch",
  "description": "Elasticsearch是一个基于Lucene的搜索服务器...",
  "price":999.99,
  "timestamp":"2020-08-15 10:11:35"
}

Term Query

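A term query finds documents whose field contains exactly the given term. The search term is not analyzed, so it must equal an indexed token character for character; one character more or less and nothing matches. Because name is a text field, its values are tokenized (and normally lowercased) at index time, which is why the lowercase solr below matches document 2.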

POST /book/_search
{
  "query": {
    "term" : {
      "name" : "solr"
    }
  }
}
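
Why the case of the search term matters can be sketched with a toy inverted index. This is only an illustration: toy_analyze is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is assumed to approximate how ik_max_word treats the Latin-script names in the sample data. It is not the IK analyzer itself.

```python
import re

def toy_analyze(text):
    # Hypothetical stand-in for the index-time analyzer:
    # split on non-alphanumerics and lowercase every token.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

# The name values written in the "Write Data" step.
names = {"1": "lucene", "2": "solr", "3": "Hadoop", "4": "ElasticSearch"}

# Build the inverted index: token -> set of document IDs.
inverted = {}
for doc_id, name in names.items():
    for token in toy_analyze(name):
        inverted.setdefault(token, set()).add(doc_id)

def term_query(term):
    # A term query looks the term up as-is; the search term is NOT analyzed.
    return sorted(inverted.get(term, set()))

print(term_query("solr"))    # lowercase token exists in the index
print(term_query("Hadoop"))  # capitalized form was never indexed
print(term_query("hadoop"))  # lowercase form matches document 3
```

Under these assumptions, searching name for Hadoop returns nothing while hadoop finds document 3, even though the stored _source still reads "Hadoop".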

Terms Query

A terms query finds documents whose field contains at least one of the listed terms. Like term, the listed values are not analyzed.

POST /book/_search
{
  "query": {
    "terms" : {
      "name" : ["solr", "elasticsearch"]
    }
  }
}

Range Query

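A range query matches documents whose field value falls within the given bounds. Supported parameters:

  • gte: greater than or equal to
  • gt: greater than
  • lte: less than or equal to
  • lt: less than
  • boost: multiplier applied to this query's relevance score (default 1.0)
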
POST /book/_search
{
  "query": {
    "range" : {
      "price" : {
        "gte" : 10,
        "lte" : 200,
        "boost" : 2.0
      }
    }
  }
}
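
The bound semantics can be checked against the sample prices with a few lines of Python. This is a minimal sketch of the comparison logic only; in_range and range_query are hypothetical helper names, and scoring (boost) is ignored.

```python
# Prices of the four sample documents.
prices = {"1": 100.45, "2": 320.45, "3": 620.45, "4": 999.99}

def in_range(value, gte=None, gt=None, lte=None, lt=None):
    # Every bound that is present must hold; absent bounds are ignored.
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True

def range_query(values, **bounds):
    # Return the IDs of documents whose value satisfies all bounds.
    return sorted(doc_id for doc_id, v in values.items() if in_range(v, **bounds))

print(range_query(prices, gte=10, lte=200))        # inclusive bounds
print(range_query(prices, gt=100.45, lt=999.99))   # exclusive bounds drop both ends
```

With the sample data, the inclusive range 10 to 200 selects only document 1, while the exclusive pair gt/lt drops documents that sit exactly on a bound.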

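Date range query. The format parameter tells Elasticsearch how to parse the gte/lte strings for this request, overriding the format declared in the mapping: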

POST /book/_search
{
  "query": {
    "range" : {
      "timestamp" : {
        "gte": "18/08/2020",
        "lte": "2021",
        "format": "dd/MM/yyyy||yyyy"
      }
    }
  }
}
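
To see which sample documents fall inside this window, the parsing can be imitated in Python. This is a rough sketch under stated assumptions: the strptime patterns are hand-translated stand-ins for dd/MM/yyyy||yyyy, and the upper bound 2021 is simply taken as the start of 2021. Elasticsearch applies its own rules for filling in missing date components, but since every sample timestamp is in 2020 the outcome here does not depend on them.

```python
from datetime import datetime

# Timestamps of the four sample documents (mapping format yyyy-MM-dd HH:mm:ss).
timestamps = {
    "1": "2020-08-21 19:11:35",
    "2": "2020-07-21 17:11:35",
    "3": "2020-08-22 19:18:35",
    "4": "2020-08-15 10:11:35",
}

# Assumed strptime equivalents of the query's "dd/MM/yyyy||yyyy".
QUERY_FORMATS = ["%d/%m/%Y", "%Y"]

def parse_any(value, formats):
    # Try each alternative format in order, like the || separator does.
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of {formats}")

lower = parse_any("18/08/2020", QUERY_FORMATS)
upper = parse_any("2021", QUERY_FORMATS)  # simplification: start of 2021

hits = sorted(
    doc_id
    for doc_id, ts in timestamps.items()
    if lower <= datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") <= upper
)
print(hits)
```

Documents 2 and 4 are dated before 18 August 2020, so only documents 1 and 3 remain.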

Exists Query

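An exists query finds documents that have an indexed value for the specified field, roughly equivalent to SQL's column IS NOT NULL. A field whose source value is null or an empty array [] does not count as existing.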

POST /book/_search
{
  "query": {
    "exists" : { "field" : "price" }
  }
}
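
Which source values count as "existing" is easy to get wrong, so here is a small sketch of the rule as the Elasticsearch documentation describes it. has_indexed_value is a hypothetical helper that only inspects the source JSON; it ignores mapping options such as null_value or index: false, and treating an array containing only nulls as missing is an assumption of this sketch.

```python
def has_indexed_value(source, field):
    # Missing key or explicit null: nothing was indexed for the field.
    value = source.get(field)
    if value is None:
        return False
    # Arrays need at least one non-null element ([] has none).
    if isinstance(value, list):
        return any(item is not None for item in value)
    # Any other value counts, including empty strings and 0.
    return True

print(has_indexed_value({"price": 100.45}, "price"))        # normal value
print(has_indexed_value({"price": None}, "price"))          # explicit null
print(has_indexed_value({"name": "solr"}, "price"))         # field absent
print(has_indexed_value({"price": []}, "price"))            # empty array
print(has_indexed_value({"name": ""}, "name"))              # empty string still exists
print(has_indexed_value({"price": [None, 320.45]}, "price"))  # one real value is enough
```

The empty-string case is the one that differs from the everyday meaning of "not empty".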

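Prefix Query

A prefix query finds documents whose field contains a term that begins with the given prefix. The prefix itself is not analyzed.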

POST /book/_search
{
  "query": {
    "prefix" : {
      "name" : "so"
    }
  }
}
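
A prefix query compares the prefix with indexed tokens, not with the original text. The following sketch assumes the name field was indexed as the single lowercase tokens shown in name_tokens; that token list is an assumption about the analyzer's output, not something read back from the index.

```python
# Assumed index-time tokens of the name field (lowercased single tokens).
name_tokens = {
    "1": ["lucene"],
    "2": ["solr"],
    "3": ["hadoop"],
    "4": ["elasticsearch"],
}

def prefix_query(prefix):
    # A document matches when any of its tokens starts with the raw prefix.
    return sorted(
        doc_id
        for doc_id, tokens in name_tokens.items()
        if any(tok.startswith(prefix) for tok in tokens)
    )

print(prefix_query("so"))  # matches the token "solr"
print(prefix_query("So"))  # no token starts with an uppercase S
```

Under that assumption, so finds document 2 while So finds nothing.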

Regexp Query

A regexp query matches terms against a regular expression; the pattern must match the whole term. Note: used carelessly, it causes serious performance problems. For example, a pattern that begins with .* has to be tested against every term in the inverted index, which is close to a full table scan.

POST /book/_search
{
  "query": {
    "regexp":{
      "name": "s.*"
    }
  }
}

With boost value:

POST /book/_search
{
  "query": {
    "regexp":{
      "name":{
        "value":"s.*",
        "boost":1.2
      }
    }
  }
}
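
Because the pattern has to match the entire term, a fragment such as ol finds nothing unless it is wrapped in wildcards. The sketch below imitates that anchoring with Python's re.fullmatch. Python's regex dialect is not the Lucene regexp syntax that Elasticsearch uses, so only the simple patterns shown here carry over, and name_tokens is an assumed list of indexed tokens.

```python
import re

# Assumed index-time tokens of the name field.
name_tokens = {
    "1": ["lucene"],
    "2": ["solr"],
    "3": ["hadoop"],
    "4": ["elasticsearch"],
}

def regexp_query(pattern):
    # fullmatch imitates anchoring: the pattern must cover the whole token.
    compiled = re.compile(pattern)
    return sorted(
        doc_id
        for doc_id, tokens in name_tokens.items()
        if any(compiled.fullmatch(tok) for tok in tokens)
    )

print(regexp_query("s.*"))     # tokens starting with s
print(regexp_query("ol"))      # a bare fragment does not match a whole token
print(regexp_query(".*ol.*"))  # leading .* forces a check of every token
```

The last pattern returns the expected document, but it is exactly the leading-wildcard shape that has to visit every term.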

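Fuzzy Query

A fuzzy query matches terms within a given edit distance of the search term. When fuzziness is omitted it defaults to AUTO, which allows no edits for terms of 1-2 characters, one edit for 3-5 characters, and two edits for longer terms. The examples below search for sol and so with the default, then for so with fuzziness set to 2.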

POST /book/_search
{
  "query": {
    "fuzzy" : {
      "name" : "sol"
    }
  }
}

POST /book/_search
{
  "query": {
    "fuzzy" : {
      "name" : "so"
    }
  }
}

POST /book/_search
{
  "query": {
    "fuzzy" : {
      "name" : {
        "value": "so",
        "fuzziness": 2
      }
    }
  }
}

Matching typos (by default, swapping two adjacent characters counts as a single edit):

POST /book/_search
{
  "query": {
    "fuzzy" : {
      "name" : {
        "value": "sorl"
      }
    }
  }
}

POST /book/_search
{
  "query": {
    "fuzzy" : {
      "name" : {
        "value": "osrl",
        "fuzziness":2
      }
    }
  }
}
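
The fuzzy examples in this section can be reasoned about by computing edit distances by hand. The sketch below does that in Python: osa_distance counts insertions, deletions, substitutions, and swaps of adjacent characters as one edit each, and auto_fuzziness mirrors the documented default AUTO thresholds (0 edits up to 2 characters, 1 edit for 3-5, 2 edits beyond that). Both function names are hypothetical, and the code only approximates the matching rule; it says nothing about scoring or term expansion limits.

```python
def osa_distance(a, b):
    # Optimal string alignment distance (edit distance with adjacent swaps).
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def auto_fuzziness(term):
    # Default AUTO thresholds based on the length of the search term.
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_matches(query_term, indexed_term, fuzziness=None):
    allowed = auto_fuzziness(query_term) if fuzziness is None else fuzziness
    return osa_distance(query_term, indexed_term) <= allowed

for query_term, fuzziness in [("sol", None), ("so", None), ("so", 2), ("sorl", None), ("osrl", 2)]:
    print(query_term, fuzziness, fuzzy_matches(query_term, "solr", fuzziness))
```

Read against the indexed token solr, this suggests why so with the default fuzziness is expected to miss (distance 2, but AUTO allows 0 edits for a two-character term) while the same term with fuzziness 2 matches, and why sorl needs only one edit whereas osrl needs two.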

IDs Query

An ids query returns the documents whose _id appears in the values list.

POST /book/_search
{
  "query": {
    "ids" : {
      "values" : ["1", "3"]
    }
  }
}

Compound Query - Bool Query

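A bool query combines several query clauses into one query. Each clause goes under one of these keywords:

  • must: The clause must match, and it contributes to the relevance score
  • filter: The clause must match, but it is a plain include/exclude check; it is very fast and neither participates in nor affects scoring
  • should: The clause is optional and raises the score when it matches (OR relationship); if the bool has no must or filter clause, at least one should clause must match
  • must_not: The clause must not match; it runs in filter context and neither participates in nor affects scoring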

Example business requirements, as implemented by the DSL that follows:

  • description must contain java (checked in filter, so it does not affect the score)
  • price must be between 100 and 1000, inclusive
  • name must be either lucene or solr
  • timestamp must not fall inside the given time range

POST /book/_search
{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "description": "java"
        }
      },
      "must": [
        {
          "range": {
            "price": {
              "gte": 100,
              "lte": 1000
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "name": "lucene"
                }
              },
              {
                "term": {
                  "name": "solr"
                }
              }
            ]
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "timestamp": {
              "gte": "18/08/2020",
              "lte": "2021",
              "format": "dd/MM/yyyy||yyyy"
            }
          }
        }
      ]
    }
  }
}
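
How the four keywords interact can be sketched with plain set operations on document IDs. This is a toy model of matching only (no scoring): bool_query is a hypothetical helper, each clause is represented by the set of IDs it would match, and the default for minimum_should_match follows the documented rule of 1 when there is no must or filter clause and 0 otherwise. The set used for the description clause is an assumption, because the sample descriptions are truncated in this article.

```python
ALL_DOCS = {"1", "2", "3", "4"}

def bool_query(must=(), filter_=(), should=(), must_not=(), minimum_should_match=None):
    # must and filter both restrict the result; they differ only in scoring.
    result = set(ALL_DOCS)
    for clause in list(must) + list(filter_):
        result &= clause
    # should is optional unless nothing else constrains the bool.
    if minimum_should_match is None:
        minimum_should_match = 0 if (must or filter_) else 1
    if should and minimum_should_match > 0:
        result = {
            doc for doc in result
            if sum(doc in clause for clause in should) >= minimum_should_match
        }
    # must_not removes documents from whatever is left.
    for clause in must_not:
        result -= clause
    return result

# Clause results for the sample data.
has_java = {"1", "2"}            # ASSUMPTION: descriptions are truncated in the article
price_100_1000 = {"1", "2", "3", "4"}
name_lucene, name_solr = {"1"}, {"2"}
ts_from_18_aug_2020 = {"1", "3"}

# Inner bool: only should clauses, so at least one of them must match.
name_either = bool_query(should=[name_lucene, name_solr])

hits = bool_query(
    filter_=[has_java],
    must=[price_100_1000, name_either],
    must_not=[ts_from_18_aug_2020],
)
print(sorted(name_either), sorted(hits))
```

Under these assumptions document 1 is removed by must_not because its timestamp falls inside the excluded range, which leaves document 2. Nesting the should clauses in their own bool is what turns them into a real OR condition; placed next to must at the top level they would only influence scoring.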

Error Quick Reference

Symptom | Root Cause | Fix
term query on Chinese content in a text field has a very low hit rate | The text field is tokenized, and term compares against the indexed tokens exactly | For exact matching, query a keyword subfield (such as .keyword) or change the field type to keyword
range query on a date returns nothing or reports a date parsing error | The date string in gte/lte does not match the field's format | Make the date string match the format in the mapping, or pass a format parameter in the query
exists query has an unusually low hit count | The field was never written, the field name has a typo, or the field was dynamically mapped as an object/nested structure | Inspect _source to see the original document structure and confirm the field path and name
prefix / regexp query causes high CPU and slow responses | Prefix or regexp scan over a high-cardinality field, or a regex that starts with .* | Give the pattern a fixed prefix where possible and avoid patterns like .*xxx
fuzzy query returns unstable results or is significantly slower | fuzziness is set too large, so the allowed edit distance is too high | Keep fuzziness at AUTO or 1-2
bool query returns more or fewer results than expected | The semantics of must, should, filter, and must_not were mixed up | Distinguish precisely what each keyword means
Some IDs cannot be hit by an ids query | IDs were written as a mix of string and numeric forms, or the index name is inconsistent | Keep the ID type consistent and confirm the index name is correct