Elasticsearch Query DSL Practice: match/match_phrase/quer...

Summary

This article explains the core Query DSL usage in Elasticsearch 7.3, focusing on how match, match_phrase, query_string, multi_match, and other full-text queries differ and where they go wrong in real business scenarios. Using a complete index mapping, sample data, and Kibana Dev Tools requests, it walks from match_all (return everything), through OR/AND control in match, to ordered matching and slop tolerance in match_phrase, and finally to logical expressions, multi-field search, and fuzzy matching in query_string.

1. Overview

Elasticsearch provides a complete JSON-based query DSL (Domain Specific Language) for defining queries. Think of the query DSL as an AST (Abstract Syntax Tree) of queries, composed of two types of clauses:

Leaf query clauses: Look for a specific value in a specific field, such as match, term, and range queries
Compound query clauses: Wrap other leaf or compound queries, either to combine multiple queries logically (such as bool or dis_max) or to change their behavior (such as constant_score)
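To make the leaf/compound distinction concrete, here is a minimal Python sketch that assembles a request body from two leaf clauses wrapped in one compound bool clause. The helper names are hypothetical, and the title and price fields are borrowed from the wzk-property index created in section 3.1; the sketch only builds JSON and does not contact a cluster.

```python
import json

def match(field, text):
    """Leaf clause: full-text match on a single field."""
    return {"match": {field: text}}

def range_lte(field, upper):
    """Leaf clause: range query with an inclusive upper bound."""
    return {"range": {field: {"lte": upper}}}

def bool_must(*clauses):
    """Compound clause: every wrapped clause must match."""
    return {"bool": {"must": list(clauses)}}

# A compound query wrapping two leaf queries (illustrative values).
query = {"query": bool_must(match("title", "小米"), range_lte("price", 3000))}
print(json.dumps(query, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of a POST /wzk-property/_search request in Kibana Dev Tools.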

2. Query All (match_all)

Example

POST /wzkicu-index/_search
{
  "query":{
    "match_all": {}
  }
}

Return Result Analysis

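After execution, the response contains the following fields:

took: Query time in milliseconds
timed_out: Whether the query timed out
_shards: Shard information
hits: Search result overview object
total: Total number of matching documents (by default in 7.x an object with value and relation)
max_score: Highest score among all result documents
_index: Index the document belongs to
_type: Document type (_doc for documents indexed through the typeless API, as in this article)
_id: Document id
_score: Document relevance score
_source: Original document data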
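The field list above can be read off a response programmatically. The following Python sketch works on a hand-written response in the shape Elasticsearch 7.x returns; the ids, scores, and timings are invented for illustration, and summarize is a hypothetical helper.

```python
# Hand-written sample response; all values here are illustrative, not real output.
response = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "wzk-property", "_type": "_doc", "_id": "1",
             "_score": 1.0, "_source": {"title": "小米手机", "price": 2699}},
            {"_index": "wzk-property", "_type": "_doc", "_id": "2",
             "_score": 1.0, "_source": {"title": "华为手机", "price": 5699}},
        ],
    },
}

def summarize(resp):
    """Return the total hit count and (id, score, title) for each hit."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["title"]) for h in hits["hits"]]
    return hits["total"]["value"], rows

total, rows = summarize(response)
print(total, rows)
```

Note that the per-document fields (_index, _id, _score, _source) live inside the inner hits array, while total and max_score sit one level up.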

3. Full-text Query

Full-text queries search analyzed text fields such as an email body or a product description. By default, the query string is processed with the same analyzer that was applied to the field at index time.

Full-text queries include the match query, match_phrase query, query_string query, multi_match query, and others.

3.1 Match Query

The standard query for full-text search, with relatively loose matching conditions:

A field name must be specified
The input text is tokenized, e.g., “hello world” is split into hello and world before matching
If the field content contains hello or world, the document matches
match is therefore a loose, partial match rather than an exact one

A match query accepts text, numbers, or dates, analyzes the input, and builds a boolean query from the resulting terms. The operator parameter (or, and; default or) controls how the terms are combined.

Create Index

PUT /wzk-property
{
  "settings": {},
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "images": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      }
    }
  }
}

Add Data

POST /wzk-property/_doc/
{
  "title": "小米电视4A",
  "images": "https://profile-avatar.csdnimg.cn/xxx.jpg",
  "price": 4288
}

POST /wzk-property/_doc/
{
  "title": "小米手机",
  "images": "https://profile-avatar.csdnimg.cn/xxx.jpg",
  "price": 2699
}

POST /wzk-property/_doc/
{
  "title": "华为手机",
  "images": "https://profile-avatar.csdnimg.cn/xxx.jpg",
  "price": 5699
}

OR Match (Default)

POST /wzk-property/_search
{
  "query":{
    "match":{
      "title":"小米电视4A"
    }
  }
}

Result: Both 小米电视4A (Xiaomi TV 4A) and 小米手机 (Xiaomi phone) are returned. match defaults to the OR operator, so a document matches as soon as any one of the query's tokens matches.

AND Match

To require every token of the query to match, set operator to and:

POST /wzk-property/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米电视4A",
        "operator": "and"
      }
    }
  }
}

Result: Only 小米电视4A (Xiaomi TV 4A) is returned.
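The OR and AND results above can be reproduced with a toy model in Python. This is only a sketch of the boolean logic: the token lists are hand-written assumptions, not real ik_max_word output (use _analyze to see the actual tokens), and scoring is ignored.

```python
def match(doc_tokens, query_tokens, operator="or"):
    """Toy model of match: any token suffices for "or", all are needed for "and"."""
    doc = set(doc_tokens)
    present = [tok in doc for tok in query_tokens]
    return all(present) if operator == "and" else any(present)

# Assumed tokenization of the three sample titles (illustrative only).
docs = {
    "小米电视4A": ["小米", "电视", "4a"],
    "小米手机": ["小米", "手机"],
    "华为手机": ["华为", "手机"],
}
query = ["小米", "电视", "4a"]  # assumed tokens of the query "小米电视4A"

or_hits = [title for title, toks in docs.items() if match(toks, query, "or")]
and_hits = [title for title, toks in docs.items() if match(toks, query, "and")]
print(or_hits)
print(and_hits)
```

Under these assumed tokens, the shared token 小米 is what pulls 小米手机 into the OR result, while AND keeps only the document containing every query token.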

3.2 Match Phrase Query

Like match, match_phrase analyzes the query text, and the text field was analyzed at index time. With match_phrase, however, every token of the query must be present in the field, in the same order, and at consecutive positions.

Basic Usage

POST /wzk-property/_search
{
  "query": {
    "match_phrase": {
      "title": "小米电视"
    }
  }
}

Order Requirement

POST /wzk-property/_search
{
  "query": {
    "match_phrase": {
      "title": "电视小米"
    }
  }
}

Since the token order of “电视小米” differs from that of “小米电视”, no document matches.

slop Parameter (Word Skip Tolerance)

The slop parameter relaxes the position constraint. With slop set to 1, match_phrase tolerates one intervening token between the query tokens:

POST /wzk-property/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "小米4A",
        "slop": 1
      }
    }
  }
}
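The position rules of match_phrase and slop can be illustrated with a simplified Python model. It is a sketch, not Lucene's implementation: for each query token it compares the position in the document with the position in the query, and accepts when the spread of those shifts is at most slop. The token list is a hand-written assumption; ik_max_word may emit additional overlapping tokens, which would change positions on a real index.

```python
from itertools import product

def phrase_match(doc_tokens, query_tokens, slop=0):
    """Simplified phrase match: token shifts may differ by at most `slop`."""
    if not query_tokens:
        return False
    positions = []
    for term in query_tokens:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return False  # every query token must occur in the field
        positions.append(found)
    # Try every combination of occurrences, one position per query token.
    for combo in product(*positions):
        shifts = [pos - idx for idx, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

doc = ["小米", "电视", "4a"]  # assumed tokens of the title "小米电视4A"

print(phrase_match(doc, ["小米", "电视"]))        # adjacent and in order
print(phrase_match(doc, ["电视", "小米"]))        # reversed order
print(phrase_match(doc, ["小米", "4a"]))          # one token in between
print(phrase_match(doc, ["小米", "4a"], slop=1))  # gap tolerated by slop
```

In this model, skipping one token costs 1, while swapping two adjacent tokens costs 2, which is why a reversed phrase needs a larger slop than a phrase with a gap.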

3.3 Query String Query

This query is similar to match, but where match requires a field name, query_string searches all fields by default, so its scope is wider.

The query_string query is an advanced query that can match documents without naming a field, and it can also be restricted to specific fields.

Broad Query

POST /wzk-property/_search
{
  "query": {
    "query_string": {
      "query": "2699"
    }
  }
}

Specify Field Query

POST /wzk-property/_search
{
  "query": {
    "query_string": {
      "query": "2699",
      "default_field": "title"
    }
  }
}

Logical Query (OR/AND)

POST /wzk-property/_search
{
  "query": {
    "query_string": {
      "query": "手机 OR 小米",
      "default_field": "title"
    }
  }
}

POST /wzk-property/_search
{
  "query": {
    "query_string": {
      "query": "手机 AND 小米",
      "default_field": "title"
    }
  }
}

Fuzzy Query

Use ~ for fuzzy matching. ~1 allows an edit distance of 1, i.e. one inserted, deleted, substituted, or transposed character:

POST /wzk-property/_search
{
  "query": {
    "query_string": {
      "query": "小米~1",
      "default_field": "title"
    }
  }
}
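To show what an edit distance of 1 means, here is a small Python sketch of the restricted Damerau-Levenshtein distance (insert, delete, substitute, or swap two adjacent characters). It is an illustration of the metric only: Elasticsearch compares the query term against terms already in the index, so the actual hits depend on how the analyzer tokenized the documents.

```python
def edit_distance(a, b):
    """Restricted Damerau-Levenshtein distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

for term in ["小米", "大米", "米小", "华为"]:
    print(term, edit_distance("小米", term))
```

With short Chinese terms, a single substituted character is already half the word, which is one reason fuzzy results on such fields can look surprising.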

Multi-field Support

POST /wzk-property/_search
{
  "query": {
    "query_string" : {
      "query":"2699",
      "fields": ["title","price"]
    }
  }
}

3.4 Multi-match Query

To search the same text across several fields, use multi_match, which builds on match to run a text query against multiple fields.

Basic Usage

POST /wzk-property/_search
{
  "query": {
    "multi_match" : {
      "query":"小米4A",
      "fields": ["title","images"]
    }
  }
}
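When one field should count more than another, multi_match accepts a per-field boost written as field^N. The Python sketch below builds such a request body; the multi_match helper and the boost value 3 are assumptions chosen for illustration, and the code only produces JSON.

```python
import json

def multi_match(text, boosts):
    """Build a multi_match body; a boost of 1 is left implicit."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

# Weight title three times as heavily as images (illustrative weights).
body = multi_match("小米4A", {"title": 3, "images": 1})
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Since images is a keyword field, it only matches on the exact stored value, so in this index the title field carries essentially all of the full-text matching.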

4. Error Quick Reference

Symptom	Root Cause	Diagnosis	Fix
Querying “小米电视4A” with match also returns “小米手机”	match defaults to operator=or, so after Chinese tokenization any matching token counts	Use _analyze in Kibana to inspect how title is tokenized and confirm the token granularity	Explicitly set "operator": "and" in match, or use a keyword (exact-match) field for product names
match_phrase does not find the expected documents (e.g. “电视小米” and “小米4A” both return nothing)	match_phrase requires tokens in the same order at consecutive positions; by default no token may be skipped	Use _analyze to inspect the token order of the phrase and compare it with the tokens of the actual _source.title	Set "slop": N to relax the position constraint, or rewrite the phrase to match the document
query_string reports a parse error or finds no data	query_string syntax is complex: logical operators or special characters are not escaped, or default_field does not contain the target content	Read the error message in Kibana and simplify the query string step by step, down to a single word, to verify	Avoid passing user input through directly; escape reserved characters such as + - &&
query_string fuzzy query “小米~1” returns too many or too few results	Fuzzy matching is based on edit distance and is affected by the analyzer; it is not "looks similar, so it hits"	Test the same word with _analyze and with query_string, and observe which terms actually hit	Decide how much fuzziness the business can accept and set 1 or 2 accordingly; where necessary switch to a more explicit scheme such as prefix/wildcard queries or a pinyin index
multi_match cross-field results are not as expected	The fields differ in type and analyzer, and one field dominates the score, skewing ranking or hits	Check each field's type and analyzer in the mapping and compare against a single-field match	Set explicit field weights in multi_match (e.g. "title^3"), use one analyzer for all text fields, and avoid mixing keyword and text fields in a way that misleads
match_all finds documents but no full-text query does	The field is mapped as keyword or is not indexed, or the full-text query targets the wrong field	Use the mapping API to confirm the field type and index attribute, and check the field name spelling in the query JSON	Change the field to text or define a multi-field (text + keyword), correct the field name in the DSL, then reindex and verify again
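The escaping advice in the query_string row of the table can be sketched in Python as follows. The helper name is hypothetical. It backslash-escapes the characters that the query_string syntax reserves and drops < and >, which cannot be escaped; it leaves the words AND, OR, and NOT untouched, so those still need separate handling if user input must never act as an operator.

```python
import re

# Reserved query_string characters that can be backslash-escaped.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')

def escape_query_string(text):
    """Neutralize reserved characters before embedding text in query_string."""
    # < and > cannot be escaped, so they are removed entirely.
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)

for raw in ["小米(4A)", "a+b", "price:>100", "手机"]:
    print(raw, "->", escape_query_string(raw))
```

For plain user-facing search boxes, switching to match or multi_match avoids the parsing problem altogether, since those queries do not interpret operators in the input.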

5. Summary

Telling apart the matching boundaries and tokenization semantics of match, match_phrase, query_string, and multi_match is what decides whether a Chinese search “can find” anything.

match: Tokenizes the query, then matches any token (OR) or all tokens (AND)
match_phrase: Matches tokens in order at consecutive positions; slop allows skipped tokens
query_string: Supports logical operators, multi-field search, and fuzzy queries
multi_match: Full-text search across multiple fields, with optional field weights