TL;DR
- Scenario: The business needs both exact matching (price, ID, time) and error-tolerant matching (prefix, fuzzy, typos).
- Conclusion: Use term-level queries for structured, exact conditions, then combine them with bool clauses (must/filter/should/must_not).
- Output: End-to-end DSL examples, from index creation and data writing through term/terms/range/exists/prefix/regexp/fuzzy/ids/bool queries.
Version Matrix
| Item | Description |
|---|---|
| Elasticsearch 7.x | The DSL in this article was verified in a 7.x environment |
| Elasticsearch 8.x | Query DSL syntax is compatible |
| IK Analyzer Plugin | Examples depend on the ik_max_word analyzer |
| Dev Tools / Kibana Console | All examples are run in the Dev Tools console |
Initial Index
Create a new book index:
PUT /book
{
"settings": {},
"mappings" : {
"properties" : {
"description" : {
"type" : "text",
"analyzer" : "ik_max_word"
},
"name" : {
"type" : "text",
"analyzer" : "ik_max_word"
},
"price" : {
"type" : "float"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
}
}
}
Write Data
PUT /book/_doc/1
{
"name": "lucene",
"description": "Lucene Core is a Java library providing powerful indexing and search features...",
"price":100.45,
"timestamp":"2020-08-21 19:11:35"
}
PUT /book/_doc/2
{
"name": "solr",
"description": "Solr is highly scalable, providing fully fault tolerant distributed indexing...",
"price":320.45,
"timestamp":"2020-07-21 17:11:35"
}
PUT /book/_doc/3
{
"name": "Hadoop",
"description": "The Apache Hadoop software library is a framework...",
"price":620.45,
"timestamp":"2020-08-22 19:18:35"
}
PUT /book/_doc/4
{
"name": "ElasticSearch",
"description": "Elasticsearch是一个基于Lucene的搜索服务器...",
"price":999.99,
"timestamp":"2020-08-15 10:11:35"
}
Term Query
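A term query returns documents whose specified field contains the given term. The match is exact: the query value is not analyzed, so it must equal an indexed token character for character. For an analyzed text field, the indexed tokens may differ from the original string (for example, in casing), so one character more or less means no hit.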
POST /book/_search
{
"query": {
"term" : {
"name" : "solr"
}
}
}
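To make the "exact token" behavior concrete, here is a minimal Python sketch of a term lookup against a tiny in-memory inverted index. The `simple_analyze` function is a hypothetical stand-in that only splits on non-alphanumeric characters and lowercases; it is not the real ik_max_word analyzer, and the helper names are illustrative only.

```python
import re

def simple_analyze(text):
    """Hypothetical stand-in analyzer: split on non-alphanumerics and lowercase.
    This is an assumption for illustration, not the ik_max_word implementation."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    """Map each token to the set of document IDs whose field contains it."""
    index = {}
    for doc_id, text in docs.items():
        for token in simple_analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_lookup(index, term):
    """Look the term up as-is, without analyzing it, as a term query does."""
    return sorted(index.get(term, set()))

# The name values of the four sample documents.
names = {"1": "lucene", "2": "solr", "3": "Hadoop", "4": "ElasticSearch"}
name_index = build_inverted_index(names)

print(term_lookup(name_index, "solr"))    # exact token: ['2']
print(term_lookup(name_index, "Hadoop"))  # original casing is not a token: []
print(term_lookup(name_index, "hadoop"))  # lowercased token: ['3']
print(term_lookup(name_index, "sol"))     # partial token: []
```

Under this assumption, the query value has to equal the token that was indexed, not the string that was written to the document.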
Terms Query
A terms query returns documents whose specified field contains at least one of the given terms.
POST /book/_search
{
"query": {
"terms" : {
"name" : ["solr", "elasticsearch"]
}
}
}
Range Query
- gte: greater than or equal
- gt: greater than
- lte: less than or equal
- lt: less than
- boost: weight applied to this query's relevance score (default 1.0)
POST /book/_search
{
"query": {
"range" : {
"price" : {
"gte" : 10,
"lte" : 200,
"boost" : 2.0
}
}
}
}
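The difference between the inclusive and exclusive bounds can be sketched in a few lines of Python. The `in_range` helper and `RANGE_OPS` table are hypothetical names for illustration; they model only the comparison operators and skip other keys such as boost.

```python
import operator

# Comparison used by each range bound.
RANGE_OPS = {
    "gte": operator.ge,  # greater than or equal
    "gt": operator.gt,   # greater than
    "lte": operator.le,  # less than or equal
    "lt": operator.lt,   # less than
}

def in_range(value, bounds):
    """Return True if value satisfies every bound; keys that are not bounds are skipped."""
    return all(
        RANGE_OPS[name](value, bound)
        for name, bound in bounds.items()
        if name in RANGE_OPS
    )

prices = {"1": 100.45, "2": 320.45, "3": 620.45, "4": 999.99}
bounds = {"gte": 10, "lte": 200, "boost": 2.0}

hits = [doc_id for doc_id, price in prices.items() if in_range(price, bounds)]
print(hits)  # ['1']

# A value equal to the bound passes gte but not gt.
print(in_range(100.45, {"gte": 100.45}))  # True
print(in_range(100.45, {"gt": 100.45}))   # False
```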
Date range query:
POST /book/_search
{
"query": {
"range" : {
"timestamp" : {
"gte": "18/08/2020",
"lte": "2021",
"format": "dd/MM/yyyy||yyyy"
}
}
}
}
Exists Query
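Returns documents that contain an indexed value for the specified field, roughly equivalent to SQL's column IS NOT NULL. A field whose value is null or an empty array [] does not count as existing, while an empty string "" does.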
POST /book/_search
{
"query": {
"exists" : { "field" : "price" }
}
}
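Which values count as "existing" is easy to get wrong, so here is a small Python sketch of the rule for plain JSON values. The function names are hypothetical, and the sketch deliberately ignores mapping settings (such as disabled indexing or a configured null_value) that can also affect the result.

```python
def has_indexed_value(value):
    """Return True if a JSON value would leave at least one value to index.
    null and arrays holding nothing but nulls leave none."""
    if value is None:
        return False
    if isinstance(value, list):
        return any(has_indexed_value(item) for item in value)
    return True  # includes empty strings, zero, and False

def exists(doc, field):
    """Simplified exists check on a top-level field of a document dict."""
    return has_indexed_value(doc.get(field))

print(exists({"price": 100.45}, "price"))     # True
print(exists({"price": None}, "price"))       # False
print(exists({"price": []}, "price"))         # False
print(exists({"price": [None, 5]}, "price"))  # True
print(exists({"name": ""}, "name"))           # True: an empty string still exists
print(exists({"name": "solr"}, "price"))      # False: field never written
```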
Prefix Query
POST /book/_search
{
"query": {
"prefix" : {
"name" : "so"
}
}
}
Regexp Query
A regexp query matches terms against a regular expression. Note: used carelessly, it can cause serious performance problems. For example, a pattern that starts with a wildcard such as .* has to be checked against every term in the inverted index, which is almost like a full table scan.
POST /book/_search
{
"query": {
"regexp":{
"name": "s.*"
}
}
}
With boost value:
POST /book/_search
{
"query": {
"regexp":{
"name":{
"value":"s.*",
"boost":1.2
}
}
}
}
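One detail worth keeping in mind is that the pattern has to match the whole term, not just part of it. The Python sketch below imitates that with `re.fullmatch`; Python's regex syntax is not identical to Lucene's, so treat it as an approximation that holds for simple patterns like the ones here. `regexp_terms` is a hypothetical helper name.

```python
import re

def regexp_terms(terms, pattern):
    """Return the terms that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]

# Lowercased name tokens of the sample documents (assumed for illustration).
terms = ["elasticsearch", "hadoop", "lucene", "solr"]

print(regexp_terms(terms, "s.*"))     # ['solr']
print(regexp_terms(terms, "ol"))      # []: a partial match is not enough
print(regexp_terms(terms, ".*ol.*"))  # ['solr'], but no fixed prefix narrows the scan
```

A pattern with a fixed prefix such as `s.*` lets the engine skip terms that do not start with that prefix, whereas `.*ol.*` gives it nothing to narrow the search with.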
Fuzzy Query
POST /book/_search
{
"query": {
"fuzzy" : {
"name" : "sol"
}
}
}
POST /book/_search
{
"query": {
"fuzzy" : {
"name" : "so"
}
}
}
POST /book/_search
{
"query": {
"fuzzy" : {
"name" : {
"value": "so",
"fuzziness": 2
}
}
}
}
Typo matching (swapping two adjacent characters counts as a single edit):
POST /book/_search
{
"query": {
"fuzzy" : {
"name" : {
"value": "sorl"
}
}
}
}
POST /book/_search
{
"query": {
"fuzzy" : {
"name" : {
"value": "osrl",
"fuzziness":2
}
}
}
}
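The fuzzy examples above can be reasoned about with a small Python sketch. It assumes the documented AUTO rule for the default fuzziness (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two) and uses the optimal string alignment distance, in which swapping two adjacent characters counts as one edit. The function names are hypothetical, and the sketch ignores options such as prefix_length and max_expansions.

```python
def auto_fuzziness(term):
    """Edit distance allowed by the AUTO rule for a query term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute,
    or transpose two adjacent characters, each costing one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def fuzzy_match(query, term, fuzziness=None):
    """True if term is within the allowed edit distance of query."""
    allowed = auto_fuzziness(query) if fuzziness is None else fuzziness
    return osa_distance(query, term) <= allowed

print(fuzzy_match("sol", "solr"))      # True: 1 edit, AUTO allows 1
print(fuzzy_match("so", "solr"))       # False: AUTO allows 0 for 2 characters
print(fuzzy_match("so", "solr", 2))    # True: 2 insertions
print(fuzzy_match("sorl", "solr"))     # True: 1 transposition
print(fuzzy_match("osrl", "solr", 2))  # True: 2 transpositions
```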
IDs Query
POST /book/_search
{
"query": {
"ids" : {
"values" : ["1", "3"]
}
}
}
Compound Query - Bool Query
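A bool query combines multiple query clauses into a single query using these clause types:
- must: The clause must match and contributes to the score.
- filter: The clause must match, but it is a plain yes/no check that runs in filter context: it is fast, can be cached, and does not affect the score.
- should: OR relationship. If the bool query has no must or filter clause, at least one should clause must match; otherwise should clauses are optional and only raise the score.
- must_not: The clause must not match. It runs in filter context and does not affect the score.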
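The matching side of these four clause types can be sketched in Python as follows. This is only a model of which documents match: scoring is not represented, the default should behavior (at least one should clause is required only when there is no must or filter clause) is assumed, and `bool_matches` and the predicate helpers are hypothetical names.

```python
def bool_matches(doc, must=(), filter_=(), should=(), must_not=()):
    """Return True if doc satisfies the clauses; each clause is a predicate on doc."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # should is required only when no must or filter clause is present.
    if should and not must and not filter_:
        return any(clause(doc) for clause in should)
    return True

def name_is(value):
    return lambda doc: doc["name"] == value

def price_between(low, high):
    return lambda doc: low <= doc["price"] <= high

def lucene_or_solr(doc):
    """Nested bool with only should clauses: behaves as OR."""
    return bool_matches(doc, should=[name_is("lucene"), name_is("solr")])

books = [
    {"id": "1", "name": "lucene", "price": 100.45},
    {"id": "2", "name": "solr", "price": 320.45},
    {"id": "3", "name": "hadoop", "price": 620.45},
    {"id": "4", "name": "elasticsearch", "price": 999.99},
]

hits = [
    doc["id"]
    for doc in books
    if bool_matches(
        doc,
        must=[price_between(100, 1000), lucene_or_solr],
        must_not=[name_is("lucene")],
    )
]
print(hits)  # ['2']
```

Nesting a should-only bool inside must, as in `lucene_or_solr`, is what turns "either of these" into a hard requirement.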
Example business requirements:
- description must contain java
- price must be between 100 and 1000 (inclusive)
- name is either lucene or solr
- timestamp must not fall within the given date range
POST /book/_search
{
"query": {
"bool": {
"filter": {
"match": {
"description": "java"
}
},
"must": [
{
"range": {
"price": {
"gte": 100,
"lte": 1000
}
}
},
{
"bool": {
"should": [
{
"term": {
"name": "lucene"
}
},
{
"term": {
"name": "solr"
}
}
]
}
}
],
"must_not": [
{
"range": {
"timestamp": {
"gte": "18/08/2020",
"lte": "2021",
"format": "dd/MM/yyyy||yyyy"
}
}
}
]
}
}
}
Error Quick Reference
| Symptom | Root Cause Location | Fix |
|---|---|---|
| Using term to query Chinese content in a text field gives a very low hit rate | The text field is tokenized, and term matches a single token exactly | For exact matching, use the .keyword subfield or change the field type to keyword |
| A range query on a date returns no results or reports a date parsing error | The date string in gte/lte does not match the field format | Adjust the date format in the DSL to match the format in the mappings |
| An exists query has an unusually low hit count | The field was never written, the field name has a typo, or the field was dynamically mapped as an object/nested structure | Use _source to inspect the original document structure and confirm the field path and name |
| A prefix / regexp query causes high CPU and slow responses | Running a prefix/regexp scan on a high-cardinality field, with a pattern that starts with a wildcard such as .* | Add a fixed prefix where possible and avoid patterns like .*xxx |
| A fuzzy query returns unstable results or is significantly slower | fuzziness is set too large, so the allowed edit distance is too high | Keep fuzziness at 1-2 |
| A bool query returns more or fewer results than expected | The semantics of must, should, filter, and must_not were confused | Distinguish the meaning of each clause type precisely |
| Some IDs are not hit by an ids query | IDs were written as a mix of strings and numbers, or the index name is inconsistent | Keep the ID type consistent and confirm the index name is correct |