[[phrase-matching]]
=== Phrase Matching
In the same way that the `match` query is the go-to query for standard
full-text search, the `match_phrase` query((("proximity matching", "phrase matching")))((("phrase matching")))((("match_phrase query"))) is the one you should reach for
when you want to find words that are near each other:
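[source,js]
--------------------------------------------------
GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json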
Like the `match` query, the `match_phrase` query first analyzes the query
string to produce a list of terms. It then searches for all the terms, but
keeps only documents that contain all of the search terms, in the same
positions relative to each other. A query for the phrase `quick fox`
would not match any of our documents, because no document contains the word
`quick` immediately followed by `fox`.
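[TIP]
==================================================
The `match_phrase` query can also be written as a `match` query with type
`phrase`:

[source,js]
--------------------------------------------------
"match": {
    "title": {
        "query": "quick brown fox",
        "type":  "phrase"
    }
}
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json
==================================================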
==== Term Positions
When a string is analyzed, the analyzer returns not((("phrase matching", "term positions")))((("match_phrase query", "position of terms")))((("position-aware matching"))) only a list of terms, but also the position, or order, of each term in the original string:
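[source,js]
--------------------------------------------------
GET /_analyze?analyzer=standard
Quick brown fox
--------------------------------------------------
// SENSE: 120_Proximity_Matching/05_Term_positions.json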
This returns the following:

[role="pagebreak-before"]
[source,js]
--------------------------------------------------
{
   "tokens": [
      {
         "token": "quick",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1 <1>
      },
      {
         "token": "brown",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2 <1>
      },
      {
         "token": "fox",
         "start_offset": 12,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 3 <1>
      }
   ]
}
--------------------------------------------------
<1> The `position` of each term in the original string.
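
To make the shape of that response concrete, here is a minimal Python sketch of a toy analyzer. It is only an approximation: the `analyze` function, its regular expression, and the lowercasing step are assumptions made for illustration, not the standard analyzer's actual implementation. It produces the same token, offset, and position values as the response above for this input.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer (hypothetical helper, not Elasticsearch code).

    Splits on word characters, lowercases each token, and records its
    character offsets and its 1-based position in the original string.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


tokens = analyze("Quick brown fox")
for tok in tokens:
    print(tok)
```

The point to take away is that each token carries its position alongside its text, which is what position-aware queries rely on.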
Positions can be stored in the inverted index, and position-aware queries like
the `match_phrase` query can use them to match only documents that contain
all the words in exactly the order specified, with no words in-between.
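
As a rough illustration of what storing positions in an inverted index could look like, the following Python sketch maps each term to the documents that contain it and to the positions where it occurs. The `build_positional_index` function, the whitespace tokenization, and the two sample documents are all hypothetical; the real index format is considerably more compact and sophisticated.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Build term -> {doc_id -> [positions]} (illustrative structure only).

    Tokenization here is a plain lowercase whitespace split, an assumption
    made to keep the sketch short. Positions are 1-based.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


# Hypothetical sample documents, not taken from the guide's data set.
docs = {
    1: "the quick brown fox",
    2: "quick fox brown",
}
index = build_positional_index(docs)
print(index["quick"])
print(index["fox"])
```

With positions recorded per document, a query can check not only that the terms occur but also how far apart they are.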
==== What Is a Phrase
For a document to be considered a((("match_phrase query", "documents matching a phrase")))((("phrase matching", "criteria for matching documents"))) match for the phrase ``quick brown fox,'' the following must be true:
* `quick`, `brown`, and `fox` must all appear in the field.
* The position of `brown` must be `1` greater than the position of `quick`.
* The position of `fox` must be `2` greater than the position of `quick`.
If any of these conditions is not met, the document is not considered a match.
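
The conditions above can be sketched as a small Python check. This is a simplified illustration under stated assumptions, not how Elasticsearch implements the query: `matches_phrase` is a hypothetical helper, and `field_positions` is assumed to be a mapping from each term in the field to the list of positions where it occurs.

```python
def matches_phrase(field_positions, phrase_terms):
    """Return True if the terms occur at consecutive positions, in order.

    field_positions: dict of term -> list of positions in the field
                     (an assumed input shape for this sketch).
    phrase_terms:    the analyzed query terms, in query order.
    """
    if not phrase_terms:
        return False
    first, *rest = phrase_terms
    # Try every occurrence of the first term as a possible phrase start.
    for start in field_positions.get(first, []):
        # The n-th following term must sit exactly n positions after it.
        if all(start + offset in field_positions.get(term, [])
               for offset, term in enumerate(rest, start=1)):
            return True
    return False


# Hypothetical field containing "the quick brown fox".
positions = {"the": [1], "quick": [2], "brown": [3], "fox": [4]}
print(matches_phrase(positions, ["quick", "brown", "fox"]))
print(matches_phrase(positions, ["quick", "fox"]))
```

In this sketch the first call succeeds because `brown` and `fox` sit one and two positions after `quick`, while the second fails because `fox` is two positions after `quick` rather than one.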
[TIP]
==================================================
Internally, the `match_phrase` query uses the low-level `span` query family to
do position-aware matching. ((("match_phrase query", "use of span queries for position-aware matching")))((("span queries")))Span queries are term-level queries, so they have
no analysis phase; they search for the exact term specified.

Thankfully, most people never need to use the `span` queries directly, as the
`match_phrase` query is usually good enough. However, certain specialized
fields, like patent searches, use these low-level queries to perform very
specific, carefully constructed positional searches.
==================================================