=== Multivalue Fields
A curious thing can happen when you try to use phrase matching on multivalue fields. ((("proximity matching", "on multivalue fields")))((("match_phrase query", "on multivalue fields"))) Imagine that you index this document:
[source,js]
PUT /my_index/groups/1
{
    "names": [ "John Abraham", "Lincoln Smith"]
}
// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
Then run a phrase query for `Abraham Lincoln`:
[source,js]
GET /my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": "Abraham Lincoln"
        }
    }
}
// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
Surprisingly, our document matches, even though `Abraham` and `Lincoln`
belong to two different people in the `names` array. The reason for this comes
down to the way arrays are indexed in Elasticsearch.
When `John Abraham` is analyzed, it produces this:
- Position 1: `john`
- Position 2: `abraham`
Then when `Lincoln Smith` is analyzed, it produces this:
- Position 3: `lincoln`
- Position 4: `smith`
In other words, Elasticsearch produces exactly the same list of tokens as it would have
for the single string `John Abraham Lincoln Smith`. Our example query
looks for `abraham` directly followed by `lincoln`, and these two terms do
indeed exist, and they are right next to each other, so the query matches.
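
The position numbering described above can be reproduced with a small toy model. This is only a sketch, not Lucene's implementation: the helper names `assign_positions` and `phrase_matches` are hypothetical, and the "analyzer" simply lowercases and splits on whitespace, which happens to give the same tokens as the example.

```python
def assign_positions(values, gap=0):
    """Return (position, token) pairs for a multivalue field.

    Toy model (an assumption, not Lucene code): tokens are numbered
    from 1, and `gap` extra positions are skipped before each new
    array element. With gap=0 the array behaves like one long string.
    """
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.append((pos, token))
    return positions


def phrase_matches(positions, phrase):
    """True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    index = {}
    for pos, token in positions:
        index.setdefault(token, set()).add(pos)
    return any(
        all(start + i in index.get(term, ()) for i, term in enumerate(terms))
        for start in index.get(terms[0], ())
    )


names = ["John Abraham", "Lincoln Smith"]
tokens = assign_positions(names)
print(tokens)  # [(1, 'john'), (2, 'abraham'), (3, 'lincoln'), (4, 'smith')]
print(phrase_matches(tokens, "Abraham Lincoln"))  # True
```

Because the model never resets or skips positions between array elements, `abraham` and `lincoln` end up adjacent, which is exactly why the phrase query matches.
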
Fortunately, there is a simple workaround for cases like these, called the
`position_offset_gap`, which((("mapping (types)", "position_offset_gap")))((("position_offset_gap"))) we need to configure in the field mapping:
[source,js]
DELETE /my_index/groups/ <1>
PUT /my_index/_mapping/groups <2>
{
    "properties": {
        "names": {
            "type":                "string",
            "position_offset_gap": 100
        }
    }
}
// SENSE: 120_Proximity_Matching/15_Multi_value_fields.json
<1> First delete the `groups` mapping and all documents of that type.
<2> Then create a new `groups` mapping with the correct values.
The `position_offset_gap` setting tells Elasticsearch that it should increase
the current term `position` by the specified value for every new array
element. So now, when we index the array of names, the terms are emitted with
the following positions:
- Position 1: `john`
- Position 2: `abraham`
- Position 103: `lincoln`
- Position 104: `smith`
Our phrase query would no longer match a document like this because `abraham`
(position 2) and `lincoln` (position 103) are now separated by a gap of 100
positions. You would have to add a `slop` value of 100 in order for this
document to match.
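
The effect of the gap on positions and on the slop needed can be checked with a toy model. This is a sketch under stated assumptions rather than Lucene's algorithm: `assign_positions` and `required_slop` are hypothetical helpers, tokens are produced by lowercasing and splitting on whitespace, and the slop calculation only covers two terms appearing in query order.

```python
def assign_positions(values, gap):
    """Return (position, token) pairs, skipping `gap` extra positions
    before each new array element (toy model of position_offset_gap)."""
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.append((pos, token))
    return positions


def required_slop(positions, first, second):
    """Smallest slop that lets `first` be followed by `second`.

    Simplification: only in-order occurrences are considered, so the
    slop is the number of positions skipped between the two terms.
    Returns None if the terms never occur in that order.
    """
    firsts = [p for p, t in positions if t == first]
    seconds = [p for p, t in positions if t == second]
    skipped = [s - f - 1 for f in firsts for s in seconds if s > f]
    return min(skipped) if skipped else None


tokens = assign_positions(["John Abraham", "Lincoln Smith"], gap=100)
print(tokens)  # [(1, 'john'), (2, 'abraham'), (103, 'lincoln'), (104, 'smith')]
print(required_slop(tokens, "abraham", "lincoln"))  # 100
```

Terms inside the same array element are unaffected: `john` and `abraham` still need a slop of 0, so ordinary phrase queries within a single name keep working.
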