-
Introduction
- 入门
- 分布式集群
- 数据
- 分布式增删改查
- 搜索
- 映射和分析
- 结构化查询
- 排序
- 分布式搜索
- 索引管理
- 深入分片
- 结构化搜索
- 全文搜索
- 多字段搜索
- 模糊匹配
- Partial_Matching
- Relevance
- Language intro
- Identifying words
- Token normalization
- Stemming
- Stopwords
- Synonyms
- Fuzzy matching
-
Aggregations
-
overview
-
circuit breaker fd settings
-
filtering
-
facets
-
docvalues
-
eager
-
breadth vs depth
-
Conclusion
-
concepts buckets
-
basic example
-
add metric
-
nested bucket
-
extra metrics
-
bucket metric list
-
histogram
-
date histogram
-
scope
-
filtering
-
sorting ordering
-
approx intro
-
cardinality
-
percentiles
-
sigterms intro
-
sigterms
-
fielddata
-
analyzed vs not
-
overview
- 地理坐标点
- Geohashe
- 地理位置聚合
- 地理形状
- 关系
- 嵌套
- Parent Child
- Scaling
- Cluster Admin
- Deployment
- Post Deployment
[[parent-child-performance]] === Practical Considerations
Parent-child joins can be a useful technique for managing relationships when index-time performance((("parent-child relationship", "performance and"))) is more important than search-time performance, but it comes at a significant cost. Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
==== Memory Use
At the time of going to press, the parent-child ID map is still held in
memory.((("parent-child relationship", "memory usage")))((("memory usage", "parent-child ID map"))) There are plans to change the map to use doc values instead, which
will be a big memory saving. Until that happens, you need to be aware of the
following: the string _id
field of every parent document has to be held in memory, and
every child document requires 8 bytes (a long value) of memory. Actually,
it's a bit less thanks to compression, but this gives you a rough idea.
You can check how much memory is being used by the parent-child cache by
consulting ((("indices-stats API")))the indices-stats
API (for a summary at the index level) or the
node-stats
API (for a summary at the node level):
[source,json]
GET /_nodes/stats/indices/id_cache?human <1>
<1> Returns memory use of the ID cache summarized by node in a human-friendly format.
==== Global Ordinals and Latency
Parent-child uses <<global-ordinals,global ordinals>> to speed((("global ordinals")))((("parent-child relationship", "global ordinals and latency"))) up joins. Regardless of whether the parent-child map uses an in-memory cache or on-disk doc values, global ordinals still need to be rebuilt after any change to the index.
The more parents in a shard, the longer global ordinals will take to build. Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.
Global ordinals, by default, are built lazily: the first parent-child query or
aggregation after a refresh will trigger building of global ordinals. This
can introduce a significant latency spike for your users. You can use
<<eager-global-ordinals,eager_global_ordinals
>> to((("eager global ordinals"))) shift the cost of
building global ordinals from query time to refresh time, by mapping the
_parent
field as follows:
[source,json]
PUT /company { "mappings": { "branch": {}, "employee": { "_parent": { "type": "branch", "fielddata": { "loading": "eager_global_ordinals" <1> } } } } }
<1> Global ordinals for the _parent
field will be built before a new segment
becomes visible to search.
With many parents, global ordinals can take several seconds to build. In this
case, it makes sense to increase the refresh_interval
so((("refresh_interval setting"))) that refreshes
happen less often and global ordinals remain valid for longer. This will
greatly reduce the CPU cost of rebuilding global ordinals every second.
==== Multigenerations and Concluding Thoughts
The ability to join multiple generations (see <>) sounds attractive until ((("grandparents and grandchildren")))((("parent-child relationship", "multi-generations")))you think of the costs involved:
- The more joins you have, the worse performance will be.
- Each generation of parents needs to have their string
_id
fields stored in memory, which can consume a lot of RAM.
As you consider your relationship schemes and whether parent-child is right for you, consider this advice ((("parent-child relationship", "guidelines for using")))about parent-child relationships:
- Use parent-child relationships sparingly, and only when there are many more children than parents.
- Avoid using multiple parent-child joins in a single query.
- Avoid scoring by using the
has_child
filter, or thehas_child
query withscore_mode
set tonone
. - Keep the parent IDs short, so that they require less memory.
Above all: think about the other relationship techniques that we have discussed before reaching for parent-child.