[[language-intro]]
== Getting Started with Languages
Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box ((("language analyzers")))((("languages", "getting started with")))support for many of the world's most common languages:
Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.
These analyzers typically((("language analyzers", "roles performed by"))) perform four roles:
- Tokenize text into individual words:
+
`The quick brown foxes` -> [`The`, `quick`, `brown`, `foxes`]
- Lowercase tokens:
+
`The` -> `the`
- Remove common stopwords:
+
[`The`, `quick`, `brown`, `foxes`] -> [`quick`, `brown`, `foxes`]
- Stem tokens to their root form:
+
`foxes` -> `fox`
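
The four roles above can be sketched as a toy pipeline. This is a minimal illustration in Python, not how Elasticsearch implements its analyzers: the function names, the tiny `STOPWORDS` set, and the suffix-stripping `naive_stem` are all assumptions made up for this example.

```python
import re

# Hypothetical, tiny stopword list for illustration only; real language
# analyzers use longer, language-specific lists.
STOPWORDS = {"the", "a", "an", "and", "of"}

def tokenize(text):
    """Split text into word tokens on runs of non-word characters."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    """Drop tokens that appear in the (toy) stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Toy stemmer: strip a plural suffix. Not a real stemming algorithm."""
    if token.endswith("xes"):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def analyze(text):
    """Run the four roles in order: tokenize, lowercase, stop, stem."""
    tokens = tokenize(text)
    tokens = lowercase(tokens)
    tokens = remove_stopwords(tokens)
    return [naive_stem(t) for t in tokens]

print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

The sketch lowercases before removing stopwords so that `The` matches the lowercase entry `the` in the list.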
Each analyzer may also apply other transformations specific to its language in order to make words from that((("language analyzers", "other transformations specific to the language"))) language more searchable:
- The `english` analyzer ((("english analyzer")))removes the possessive `'s`:
+
`John's` -> `john`
- The `french` analyzer ((("french analyzer")))removes elisions like `l'` and `qu'` and diacritics like `¨` or `^`:
+
`l'église` -> `eglis`
- The `german` analyzer normalizes((("german analyzer"))) terms, replacing `ä` and `ae` with `a`, or `ß` with `ss`, among others:
+
`äußerst` -> `ausserst`
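
The language-specific transformations listed above can be approximated with a few string operations. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, the rules cover only the cases named in the text, and none of this reflects the actual token filters Elasticsearch uses.

```python
import re
import unicodedata

def strip_english_possessive(token):
    """Drop a trailing 's (straight or curly apostrophe)."""
    return re.sub(r"['\u2019]s$", "", token)

def strip_french_elision(token):
    """Drop a leading elided form; only l' and qu' are handled here."""
    return re.sub(r"^(?:l|qu)['\u2019]", "", token, flags=re.IGNORECASE)

def fold_diacritics(token):
    """Decompose characters and drop combining marks, so é becomes e."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def normalize_german(token):
    """Apply the replacements named in the text: ä and ae -> a, ß -> ss."""
    return token.replace("ä", "a").replace("ae", "a").replace("ß", "ss")

print(strip_english_possessive("John's").lower())         # john
print(fold_diacritics(strip_french_elision("l'église")))  # eglise
print(normalize_german("äußerst"))                        # ausserst
```

The French result here is `eglise` rather than `eglis` because this sketch performs no stemming; the missing final step is presumably what the analyzer's stemmer contributes.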