信长笔记2.0

概览

json

主要分两种
- URI Search
示例: curl -XGET "http://localhost:9200/test_index/_search?q=cusomer"

- Request Body Search
示例: curl -XGET "http://localhost:9200/test_index/_search" -H 'Content-Type: application/json' -d
'{
	"query":{
		"match_all":{}
	}
}'

URI Search

json

示例
GET /movies/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
	"profile":true
}

- q 指定查询语句, 使用Query String Syntax
- df 默认字段, 不指定会对所有字段进行查询, q=2012&df=title 等价于 q=title:2012
- sort 排序, from size 分页
- profile 可以查看查询是如何被执行的

具体不同查询类型操作, 略

Request Body Search

json

示例
POST /movies/_search
{
  "script_fileds":{ // 脚本字段
    "new_filed":{
    	"script":{
      	"lang":"painless",
        "source":"doc['order_date'].value+'hello'"
      }
    }
  },
  "_source":["order_date","category.keyword"], // 指定返回的字段
  "sort":[{"order_date":"desc"}],
	"from":10,
  "size":20,
  "query":{
  	"match_all":{}
  }
}

- 使用查询表达式
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "last christmas",
        "operator": "and"
      }
    }
  }
}

POST movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "one love",
        "slop": 1 
      }
    }
  }
}

Query String & Simple Query String

简单来说后者禁止了一些高级查询并忽略了一些语法错误

Mapping

Mapping类似数据库的schema定义, 作用:

定义索引中字段名称
定义字段数据类型
定义字段的倒排索引的相关配置, 如analyzed, analyzer

字段数据类型:

简单类型
- Text/Keyword
- Date
- Integer/Floating
- Boolean
- IPv4/IPv6
复杂类型-对象和嵌套对象
特殊类型
- geo_point&geo_shape/percolator

Dynamic Mapping

写入文档时, 如果索引不存在, 则会自动创建索引, 此时es会根据文档信息, 推断出字段类型

1745061447351

json

示例:
# dynamic mapping，推断字段的类型
PUT mapping_test/_doc/1
{
    "uid" : "123",
    "isVip" : false,
    "isAdmin": "true",
    "age":19,
    "heigh":180
}

# 查看 Dynamic
GET mapping_test/_mapping
从下面结果可以看到uid,isAdmin只会被识别为字符串, 被设置为Text, 并添加了keyword子字段

{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "heigh" : {
          "type" : "long"
        },
        "isAdmin" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isVip" : {
          "type" : "boolean"
        },
        "uid" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Dynamic Mapping配置

1745061459336

json

#修改为dynamic false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "false"
}

显式定义Mapping

json

示例:
PUT movies
{
	"mappings":{...}
}

初建时可以基于dynamic mapping进行修改

常见参数:

index, 控制当前字段是否被索引, 默认为true
index options, 控制倒排索引记录的内容, text默认position, 其他默认docs
- docs, 记录doc id
- freqs, 记录doc id/term frequencies
- positions. 记录doc id/term frequencies/term position
- offsets, 记录doc id/term frequencies/term position/character offects
null_value, 需要对Null值实现搜索, 只有keyword类型支持设定null_value
copy_to, 字段拷贝

json

#设置 index 为 false
PUT users
{
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text"
        },
        "lastName" : {
          "type" : "text"
        },
        "mobile" : {
          "type" : "text",
          "index": false
        }
      }
    }
}


#设定Null_value
PUT users
{
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text"
        },
        "lastName" : {
          "type" : "text"
        },
        "mobile" : {
          "type" : "keyword",
          "null_value": "NULL"
        }

      }
    }
}

#设置 Copy to
PUT users
{
  "mappings": {
    "properties": {
      "firstName":{
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName":{
        "type": "text",
        "copy_to": "fullName"
      }
    }
  }
}

PUT users/_doc/1
{
  "firstName":"Ruan",
  "lastName": "Yiming"
}

GET users/_search?q=fullName:(Ruan Yiming)

POST users/_search
{
  "query": {
    "match": {
       "fullName":{
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}

多字段特性

mapping中的keyword指的是不需要分词的精确值, text是会被分词的.
text类型字段自动生成mapping时, 会自动添加一个keyword子字段供精确检索
此外, 还可以添加自定义子字段, 采用自定义分词器, 以满足不同条件下的搜索

自定义分词补充

Character Filters, 提前进行文本处理, 如增加删除以及替换字符, 可配置多个

自带的Character Filters:

HTML strip, 去除html标签
Mapping, 字符串替换
Pattern replace, 正则匹配替换

Tokenizer, 将原始文本按照一定的规则, 切分为词

es内置的Tokenizers:

whitespace/standard/uax_url_email/pattern/keyword/path hierarchy

Token Filters, 将Tokenizer输出的单词进行增加修改删除

自带的Token Filters:

Lowercase/stop/synonym(添加近义词)

json

PUT logs/_doc/1
{"level":"DEBUG"}

GET /logs/_mapping

POST _analyze
{
  "tokenizer":"keyword",
  "char_filter":["html_strip"],
  "text": "<b>hello world</b>"
}


POST _analyze
{
  "tokenizer":"path_hierarchy",
  "text":"/user/ymruan/a/b/c/d/e"
}



#使用char filter进行替换
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "123-456, I-test! test-990 650-555-1234"
}

//char filter 替换表情符号
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ ":) => happy", ":( => sad"]
      }
    ],
    "text": ["I am felling :)", "Feeling :( today"]
}

// white space and snowball
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop","snowball"],
  "text": ["The gilrs in China are playing this game!"]
}


// whitespace与stop
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop","snowball"],
  "text": ["The rain in Spain falls mainly on the plain."]
}


//remove 加入lowercase后，The被当成 stopword删除
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase","stop","snowball"],
  "text": ["The gilrs in China are playing this game!"]
}

//正则表达式
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "pattern_replace",
        "pattern" : "http://(.*)",
        "replacement" : "$1"
      }
    ],
    "text" : "http://www.elastic.co"
}

Index Template & Dynamic Template

Index Template

索引模板, 可以设定一个模板固化mapping和setting, 并按照一定规则, 自动匹配到新创建的索引上

模板仅在索引新建时有用, 修改模板不会影响已有的索引
可以设定多个索引模板, 指定order, 多个设置会按规则merge
优先级为: 用户指定 > order高的Index Template > order底的index Template

json

#Create a default template
PUT _template/template_default
{
  "index_patterns": ["*"],
  "order" : 0,
  "version": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas":1
  }
}


PUT /_template/template_test
{
    "index_patterns" : ["test*"],
    "order" : 1,
    "settings" : {
    	"number_of_shards": 1,
        "number_of_replicas" : 2
    },
    "mappings" : {
    	"date_detection": false,
    	"numeric_detection": true
    }
}


#查看template信息
GET /_template/template_default
GET /_template/temp*

DELETE /_template/template_default
DELETE /_template/template_test

Dynamic Template

应用在某一个具体的索引上, 可以自定义一些字段类型推断的规则, 如:

is开头的字段都设置成boolean

json

#Dynaminc Mapping 根据类型和字段名
DELETE my_index

PUT my_index/_doc/1
{
  "firstName":"Ruan",
  "isVIP":"true"
}

GET my_index/_mapping
DELETE my_index

#示例, 字符串设置成boolean, 字符串设置成keyword
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
            {
        "strings_as_boolean": {
          "match_mapping_type":   "string",
          "match":"is*",
          "mapping": {
            "type": "boolean"
          }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type":   "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}


DELETE my_index
#示例, 结合路径, 组合姓名
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name": {
          "path_match":   "name.*",
          "path_unmatch": "*.middle",
          "mapping": {
            "type":       "text",
            "copy_to":    "full_name"
          }
        }
      }
    ]
  }
}


PUT my_index/_doc/1
{
  "name": {
    "first":  "John",
    "middle": "Winston",
    "last":   "Lennon"
  }
}

GET my_index/_search?q=full_name:John

聚合分析

es的聚合(aggregation)是对数据进行统计分析的功能 1745061421898

1745061428610

json

Elasticsearch聚合分析简介
课程Demo

需要通过Kibana导入Sample Data的飞机航班数据。具体参考“2.2节-Kibana的安装与界面快速浏览”
#按照目的地进行分桶统计
GET kibana_sample_data_flights/_search
{
	"size": 0,
	"aggs":{
		"flight_dest":{
			"terms":{
				"field":"DestCountry"
			}
		}
	}
}



#查看航班目的地的统计信息，增加平均，最高最低价格
GET kibana_sample_data_flights/_search
{
	"size": 0,
	"aggs":{
		"flight_dest":{
			"terms":{
				"field":"DestCountry"
			},
			"aggs":{
				"avg_price":{
					"avg":{
						"field":"AvgTicketPrice"
					}
				},
				"max_price":{
					"max":{
						"field":"AvgTicketPrice"
					}
				},
				"min_price":{
					"min":{
						"field":"AvgTicketPrice"
					}
				}
			}
		}
	}
}



#价格统计信息+天气信息
GET kibana_sample_data_flights/_search
{
	"size": 0,
	"aggs":{
		"flight_dest":{
			"terms":{
				"field":"DestCountry"
			},
			"aggs":{
				"stats_price":{
					"stats":{
						"field":"AvgTicketPrice"
					}
				},
				"wather":{
				  "terms": {
				    "field": "DestWeather",
				    "size": 5
				  }
				}

			}
		}
	}
}

相关阅读

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-aggregations.html

概览 ​

URI Search ​

Request Body Search ​

Query String & Simple Query String ​

Mapping ​

Dynamic Mapping ​

显式定义Mapping ​

多字段特性 ​

Index Template & Dynamic Template ​

Index Template ​

Dynamic Template ​

聚合分析 ​

概览

URI Search

Request Body Search

Query String & Simple Query String

Mapping

Dynamic Mapping

显式定义Mapping

多字段特性

Index Template & Dynamic Template

Index Template

Dynamic Template

聚合分析