Data cleaning with Elasticsearch ingest pipelines
2022-07-30 23:14:00 【talen_hx296】
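In the Elastic stack, data cleaning is usually done with Logstash. Here we use another option, the ingest pipeline, for simple data processing.
The example takes a comma-separated tags string and turns it into an array.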
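To make the goal concrete, here is a minimal Python sketch of what the split step does to a single document. It only illustrates the transformation and is not how Elasticsearch implements the processor; the helper name `split_field` is made up for this example. The separator is treated as a regular expression, as it is in the split processor, although edge cases such as trailing empty strings may behave differently.

```python
import re


def split_field(doc, field, separator=","):
    """Return a copy of doc in which doc[field] is split into a list of strings."""
    result = dict(doc)  # shallow copy so the input document is left untouched
    result[field] = re.split(separator, result[field])
    return result


doc = {
    "title": "Introducing spring framework......",
    "tags": "spring,spring boot,spring cloud",
    "content": "You konw, for spring framework",
}

processed = split_field(doc, "tags")
print(processed["tags"])  # ['spring', 'spring boot', 'spring cloud']
```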
PUT spring_blogs/_doc/1
{
"title":"Introducing spring framework......",
"tags":"spring,spring boot,spring cloud",
"content":"You konw, for spring framework"
}
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "to split blog tags",
"processors": [
{
"split": {
"field": "tags",
"separator": ","
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"title": "Introducing spring framework......",
"tags": "spring,spring boot,spring cloud",
"content": "You konw, for spring framework"
}
},
{
"_index": "index",
"_id": "idxx",
"_source": {
"title": "Introducing cloud computering",
"tags": "docker,k8s,ingrest",
"content": "You konw, for cloud"
}
}
]
}
# Add a pipeline to ES
PUT _ingest/pipeline/spring_blog_pipeline
{
"description": "a spring blog pipeline",
"processors": [
{
"split": {
"field": "tags",
"separator": ","
}
},
{
"set":{
"field": "views",
"value": 0
}
}
]
}
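A pipeline runs its processors in the order they are listed, so in the definition above split runs first and set second. The Python sketch below models that ordering on an in-memory document. The names `split_processor`, `set_processor` and `run_pipeline` are hypothetical helpers for illustration only, not part of any Elasticsearch client.

```python
import re


def split_processor(field, separator):
    """Build a step that splits a string field into a list."""
    def apply(doc):
        doc[field] = re.split(separator, doc[field])
        return doc
    return apply


def set_processor(field, value):
    """Build a step that sets a field to a fixed value."""
    def apply(doc):
        doc[field] = value
        return doc
    return apply


def run_pipeline(processors, source):
    """Apply each step in list order to a copy of the source document."""
    doc = dict(source)
    for processor in processors:
        doc = processor(doc)
    return doc


# Same two steps as spring_blog_pipeline: split tags, then set views to 0.
spring_blog_pipeline = [split_processor("tags", ","), set_processor("views", 0)]

result = run_pipeline(
    spring_blog_pipeline,
    {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud",
    },
)
print(result["tags"], result["views"])
```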
# View the pipeline
GET _ingest/pipeline/spring_blog_pipeline
# Test the pipeline
POST _ingest/pipeline/spring_blog_pipeline/_simulate
{
"docs": [
{
"_source": {
"title": "Introducing cloud computering",
"tags": "docker,k8s,ingrest",
"content": "You konw, for cloud"
}
}
]
}
DELETE spring_blogs
PUT spring_blogs/_doc/1
{
"title":"Introducing spring framework......",
"tags":"spring,spring boot,spring cloud",
"content":"You konw, for spring framework"
}
# Index a document using the pipeline
PUT spring_blogs/_doc/2?pipeline=spring_blog_pipeline
{
"title": "Introducing cloud computering",
"tags": "docker,k8s,ingrest",
"content": "You konw, for cloud"
}
POST spring_blogs/_search
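At this point document 1 was indexed without the pipeline and document 2 with it, so the index holds both shapes. The update_by_query request that follows only touches documents that have no views field. The Python sketch below shows why such a guard is useful, under the assumption that splitting a value that is already an array is an error: the pipeline is applied only where views is missing. `needs_processing` and `apply_pipeline` are hypothetical names for this illustration.

```python
import re


def needs_processing(doc):
    """Mirror the must_not/exists condition: true when views is absent or null."""
    return doc.get("views") is None


def apply_pipeline(doc):
    """Split tags and set views; raises TypeError if tags is already a list."""
    out = dict(doc)
    out["tags"] = re.split(",", out["tags"])
    out["views"] = 0
    return out


index = {
    "1": {
        "title": "Introducing spring framework......",
        "tags": "spring,spring boot,spring cloud",
        "content": "You konw, for spring framework",
    },
    "2": {
        "title": "Introducing cloud computering",
        "tags": ["docker", "k8s", "ingrest"],
        "content": "You konw, for cloud",
        "views": 0,
    },
}

# Only documents without views go through the pipeline; the rest stay as they are.
updated = {
    doc_id: apply_pipeline(doc) if needs_processing(doc) else doc
    for doc_id, doc in index.items()
}
print(updated["1"]["tags"], updated["2"]["tags"])
```

Without the guard, `apply_pipeline(index["2"])` fails in this sketch because its tags value is no longer a string.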
# Run update_by_query with a condition: only process documents that do not have a views field yet
POST spring_blogs/_update_by_query?pipeline=spring_blog_pipeline
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "views"
}
}
}
}
}
The final processed data:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "spring_blogs",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "Introducing cloud computering",
"content" : "You konw, for cloud",
"views" : 0,
"tags" : [
"docker",
"k8s",
"ingrest"
]
}
},
{
"_index" : "spring_blogs",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "Introducing spring framework......",
"content" : "You konw, for spring framework",
"views" : 0,
"tags" : [
"spring",
"spring boot",
"spring cloud"
]
}
}
]
}
}
You can also use the Script processor. It gives more freedom and can handle slightly more complex data.
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "to split spring blog tags",
"processors": [
{
"split": {
"field": "tags",
"separator": ","
}
},
{
"script": {
"source": """
if(ctx.containsKey("content")){
ctx.content_length = ctx.content.length();
}else{
ctx.content_length=0;
}
"""
}
},
{
"set": {
"field": "views",
"value": 0
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"title": "Introducing spring framework......",
"tags": "spring,spring boot,spring cloud",
"content": "You konw, for spring framework"
}
},
{
"_index": "index",
"_id": "idxx",
"_source": {
"title": "Introducing cloud computering",
"tags": "docker,k8s,ingrest",
"content": "You konw, for cloud"
}
}
]
}
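The branching inside the script step can be read as: if the field is present, store its length, otherwise store 0. Here is a small Python sketch of the same logic. It assumes the length is taken from the content field, and the helper name `add_length` is made up for this example. Python's `len` counts characters slightly differently from a Java string length for some non-ASCII text, so treat the numbers as illustrative.

```python
def add_length(doc, source_field="content", target_field="content_length"):
    """Return a copy of doc with the length of source_field, or 0 if it is missing."""
    out = dict(doc)
    if source_field in out:
        out[target_field] = len(out[source_field])
    else:
        out[target_field] = 0
    return out


with_content = add_length(
    {"title": "Introducing cloud computering", "content": "You konw, for cloud"}
)
without_content = add_length({"title": "No body yet"})

print(with_content["content_length"], without_content["content_length"])
```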