Data Cleansing with Elasticsearch Ingest Pipelines
2022-07-30 23:10:00 【talen_hx296】
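In the Elastic Stack, data cleansing is usually done with Logstash. Here an ingest pipeline is used instead to do some simple data processing.
The first example splits comma-separated data into an array.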
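Conceptually, the `split` processor replaces the string in `tags` with the list of pieces found between separators. Below is a rough pure-Python emulation; the function name `split_processor` is hypothetical and is not an Elasticsearch API. It treats `separator` as a regular expression, which is how Elasticsearch documents that parameter. Edge cases such as trailing empty strings may not match the real processor.

```python
import re


def split_processor(source: dict, field: str, separator: str) -> dict:
    """Hypothetical emulation of the ingest split processor.

    Returns a copy of `source` in which the string in `field` is replaced
    by the list obtained by splitting it on the regex `separator`.
    """
    doc = dict(source)  # shallow copy: the caller's document is left as-is
    value = doc[field]
    if not isinstance(value, str):
        # Assumption: like the real processor, only string fields are accepted
        raise TypeError(f"field [{field}] is not a string")
    doc[field] = re.split(separator, value)
    return doc


blog = {
    "title": "Introducing spring framework......",
    "tags": "spring,spring boot,spring cloud",
    "content": "You konw, for spring framework",
}
print(split_processor(blog, "tags", ",")["tags"])
# prints ['spring', 'spring boot', 'spring cloud']
```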
PUT spring_blogs/_doc/1
{
  "title": "Introducing spring framework......",
  "tags": "spring,spring boot,spring cloud",
  "content": "You konw, for spring framework"
}

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing spring framework......",
        "tags": "spring,spring boot,spring cloud",
        "content": "You konw, for spring framework"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}

# Add a pipeline to Elasticsearch
PUT _ingest/pipeline/spring_blog_pipeline
{
  "description": "a spring blog pipeline",
  "processors": [
    {
      "split": {
        "field": "tags",
        "separator": ","
      }
    },
    {
      "set": {
        "field": "views",
        "value": 0
      }
    }
  ]
}
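The stored pipeline runs its processors in order: `split` first, then `set`, which writes `views: 0`. By default `set` overwrites any existing value. A minimal Python sketch of that ordering follows. The helpers `make_split`, `make_set` and `run_pipeline` are hypothetical names for illustration only, and `run_pipeline` works on a deep copy so the input document is never modified.

```python
import copy
import re
from typing import Callable, List

Processor = Callable[[dict], None]


def make_split(field: str, separator: str) -> Processor:
    """Hypothetical stand-in for {"split": {"field": ..., "separator": ...}}."""
    def run(doc: dict) -> None:
        doc[field] = re.split(separator, doc[field])
    return run


def make_set(field: str, value) -> Processor:
    """Hypothetical stand-in for {"set": {"field": ..., "value": ...}}."""
    def run(doc: dict) -> None:
        doc[field] = value  # overwrites an existing value, like set's default
    return run


def run_pipeline(processors: List[Processor], source: dict) -> dict:
    doc = copy.deepcopy(source)  # never mutate the caller's document
    for processor in processors:  # processors run strictly in list order
        processor(doc)
    return doc


spring_blog_pipeline = [make_split("tags", ","), make_set("views", 0)]

result = run_pipeline(spring_blog_pipeline, {
    "title": "Introducing cloud computering",
    "tags": "docker,k8s,ingrest",
    "content": "You konw, for cloud",
})
print(result)
```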

# View the pipeline
GET _ingest/pipeline/spring_blog_pipeline

# Test the pipeline
POST _ingest/pipeline/spring_blog_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}

DELETE spring_blogs
PUT spring_blogs/_doc/1
{
  "title": "Introducing spring framework......",
  "tags": "spring,spring boot,spring cloud",
  "content": "You konw, for spring framework"
}

# Write a document through the pipeline
PUT spring_blogs/_doc/2?pipeline=spring_blog_pipeline
{
  "title": "Introducing cloud computering",
  "tags": "docker,k8s,ingrest",
  "content": "You konw, for cloud"
}

POST spring_blogs/_search
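At this point document 1 was indexed without the pipeline, so its `tags` is still a string and it has no `views` field. Document 2 has already been processed. That is why the `_update_by_query` request that follows is restricted with `must_not` + `exists` on `views`. Assuming the `split` processor rejects a field that is already an array, feeding document 2 through the pipeline a second time would fail.

The sketch below emulates that reasoning in Python. Here `spring_blog_pipeline` and `update_by_query` are hypothetical local functions, not Elasticsearch APIs, and the `TypeError` stands in for whatever error Elasticsearch would report.

```python
import copy
import re
from typing import Callable, Dict, List, Optional


def spring_blog_pipeline(source: dict) -> dict:
    """Hypothetical local stand-in for spring_blog_pipeline (split + set)."""
    doc = copy.deepcopy(source)
    if not isinstance(doc["tags"], str):
        # Assumption: split rejects a field that is already an array
        raise TypeError("field [tags] is not a string")
    doc["tags"] = re.split(",", doc["tags"])
    doc["views"] = 0
    return doc


def update_by_query(index: Dict[str, dict],
                    pipeline: Callable[[dict], dict],
                    missing_field: Optional[str] = None) -> List[str]:
    """Re-run `pipeline` over the documents in `index`, in place.

    When `missing_field` is given, documents that already contain that
    field are skipped, mirroring bool/must_not/exists in the query.
    """
    updated = []
    for doc_id, source in list(index.items()):
        if missing_field is not None and missing_field in source:
            continue
        index[doc_id] = pipeline(source)
        updated.append(doc_id)
    return updated


# State after the two PUT requests: doc 1 raw, doc 2 already processed.
index = {
    "1": {"title": "Introducing spring framework......",
          "tags": "spring,spring boot,spring cloud",
          "content": "You konw, for spring framework"},
    "2": spring_blog_pipeline({"title": "Introducing cloud computering",
                               "tags": "docker,k8s,ingrest",
                               "content": "You konw, for cloud"}),
}

updated = update_by_query(index, spring_blog_pipeline, missing_field="views")
print(updated)  # only document 1 is reprocessed
print(index["1"]["tags"])
```

Without the `missing_field` filter, the loop reaches an already-split `tags` array and raises, which is the failure the query condition avoids.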
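
# Add a condition to update_by_query: only documents without a "views" field go through the pipeline
POST spring_blogs/_update_by_query?pipeline=spring_blog_pipeline
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "views"
        }
      }
    }
  }
}

The final processed data (searching spring_blogs again):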
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "spring_blogs",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Introducing cloud computering",
          "content" : "You konw, for cloud",
          "views" : 0,
          "tags" : [
            "docker",
            "k8s",
            "ingrest"
          ]
        }
      },
      {
        "_index" : "spring_blogs",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Introducing spring framework......",
          "content" : "You konw, for spring framework",
          "views" : 0,
          "tags" : [
            "spring",
            "spring boot",
            "spring cloud"
          ]
        }
      }
    ]
  }
}

You can also use the Script processor. It offers more freedom and can handle somewhat more complex data:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split spring blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "script": {
          "source": """
            if (ctx.containsKey("title")) {
              ctx.content_length = ctx.title.length();
            } else {
              ctx.content_length = 0;
            }
          """
        }
      },
      {
        "set": {
          "field": "views",
          "value": 0
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing spring framework......",
        "tags": "spring,spring boot,spring cloud",
        "content": "You konw, for spring framework"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "docker,k8s,ingrest",
        "content": "You konw, for cloud"
      }
    }
  ]
}
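The script step can be checked the same way. The Python sketch below mirrors the simulated pipeline: split, then the script, then set. Here `simulate` and the processor functions are hypothetical local helpers.

As written, the Painless script stores the length of `title`, not `content`, in `content_length`, and the sketch keeps that behaviour. Python's `len` counts code points, while Java's `String.length()` counts UTF-16 units, so the two only differ for characters outside the Basic Multilingual Plane.

```python
import copy
import re
from typing import Callable, List


def split_tags(ctx: dict) -> None:
    ctx["tags"] = re.split(",", ctx["tags"])


def script_content_length(ctx: dict) -> None:
    # Same branches as the Painless script: length of "title", else 0
    if "title" in ctx:
        ctx["content_length"] = len(ctx["title"])
    else:
        ctx["content_length"] = 0


def set_views(ctx: dict) -> None:
    ctx["views"] = 0


PROCESSORS: List[Callable[[dict], None]] = [split_tags, script_content_length, set_views]


def simulate(docs: List[dict]) -> List[dict]:
    """Run every processor, in order, on a copy of each document's _source."""
    results = []
    for source in docs:
        ctx = copy.deepcopy(source)
        for processor in PROCESSORS:
            processor(ctx)
        results.append(ctx)
    return results


docs = [
    {"title": "Introducing spring framework......",
     "tags": "spring,spring boot,spring cloud",
     "content": "You konw, for spring framework"},
    {"title": "Introducing cloud computering",
     "tags": "docker,k8s,ingrest",
     "content": "You konw, for cloud"},
]
for result in simulate(docs):
    print(result["content_length"], result["tags"], result["views"])
```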