Flink: Bulk Writes to ES
2022-07-22 21:23:00 【武念】
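Flink consumes Kafka data in real time, processes it (enrichment, cleansing, and so on), and writes the result to ES. This scenario is very common in stream processing.
This article uses ES's bulk operations through BulkProcessor. This approach uses TransportClient, which communicates over TCP. The REST approach instead uses RestClient over HTTP, which this article regards as unable to guarantee accurate results.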
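The batching idea behind BulkProcessor can be illustrated with a toy model. The sketch below is not Elasticsearch code: `BatchBufferSketch` and its methods are hypothetical names, and it only mimics the two thresholds (number of actions and payload size) that decide when a buffered batch is sent.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of count- and size-based batching; a simplification, not Elasticsearch code. */
final class BatchBufferSketch {
    private final int maxActions;   // flush once this many documents are buffered
    private final long maxBytes;    // flush once the buffered payload reaches this size
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();
    private long bufferedBytes = 0;

    BatchBufferSketch(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    /** Buffers one document and flushes if either threshold is reached. */
    void add(String doc) {
        buffer.add(doc);
        bufferedBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (buffer.size() >= maxActions || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    /** Emits the buffered documents as one batch; a no-op when the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    /** Batches emitted so far, in order. */
    List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        BatchBufferSketch sketch = new BatchBufferSketch(3, 1024);
        for (int i = 0; i < 7; i++) {
            sketch.add("{\"id\":" + i + "}");
        }
        sketch.flush(); // drain whatever is left, as a final flush on shutdown would
        System.out.println(sketch.flushedBatches().size() + " batches");
    }
}
```

The real BulkProcessor additionally supports a time-based flush interval, concurrent requests, and retries with backoff, all of which this model leaves out.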
I. Preparing dependencies:
Main dependencies:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${elasticsearch.version}</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>${elasticsearch.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
Main function:
```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.qianxin.ida.enrich.ElasticsearchSink;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Arrays;
import java.util.Properties;

@Slf4j
public class KafkaToEs {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        try {
            // Checkpoint every 10 seconds
            env.enableCheckpointing(10000);
            // Kafka connection settings (placeholder values; replace with your own)
            Properties kafkaProps = new Properties();
            kafkaProps.setProperty("bootstrap.servers", "127.0.0.1:9092");
            kafkaProps.setProperty("group.id", "kafka2es");
            // List of topics
            String[] topics = new String[]{"topic1", "topic2"};
            // Multiple topics: build one source -> map -> sink pipeline per topic
            Arrays.stream(topics).forEach(topic -> {
                SingleOutputStreamOperator<JSONObject> dataStream = env
                        .addSource(new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), kafkaProps)
                                .setStartFromLatest())
                        .map(new MapFunction<String, JSONObject>() {
                            // Data cleansing goes here. The sink reads a "topic" field
                            // from each record, so that field must be present.
                            @Override
                            public JSONObject map(String value) throws Exception {
                                return JSON.parseObject(value);
                            }
                        });
                dataStream.print();
                // Custom sink
                dataStream.addSink(new ElasticsearchSink());
            });
            env.execute("kafka2es");
        } catch (Exception e) {
            log.error("kafka2es fail", e);
        }
    }
}
```
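The sink shown later in this article reads a `topic` field from every record and uses it as the ES type. The article does not say whether the Kafka messages already carry that field. If they do not, one option is to tag each record in the map step. The sketch below uses a plain `Map` as a stand-in for fastjson's `JSONObject` (which implements `Map<String, Object>`); `TopicTagSketch` and `withTopic` are hypothetical names, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: makes sure a record carries the topic it was read from. */
final class TopicTagSketch {

    /**
     * Returns a copy of the record with a "topic" field.
     * An existing "topic" value in the record is kept unchanged.
     */
    static Map<String, Object> withTopic(Map<String, Object> record, String topic) {
        Map<String, Object> tagged = new HashMap<>(record);
        tagged.putIfAbsent("topic", topic);
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 1);
        // The topic name would come from the per-topic loop in the main function
        System.out.println(withTopic(record, "topic1"));
    }
}
```

Copying the record instead of mutating it keeps the helper free of side effects; inside a Flink `MapFunction` an in-place `put` on the parsed object would work equally well.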
ElasticsearchSink function:
```java
package com.qianxin.ida.enrich;

import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

@Slf4j
public class ElasticsearchSink extends RichSinkFunction<JSONObject> {
    // Created in open(), so each parallel sink instance owns its own client and processor
    private transient TransportClient client;
    private transient BulkProcessor bulkProcessor;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        Settings settings = Settings.builder()
                .put("cluster.name", "elasticsearch")
                .put("client.transport.sniff", false)
                .build();
        client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        // BulkProcessor is a thread-safe bulk processing class that makes it easy
        // to configure when a new bulk request is flushed.
        // Property is the project's configuration helper (not shown in this article).
        BulkProcessor.Builder bulk = BulkProcessor.builder(client, buildListener());
        // Flush once this many actions have been added (default 1000, -1 disables)
        bulk.setBulkActions(Property.getIntValue("bulk_actions"));
        // Flush once the added actions reach this size (default 5 MB, -1 disables)
        bulk.setBulkSize(new ByteSizeValue(Property.getLongValue("bulk_size"), ByteSizeUnit.MB));
        // Number of concurrent requests allowed (default 1; 0 allows only a single request to execute)
        bulk.setConcurrentRequests(Property.getIntValue("concurrent_request"));
        // Flush any pending bulk request when this interval elapses (not set by default)
        bulk.setFlushInterval(TimeValue.timeValueSeconds(Property.getLongValue("flush_interval")));
        // Constant backoff policy: wait time_wait seconds between attempts and retry
        // at most retry_times times (for example, 1 second and 3 retries)
        bulk.setBackoffPolicy(BackoffPolicy.constantBackoff(
                TimeValue.timeValueSeconds(Property.getLongValue("time_wait")),
                Property.getIntValue("retry_times")));
        bulkProcessor = bulk.build();
    }

    private static BulkProcessor.Listener buildListener() {
        return new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // Some items in the bulk request may fail even if the request itself succeeds
                if (response.hasFailures()) {
                    log.error("bulk {} had failures: {}", executionId, response.buildFailureMessage());
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                log.error("bulk {} failed", executionId, failure);
            }
        };
    }

    @Override
    public void invoke(JSONObject jsonObject, Context context) throws Exception {
        try {
            String topic = jsonObject.getString("topic");
            String index = "index_";
            bulkProcessor.add(new IndexRequest(index)
                    .type(topic)
                    .source(jsonObject));
        } catch (Exception e) {
            log.error("sink error: {}, message: {}", e.getMessage(), jsonObject, e);
        }
    }

    @Override
    public void close() throws Exception {
        // Flush pending requests (waiting up to 30 seconds, an arbitrary choice), then release the client
        if (bulkProcessor != null) {
            bulkProcessor.awaitClose(30, TimeUnit.SECONDS);
        }
        if (client != null) {
            client.close();
        }
        super.close();
    }
}
```
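The sink calls `Property.getIntValue` and `Property.getLongValue`, but the article does not show that class. A minimal sketch of what such a helper might look like follows. Everything in it is an assumption: the class body, the optional `application.properties` file name, and the default values (the action count, size, wait time, and retry count follow the numbers quoted in the sink's comments; the flush interval of 5 seconds is invented for illustration).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical stand-in for the Property helper used by the sink. */
final class Property {
    private static final Properties PROPS = load();

    private static Properties load() {
        Properties p = new Properties();
        // Assumed defaults, keyed by the names the sink looks up
        p.setProperty("bulk_actions", "1000");     // actions per bulk request
        p.setProperty("bulk_size", "5");           // megabytes per bulk request
        p.setProperty("concurrent_request", "1");  // concurrent bulk requests
        p.setProperty("flush_interval", "5");      // seconds (invented value)
        p.setProperty("time_wait", "1");           // seconds between retries
        p.setProperty("retry_times", "3");         // maximum retries
        // Optional overrides from the classpath; the file name is an assumption
        try (InputStream in = Property.class.getResourceAsStream("/application.properties")) {
            if (in != null) {
                p.load(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("cannot read application.properties", e);
        }
        return p;
    }

    /** Returns the value for the key parsed as an int. */
    public static int getIntValue(String key) {
        return Integer.parseInt(require(key));
    }

    /** Returns the value for the key parsed as a long. */
    public static long getLongValue(String key) {
        return Long.parseLong(require(key));
    }

    private static String require(String key) {
        String value = PROPS.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing config key: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        System.out.println("bulk_actions = " + getIntValue("bulk_actions"));
    }
}
```

Failing fast on a missing key surfaces configuration mistakes when the sink opens rather than as a silent fallback during indexing.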