[Collect First, Use Sooner or Later] 49 High-Frequency Flink Interview Questions (Part I)
2022-06-11 17:51:00 【Big Data Institute】

We continuously share useful, valuable, carefully selected big data interview questions, and aim to build the most comprehensive big data interview question bank on the web.

One. When developing Flink jobs, have you run into backpressure, and how did you troubleshoot it?
1. Causes of backpressure
Backpressure typically appears during big promotions or other popular events. In those scenarios, a short-term traffic spike makes data pile up faster than the job can process it, and the overall throughput of the system cannot keep up.
2. How to monitor backpressure
Backpressure can be detected through the Flink Web UI.
Flink's TaskManager samples the backpressure state every 50 ms, 100 samples per measurement, and reports the results to the JobManager, which computes the backpressure ratio and displays it.
The ratio maps to a backpressure level as follows:
| Backpressure level | Ratio range | Meaning |
|---|---|---|
| HIGH | 0.5 < Ratio <= 1 | Severe |
| LOW | 0.10 < Ratio <= 0.5 | Moderate |
| OK | 0 <= Ratio <= 0.10 | Normal |
3. Locating and handling backpressure
When backpressure occurs, the problem can be diagnosed from the following three angles.
a. Data skew
In Flink's Web UI you can inspect how much data each task processes. When data is skewed, the page clearly shows one or a few subtasks handling far more data than the rest. This usually happens when grouping or aggregation operators such as KeyBy are used without accounting for possible hot keys. In that case the user needs to preprocess (e.g., scatter) the hot keys that cause the skew.
b. Garbage collection (GC)
Badly tuned TaskManager garbage collection parameters can cause severe GC pauses. You can add the JVM flag -XX:+PrintGCDetails to inspect the GC log.
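As a hedged illustration (not from the original article), one way to wire GC-logging options into the TaskManager JVM is through Flink's configuration; in a cluster deployment the equivalent is a line in flink-conf.yaml. JDK 8-style flags and the log path are assumptions here:

```java
import org.apache.flink.configuration.Configuration;

// Sketch: pass GC-logging JVM options to TaskManagers.
// The flink-conf.yaml equivalent would be:
//   env.java.opts.taskmanager: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/tm-gc.log
Configuration conf = new Configuration();
conf.setString("env.java.opts.taskmanager",
        "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/tm-gc.log");
```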
c. The code itself
Misusing Flink operators without understanding how they are implemented can also cause performance problems. You can narrow this down by watching CPU and memory usage on the machines the job runs on.
Two. How do you handle data skew in a production environment?
1. Data skew mainly comes from two sources
On the business side there are inherent data hot spots: for example, on a real-estate website the page views of second-hand houses in first-tier cities such as Beijing and Shanghai far exceed those of other regions. On the technical side, heavy use of operations such as KeyBy or GroupBy without any special handling of the grouping key turns those hot spots into skew.
2. Solutions
On the business side, avoid hot keys by design where possible: for example, split hot cities such as Shanghai and Beijing away from non-hot cities and process them separately. On the technical side, when hot spots appear, scatter the original key instead of aggregating it directly (see the two-phase aggregation code below). Flink's own features can also help avoid data skew.
3. Flink data skew scenarios and solutions
(1) Two-phase aggregation to resolve KeyBy hot spots
a. Scatter the keys to be grouped, for example by appending a random suffix
b. Aggregate the scattered data (first stage)
c. Strip the suffix to restore the original key
d. KeyBy again, compute the final result, and emit it downstream
The code looks roughly like this:
DataStream<String> sourceStream = ...;

DataStream<RecordCount> resultStream = sourceStream
    .map(line -> {
        // Parse the raw line and scatter the hot key with a random suffix
        Record record = JSON.parseObject(line, Record.class);
        String type = record.getType();
        record.setType(type + "_" + new Random().nextInt(100));
        return record;
    })
    // First-stage aggregation on the scattered key
    .keyBy(Record::getType)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .aggregate(new CountAggregate())
    .map(count -> {
        // Strip the random suffix to restore the original key
        String key = count.getKey().substring(0, count.getKey().indexOf("_"));
        return new RecordCount(key, count.getCount());
    })
    // Second-stage aggregation on the original key
    .keyBy(RecordCount::getKey)
    .process(new CountProcessFunction());

resultStream.addSink(...);
env.execute(...);
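The snippet references CountAggregate and CountProcessFunction without showing their bodies, and the original article omits them. Below is a hedged sketch of what they might look like; CountRecord and its fields are assumptions, not the author's code:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// First-stage aggregation: counts records per scattered key.
// CountRecord is a hypothetical POJO with (String key, long count).
public class CountAggregate implements AggregateFunction<Record, CountRecord, CountRecord> {

    @Override
    public CountRecord createAccumulator() {
        return new CountRecord(null, 0L);
    }

    @Override
    public CountRecord add(Record value, CountRecord acc) {
        acc.setKey(value.getType()); // scattered key, e.g. "beijing_42"
        acc.setCount(acc.getCount() + 1);
        return acc;
    }

    @Override
    public CountRecord getResult(CountRecord acc) {
        return acc;
    }

    @Override
    public CountRecord merge(CountRecord a, CountRecord b) {
        return new CountRecord(a.getKey(), a.getCount() + b.getCount());
    }
}

// Second-stage aggregation: keeps a running total per original key in keyed state.
public class CountProcessFunction extends KeyedProcessFunction<String, RecordCount, RecordCount> {

    private transient ValueState<Long> total;

    @Override
    public void open(Configuration parameters) {
        total = getRuntimeContext().getState(new ValueStateDescriptor<>("total", Long.class));
    }

    @Override
    public void processElement(RecordCount in, Context ctx, Collector<RecordCount> out) throws Exception {
        long current = (total.value() == null ? 0L : total.value()) + in.getCount();
        total.update(current);
        out.collect(new RecordCount(in.getKey(), current));
    }
}
```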
(2) Hot spots in GroupBy + aggregation
With Flink SQL, you can nest the query in two layers: the inner layer randomly scatters the data into several buckets (e.g., 100) to dilute the hot spot (the bucket count can be chosen flexibly to fit the business):
select
    date,
    city_id,
    sum(pv) as pv
from (
    select
        date,
        city_id,
        floor(rand() * 100) as bucket, -- scatter randomly into 100 buckets
        sum(count) as pv
    from kafka_table
    group by
        date,
        city_id,
        floor(rand() * 100)
) tmp
group by
    date,
    city_id;

(3) Data skew caused by a mismatch between Flink consumer parallelism and the Kafka partition count
When Flink consumes Kafka, keep the consumer parallelism aligned with the partition count: set the source parallelism equal to the number of Kafka partitions, or make the partition count an integral multiple of the parallelism (partitions = parallelism × n, n = 1, 2, 3, ...), so that every subtask reads the same number of partitions and none sits idle. For example:
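A minimal sketch of the aligned setup, assuming a 16-partition topic named user_log (the topic, broker, and group names here are made up for illustration):

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");
props.setProperty("group.id", "dau-job");

int kafkaPartitions = 16; // keep in sync with the topic's partition count

// Each source subtask reads exactly one partition; none sits idle.
DataStream<String> source = env
        .addSource(new FlinkKafkaConsumer<>("user_log", new SimpleStringSchema(), props))
        .setParallelism(kafkaPartitions);
```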
Three. Can a single Flink job contain both an event-time window and a processing-time window?
Conclusion: yes, one Flink job can contain an event-time window and a processing-time window at the same time.
Some readers will then ask: why do we usually set a Flink job to either event-time semantics or processing-time semantics?
True enough: in production, a Flink job rarely mixes windows with both time semantics.
So how do we justify the conclusion above? Two angles explain it.
First, nothing actually binds a Flink job to one specific time semantics. For an event-time window, we only need to supply watermarks and keep them advancing so the window keeps firing. Processing time is even simpler: the window operator just fires at fixed intervals of the local clock. Whatever kind of time window it is, all that matters is that its trigger condition is satisfied.
Second, Flink's implementation supports it. Flink manages timers with a component called TimerService, on which we can register both event-time and processing-time timers; Flink itself checks whether each timer's trigger condition is met and, if so, calls back the window processing function (a minimal sketch of registering both timer types follows the requirement below).

Requirement: the data source is a user heartbeat log (uid, time, type). Compute the DAU of Android and iOS separately, and emit the result accumulated from midnight of the current day up to the current minute, at most one minute late. (Note: in real Flink SQL the column name time would need backtick escaping, since TIME is a reserved keyword; the snippets below omit this for readability.)
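As referenced above, here is a minimal sketch of one KeyedProcessFunction registering both kinds of timers through TimerService; the HeartBeat type and the one-minute intervals are assumptions for illustration:

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// HeartBeat is a hypothetical POJO with (String uid, long time, String type).
public class DualTimerFunction extends KeyedProcessFunction<String, HeartBeat, String> {

    @Override
    public void processElement(HeartBeat value, Context ctx, Collector<String> out) {
        // Event-time timer: fires once the watermark passes this timestamp.
        ctx.timerService().registerEventTimeTimer(value.getTime() + 60_000L);
        // Processing-time timer: fires by the local wall clock, one minute from now.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // ctx.timeDomain() reports which clock (EVENT_TIME or PROCESSING_TIME) fired.
        out.collect(ctx.timeDomain() + " timer fired at " + timestamp);
    }
}
```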
Implementation 1: CUMULATE window
SELECT
window_start
, window_end
, platform
, sum(bucket_dau) as dau
from (
SELECT
window_start
, window_end
, platform
, count(distinct uid) as bucket_dau
FROM TABLE(
CUMULATE(
TABLE user_log,
DESCRIPTOR(time),
INTERVAL '60' SECOND
, INTERVAL '1' DAY))
GROUP BY
window_start
, window_end
, platform
, MOD(HASH_CODE(uid), 1024)
) tmp
GROUP by
window_start
, window_end
, platform
Advantage: for a requirement that wants this kind of cumulative curve, the curve can be traced back (replayed) perfectly.
Disadvantage: if data arrives out of order across the large (daily) window there is a risk of losing counts, and since output is driven by the watermark, results are emitted with some delay.
Implementation 2: deduplication (Deduplicate)
-- enable minibatch if needed
select
platform
, count(1) as dau
, max(time) as time
from (
select
uid
, platform
, time
, row_number() over (partition by uid, platform, time / (24 * 3600 * 1000) order by time desc) rn -- one row per uid/platform/day
from source
) tmp
where rn = 1
group by
platform
Advantage: fast computation.
Disadvantage: after a failover the curve does not trace back well, and cube (multi-dimensional) computation is not supported.
Implementation 3: group aggregation
-- enable minibatch if needed
SELECT
max(time) as time
, platform
, sum(bucket_dau) as dau
from (
SELECT
max(time) as time
, platform
, count(distinct uid) as bucket_dau
FROM source
GROUP BY
platform
, MOD(HASH_CODE(uid), 1024)
) t
GROUP by
platform
Advantage: fast computation, and cube computation is supported.
Disadvantage: after a failover the curve does not trace back well.
Four. How does Flink support unified batch and stream processing?
Flink supports both stream and batch processing on top of one underlying streaming engine.

On top of the streaming engine, Flink relies on the following mechanisms:
1. Checkpoints and state: for fault-tolerant, stateful processing;
2. Watermarks: to implement the event-time clock;
3. Windows and triggers: to bound the scope of computation and define when results are emitted.
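To make these three mechanisms concrete, here is a hedged sketch that enables checkpointing, assigns watermarks, and applies an event-time window; the topic name, the extractEventTime/extractKey helpers, the CountPerWindow function, and all intervals are assumptions:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// 1. Checkpoints + state: fault tolerance for stateful processing.
env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");

env.addSource(new FlinkKafkaConsumer<>("user_log", new SimpleStringSchema(), props))
        // 2. Watermarks drive the event-time clock (5 s of out-of-orderness assumed).
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((line, ts) -> extractEventTime(line)))
        .keyBy(line -> extractKey(line))
        // 3. Windows + triggers bound the computation and decide when results are emitted.
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .process(new CountPerWindow())
        .print();
```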
On top of the same streaming engine, Flink adds another set of mechanisms for efficient batch processing:
1. Backtracking for scheduling and recovery: introduced by Microsoft's Dryad and now used in almost every batch processor;
2. Special in-memory data structures for hashing and sorting: spilling part of the data from memory to disk when needed;
3. An optimizer: to minimize the time to produce results.
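Since Flink 1.12 the DataStream API exposes this unification directly: the same pipeline code can run in batch mode on a bounded source. A hedged sketch (the word-count data is illustrative):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// BATCH uses batch-style scheduling and sort/hash operators on bounded input;
// AUTOMATIC picks the mode from the boundedness of the sources.
env.setRuntimeMode(RuntimeExecutionMode.BATCH);

env.fromElements("android", "ios", "android")
   .map(platform -> Tuple2.of(platform, 1))
   .returns(Types.TUPLE(Types.STRING, Types.INT))
   .keyBy(t -> t.f0)
   .sum(1)
   .print();

env.execute("batch-over-streaming");
```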
Five. A Flink job suffers from high latency. How would you go about solving it?
In Flink's task management pages (Web UI) we can see which operators and tasks are under backpressure. The main levers are resource tuning and operator tuning.
Resource tuning means adjusting parameters such as operator parallelism (parallelism), CPU (cores), and heap memory (heap_memory). Job parameter tuning covers the parallelism setting, state configuration, and checkpoint configuration. For example:
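A hedged sketch of operator-level tuning; the operator names (ParseFunction, HeavyAggregation) and the numbers are illustrative, not from the original:

```java
// Job-wide default parallelism.
env.setParallelism(4);

sourceStream
    .map(new ParseFunction())        // lightweight: inherits the default parallelism
    .keyBy(Record::getType)
    .process(new HeavyAggregation())
    .setParallelism(16)              // give the hot operator more subtasks
    .name("heavy-aggregation");      // easy to spot in the Web UI backpressure tab
```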