[Spark Task Tuning] What to do when a job suddenly slows down one day
2022-07-30 06:39:00 【Peace and Shadow】
Today I am starting a new series about problems you may run into at work. The topic mentioned in the previous article, how to turn serial tasks into parallel ones, will also be covered in this series.
Let's start with an interview question: a task usually finishes in 10-20 minutes, but today it has still not finished after 1-2 hours. How do we solve it? This situation is also quite common in real work. At my company someone is on duty every night, and a big data task raises an alarm when its output is delayed (as mentioned in yesterday's article, the latest output time is a very important part of data quality). When something goes wrong with the cluster hardware or a big data component, we often have to check both the tasks and the data. As for how to investigate, several earlier articles laid the groundwork: look at the Spark Web UI.
[Spark]Spark Web UI - Jobs
[Spark]Spark Web UI - SQL
There are roughly two directions to investigate:
- Look at the Spark Web UI and check whether the execution time of each stage differs from usual, that is, whether it exceeds the average of previous runs.
- Look at the Spark logs to see whether any errors were reported and whether anything is abnormal with the network, disk space, and so on.
The first check is mainly for data skew, and the second is for hardware problems. Today we focus on data skew, because problems with hardware such as disks, or with the big data components themselves, can be handed directly to the infrastructure team and do not need a data developer.
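The first check, comparing each stage against its usual duration, can be sketched in a few lines of plain Python. This is only an illustration: the function name `find_slow_stages`, the `factor` threshold, and all of the numbers are assumptions, and the durations are imagined to have been copied by hand from the Stages tab of the Spark Web UI.

```python
from statistics import mean


def find_slow_stages(history, today, factor=2.0):
    """Return (stage_id, duration, baseline) for stages that ran much longer than usual.

    history: dict of stage id -> list of past durations in seconds
    today:   dict of stage id -> duration in seconds for the current run
    factor:  hypothetical threshold; a stage is flagged when it takes more than
             factor times its historical average
    """
    slow = []
    for stage_id, duration in today.items():
        past = history.get(stage_id)
        if not past:
            continue  # no baseline to compare against
        baseline = mean(past)
        if duration > factor * baseline:
            slow.append((stage_id, duration, baseline))
    return slow


# Made-up durations for three stages over three previous runs.
history = {0: [60, 70, 65], 1: [300, 320, 310], 2: [120, 110, 130]}
# Made-up durations for today's run: only stage 1 is far above its average.
today = {0: 66, 1: 4200, 2: 125}

slow_stages = find_slow_stages(history, today)
print(slow_stages)
```

With these numbers only stage 1 is flagged, so that is the stage whose tasks are worth opening in the UI.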
1. Symptoms of data skew
There are two possible symptoms:
- Most tasks of the Spark job finish quickly and only a few tasks run very slowly. This may be data skew; the job can still run, but it runs very slowly.
- Most tasks of the Spark job finish quickly, but some task suddenly reports an OOM (out of memory) error while running, and the same task keeps failing with OOM after several retries. This may also be data skew, and in this case the job cannot run properly at all.
2. The principle of data skew
During a shuffle, all records with the same key, wherever they sit in the cluster, must be pulled to a single task on one node for processing, for example when aggregating or joining by key. If one key has a particularly large amount of data, the task that receives that key has far more work than the others, and data skew occurs.
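To make the principle concrete, here is a small simulation in plain Python, not Spark itself. It assigns each record to a partition by hashing its key, which is the general idea behind hash partitioning in a shuffle. The use of CRC32 as the hash, the key names, and the record counts are all assumptions chosen for illustration; Spark uses its own hash function.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    # Deterministic stand-in for a hash partitioner.
    # CRC32 is an assumption for this sketch, not what Spark actually uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def records_per_partition(keys, num_partitions):
    # Count how many records end up in each partition.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Made-up data: one hot key with 100,000 records,
# plus 1,000 ordinary keys with 10 records each.
keys = ["hot_user"] * 100_000
keys += [f"user_{i}" for i in range(1000) for _ in range(10)]

sizes = records_per_partition(keys, num_partitions=8)
print(sizes)
```

Every record of the hot key hashes to the same partition, so one partition holds at least 100,000 records while the remaining seven share about 10,000 between them. The task reading that partition is the one that runs slowly or runs out of memory.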
3. Locating the code where data skew occurs
Once you have read the Spark Web UI articles above and know how to use the UI, you can troubleshoot as follows:
- Open the Jobs page of the Spark Web UI to see where the program is stuck.
- There are usually several jobs, divided into Completed Jobs and Active Jobs. The slow jobs are the ones listed under Active Jobs.
- Click the running job and look at the tasks under its stage. You will typically find one task that is running very slowly.
- Compare the amount of data handled by each task. The Spark Web UI shows how much data each task processed and how long it took. If one task processed significantly more data and took significantly longer than the others, data skew is the likely cause.
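The last comparison step can be sketched as a small helper that flags tasks far above the median of their stage. The function name `find_skewed_tasks`, the field names, the `ratio` threshold, and the task numbers are all hypothetical; in practice the per-task record counts and durations would be read from the task table of the stage page.

```python
from statistics import median


def find_skewed_tasks(tasks, ratio=5.0):
    """Return ids of tasks that both read far more data and ran far longer than the median.

    tasks: list of dicts with 'task_id', 'records', and 'duration_s' keys
    ratio: hypothetical threshold relative to the median of the stage
    """
    med_records = median([t["records"] for t in tasks])
    med_duration = median([t["duration_s"] for t in tasks])
    return [
        t["task_id"]
        for t in tasks
        if t["records"] > ratio * med_records
        and t["duration_s"] > ratio * med_duration
    ]


# Made-up metrics: five ordinary tasks and one task that received the hot key.
tasks = [
    {"task_id": i, "records": 10_000 + 100 * i, "duration_s": 12 + i}
    for i in range(5)
]
tasks.append({"task_id": 5, "records": 2_000_000, "duration_s": 3400})

skewed = find_skewed_tasks(tasks)
print(skewed)
```

The median is used as the baseline rather than the mean because a single huge task would pull the mean up and hide itself.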
For the relationship between jobs, stages, and tasks, see the previous article.
4. Handling data skew
There are many ways to handle it: some at the business level, some in the code, and some through parameter settings.
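This article does not single out one method. As an illustration of a code-level approach, one commonly described technique is two-stage aggregation with a random key suffix, often called salting. The sketch below shows the idea in plain Python rather than Spark; the function name `salted_count`, the number of salts, and the data are assumptions made for the example.

```python
import random
from collections import defaultdict


def salted_count(keys, num_salts=8, seed=0):
    """Count records per key in two stages so that no single group holds a whole hot key."""
    rng = random.Random(seed)

    # Stage 1: aggregate on (key, salt). A hot key is spread over up to
    # num_salts groups, which in a real job could be handled by different tasks.
    partial = defaultdict(int)
    for key in keys:
        partial[(key, rng.randrange(num_salts))] += 1

    # Stage 2: drop the salt and combine the much smaller partial results.
    final = defaultdict(int)
    for (key, _salt), count in partial.items():
        final[key] += count

    return dict(partial), dict(final)


# Made-up data: one hot key and two ordinary keys.
keys = ["hot_user"] * 100_000 + ["user_1"] * 10 + ["user_2"] * 10

partial, final = salted_count(keys)
largest_hot_group = max(c for (k, _s), c in partial.items() if k == "hot_user")
print(final["hot_user"], largest_hot_group)
```

The final count for the hot key is unchanged, but the largest group in the first stage is only a fraction of it. The trade-off is an extra aggregation pass, and the trick only applies directly to aggregations that can be combined from partial results, such as counts and sums.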
For a fuller treatment, I think this article is well written: https://www.doc88.com/p-2754837325472.html?r=1