Why the process cannot shut down after a Spark job finishes, and the solution
2022-06-13 05:37:00 【Flying it people】
Recently the operations team has repeatedly reported a problem with our Spark jobs running in cluster mode. Normally every Spark process and port is closed once a job finishes, but for jobs launched from the command line the process and its port were sometimes not closed automatically, which seriously affected other business groups. It did not happen every time and followed no regular pattern. The job itself was fine: data cleaning, processing and storage all completed normally. The logs showed that the job calls `SparkSession.stop` when it finishes, and it is this call that blocks the normal shutdown of the process, but the logs alone did not reveal why. So I decided to look at the JVM level, in case memory or CPU was the cause, and used `jstack <pid>` to print the JVM thread stacks:
Only part of the stack is copied here:
From the stack you can roughly see that the thread running `SparkSession.stop` in the `main` function is blocked. I won't go into thread states here; you can look them up on Google or Baidu. But what is the cause? A closer look shows that it is waiting for a lock to be released. Why is it locked? The only way to find out was to read the source code of the `stop` method:
Seeing the `synchronized` keyword makes it clear: there really is a lock. So I looked for whoever holds it, and searching the stack turned up this thread:
It turns out this thread holds the lock. But why is it waiting? That is the question to answer. Looking carefully, it appears to be related to Spark's `ContextCleaner`, which cleans up leftover data in memory. It is a daemon thread, though, and it cleans up continuously while the job runs. Could it be stuck because it is still busy cleaning memory? On reflection, no: if it were still busy cleaning, the thread would not be in a waiting state. So I read the stack again and found another line I could make sense of:
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
An RPC timeout? So Spark's heartbeat communication had timed out. But the operations team confirmed that during the hang there was no network alarm and no memory-full alarm, so neither of those should be the cause. A quick search on Google showed that people basically give two answers:
1) A Spark node is down (certainly not the case here)
2) Data skew causes a stop-the-world (STW) pause, which can be solved by shortening GC time
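The second explanation can be sanity-checked from inside a JVM before touching any flags. Below is a minimal sketch, not something from the original investigation: the class name `GcPauseProbe` and its helper methods are made up for illustration, and it uses only the standard `java.lang.management` API, not any Spark API. It reads the accumulated collection count and time that the JVM reports for each collector. Note that `getCollectionTime` is an approximate accumulated elapsed time, so for concurrent collectors it is not the same thing as pure STW pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how much GC work this JVM has done so far.
class GcPauseProbe {

    /** Sums the approximate accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "not supported by this collector"
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Sums the number of collections over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 means "not supported by this collector"
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long msBefore = totalGcMillis();
        long countBefore = totalGcCount();

        // Produce some short-lived garbage so the collectors have work to do.
        for (int i = 0; i < 200; i++) {
            byte[] junk = new byte[1 << 20];
            junk[0] = 1;
        }
        System.gc(); // only a hint; the JVM may ignore it

        System.out.println("collections since start of main: " + (totalGcCount() - countBefore));
        System.out.println("approx. GC time since start of main (ms): " + (totalGcMillis() - msBefore));
    }
}
```

In a real job the same two calls would have to run inside the driver or executor JVM (for example, logged just before `stop` is called) to say anything about that process. If the accumulated time is small compared with the network timeout, long GC pauses become a less likely explanation.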
So I added the following dynamic `--conf` parameters to the Spark launch command:
--conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC"
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC"
This switches to the CMS garbage collector (this is only a simple JVM configuration; you can also configure the sizes of the young and old generations, and the collector itself). At the same time I changed `spark.network.timeout` from 36s back to the default of 120s. The job has now been running for several days without the problem recurring, which makes me a little happy, haha. But I am not sure this was the real cause, so if anyone has clear experience handling this, please leave me a message. Thanks!
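For readers who want to see the blocking pattern from the stack analysis in isolation, here is a small plain-Java sketch. It is not Spark code and it is not how Spark implements `stop`: the class `BlockedStopDemo`, the thread names and the shared `LOCK` object are all made up for illustration. One thread takes a monitor and then sits in a timed wait (standing in for the thread seen in `RpcTimeout.awaitResult`), while a second thread tries to enter a `synchronized` block on the same monitor (standing in for the `stop` call). The standard `ThreadMXBean` API then reports the state of the blocked thread and which thread owns the lock, which is the same information read off the `jstack` output by hand.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of "stop() blocked by a lock holder that is waiting".
class BlockedStopDemo {
    private static final Object LOCK = new Object();

    /** Returns "<state of stop-caller>|<name of lock owner>". */
    static String reproduce() {
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch reply = new CountDownLatch(1); // the "RPC reply" that is slow to arrive

        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                lockTaken.countDown();
                try {
                    // Timed wait while still holding the monitor.
                    reply.await(30, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "cleaner-like-holder");

        Thread stopper = new Thread(() -> {
            synchronized (LOCK) {
                // Stand-in for the body of a synchronized stop() method.
            }
        }, "stop-caller");

        holder.setDaemon(true);
        stopper.setDaemon(true);
        try {
            holder.start();
            lockTaken.await();   // make sure the holder owns the monitor first
            stopper.start();

            // Poll until the second thread is really parked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (stopper.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }

            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(stopper.getId());
            String result = info.getThreadState() + "|" + info.getLockOwnerName();

            reply.countDown();   // deliver the "reply" so both threads can finish
            holder.join();
            stopper.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reproducing the block", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

The point of the sketch is only the shape of the problem: as long as the lock holder's wait does not return, the caller of the synchronized method cannot proceed, so how long the process hangs depends on how long that wait lasts.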