Why Doesn't a Thread Crash Bring Down the JVM?
2022-07-03 03:12:00 【Friendship years】
I came across an interesting Meituan interview question online: why doesn't a thread crash bring down the JVM? I read many answers to it, but none of them got to the root of the matter, so I decided to write my own. This article is divided into the following sections:
1. Does a thread crash necessarily crash the process?
2. How a process crashes: an introduction to the signal mechanism
3. Why doesn't a thread crash bring down the JVM process?
4. OpenJDK source code analysis
1. Does a thread crash necessarily crash the process?
Generally speaking, if a thread crashes because of an illegal memory access, the whole process crashes with it.
Why does the system crash the whole process? Mainly because the threads of a process share one address space. Because it is shared, an illegal access by one thread leaves memory in an uncertain state, which may affect the other threads. The operating system treats this as dangerous and likely to lead to a chain of serious consequences, so it lets the whole process crash.

Threads share the code segment, data segment, address space, and open files
There are several kinds of illegal memory access. Taking C as an example:
Writing to read-only memory
#include <stdio.h>
#include <stdlib.h>

int main() {
    char *s = "hello world";
    // Writing to read-only memory crashes the process
    s[1] = 'H';
}
Accessing an address the process has no permission to access (such as kernel space)
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *p = (int *)0xC0000fff;
    // Writing to the kernel space of the process crashes the process
    *p = 10;
}
In a 32-bit virtual address space, p points into kernel space, where the process has no write permission, so the assignment above causes a crash.
Accessing memory that does not exist, for example:
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *a = NULL;
    *a = 1;
}
All of the errors above are memory access errors, so they are all reported as a segmentation fault (Segment Fault), and all of them crash the process.
2. How a process crashes: an introduction to the signal mechanism
After a thread crashes, how does the process crash? What is the mechanism behind it?
The answer is signals.
Think about how we usually kill a running process: with a command such as kill -9 pid. Here kill sends a signal to the process identified by pid, and 9 is the signal number (SIGKILL).
There are many kinds of signals. On Linux, kill -l lists all available signals.

Of course, sending a signal requires the right permissions. Otherwise any process could terminate any other process by sending it a signal, which would obviously be unreasonable.
Under the hood, the kill command makes the kill system call, which transfers control to the kernel (the operating system); the kernel then delivers the signal to the specified process.
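As a minimal sketch of the point above, the same system call can be made directly from C. The helper name `signal_that_killed_child` is hypothetical and not from the original article; it assumes a POSIX system and a child process that installs no handlers, so the default action for the signal applies.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that only waits, sends it `sig` with
 * the kill(2) system call, and returns the number of the signal that
 * terminated the child, or -1 if something else happened. */
int signal_that_killed_child(int sig) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        for (;;) {
            pause();            /* child: sleep until a signal arrives */
        }
    }
    if (kill(pid, sig) != 0) {  /* the kernel delivers the signal */
        return -1;
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `signal_that_killed_child(SIGTERM)` should report that the child was terminated by SIGTERM, since the child never registered a handler and the default action for that signal is to terminate the process.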
So how does a signal make the process crash? What is the principle behind it?
The mechanism is as follows:
1) The CPU executes the normal instructions of the process;
2) The kill system call is invoked to send a signal to the process;
3) The process receives the signal from the operating system; the CPU suspends the current program and transfers control to the operating system;
4) The kill system call sends the signal to the process (suppose it is signal 11, SIGSEGV, which is what an illegal memory access usually reports);
5) The operating system runs the corresponding signal handler (a function) depending on the situation. Generally, the process exits once the handler logic has run.
Pay attention to step 5 above.
If the process has not registered its own signal handler, the operating system runs the default handler, which generally ends with the process exiting.
But if the process has registered one, its own signal handler runs instead. This gives the process a last chance to fight for its life: after receiving the signal it can call exit() to quit, but it can also use the sigsetjmp and siglongjmp functions to resume execution.
// Example of a custom signal handler
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>

// Custom signal handler: run the custom logic, then call exit to quit.
// (printf and exit are not async-signal-safe; they are used here only to keep the demo short.)
void sigHandler(int sig) {
    printf("Signal %d catched!\n", sig);
    exit(sig);
}

int main(void) {
    signal(SIGSEGV, sigHandler);
    int *p = (int *)0xC0000fff;
    *p = 10; // Writing to kernel space, which does not belong to the process, crashes
}
The output is:
Signal 11 catched!
As the code shows, once a signal handler is registered, the process runs the handler's logic on receiving SIGSEGV before exiting.
A process that receives a signal can also choose to ignore it instead of defining its own handler.
For example:
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>

int main(void) {
    // Ignore the signal
    signal(SIGSEGV, SIG_IGN);
    // Raise a SIGSEGV signal
    raise(SIGSEGV);
    printf("Normal end\n");
}
In other words, even though a signal is sent to the process, the process has a chance to survive if it defines its own signal handler or ignores the signal.
The exception is kill -9: SIGKILL can be neither caught nor ignored, so the process is killed immediately whether or not it has defined a signal handler.
Speaking of which, this brings to mind a classic interview question: how do you shut down a running Java project gracefully?
From the discussion above the answer is not hard to find: the JVM defines its own signal handler. When kill pid is issued (by default it sends signal 15, SIGTERM), the JVM can clean up resources in its signal handling and then call exit.
kill -9 obviously cannot be used in this scenario, because the process is killed at once and has no time to clean up its resources.
3. Why doesn't a thread crash bring down the JVM process?
Now let's return to the opening question, which should be easier to understand by this point. Think about it: in Java, which common exceptions or errors are caused by illegal memory access?
The familiar ones are StackOverflowError and NPE (NullPointerException). NPE, as we all know, amounts to accessing memory that does not exist. But why does a stack overflow also count as an illegal memory access?
To answer that, we need to talk briefly about the virtual address space of a process, that is, the shared address space mentioned earlier.
Modern operating systems use virtual address spaces to isolate processes so that they do not affect one another. A process addresses memory through virtual addresses, every process sees the same virtual address space layout, and the threads of a process share that process's address space.
Taking a 32-bit virtual address space as an example, the layout of a process is as follows:

So how does a stack overflow happen?
Every time a process calls a function, a stack frame is allocated, and the local variables defined in the function are allocated inside that frame.
Suppose you call an infinitely recursive function: stack frames keep being allocated. But the size of the stack is limited (on Linux the default is 8 MB, which you can check with ulimit -a), so infinite recursion quickly uses up the stack. At that point, calling another function tries to use memory beyond the stack, which triggers a segmentation fault; in Java this surfaces as StackOverflowError.
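The stack limit that ulimit reports can also be read from a program with getrlimit. The sketch below is an illustration with a hypothetical helper name, `stack_soft_limit_bytes`; it reads the soft limit that applies to the main thread's stack of the current process.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns the soft stack size limit in bytes,
 * 0 if the limit is unlimited, or -1 if the query fails. */
long long stack_soft_limit_bytes(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 0;
    }
    return (long long)lim.rlim_cur;
}
```

On a Linux system with the default configuration mentioned above, this would return 8388608 (8 MB); the actual value depends on how the limit is configured in the environment where the program runs.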

Now that we know how StackOverflowError comes about, the question is: since both StackOverflowError and NPE are illegal memory accesses, why doesn't the JVM crash?
With the groundwork laid in the previous section, the answer should not be hard. It is because the JVM installs its own signal handler, intercepts the SIGSEGV signal, and does not let these two cases crash the process.
How can we verify this conjecture?
Let's look at the JVM source code.
4. OpenJDK source code analysis
HotSpot is currently the most widely used Java virtual machine. According to R大, the JVM inside both Oracle JDK and OpenJDK is HotSpot VM; at the source level, the two are basically the same thing.
OpenJDK is open source, so studying the Java 8 OpenJDK is enough. The address is as follows:
https://github.com/AdoptOpenJDK/openjdk-jdk8u
If you are interested, you can download it and take a look.
We only study the JVM on Linux. For ease of explanation and reference, I have sorted out the key flow of signal handling (minor code is omitted).

As you can see, the signal handlers are set up when the JVM starts. On receiving signals such as SIGSEGV or SIGPIPE, the JVM eventually calls the custom signal handler JVM_handle_linux_signal.
Let's look at the main logic of this function:
JVM_handle_linux_signal(int sig, siginfo_t* info, void* ucVoid, int abort_if_unrecognized) {
  // (Excerpt: the declarations of t, thread, uc, pc and stub are omitted.)

  // Must do this before SignalHandlerMark, if crash protection installed we will longjmp away
  // This code calls siglongjmp and is mainly used for thread recovery
  os::ThreadCrashProtection::check_crash_protection(sig, t);

  if (info != NULL && uc != NULL && thread != NULL) {
    pc = (address) os::Linux::ucontext_get_pc(uc);

    // Handle ALL stack overflow variations here
    if (sig == SIGSEGV) {
      // Si_addr may not be valid due to a bug in the linux-ppc64 kernel (see
      // comment below). Use get_stack_bang_address instead of si_addr.
      address addr = ((NativeInstruction*)pc)->get_stack_bang_address(uc);

      // Determine whether this is a stack overflow
      if (addr < thread->stack_base() &&
          addr >= thread->stack_base() - thread->stack_size()) {
        if (thread->thread_state() == _thread_in_Java) {
          // 1) The JVM handles the stack overflow internally
          stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::STACK_OVERFLOW);
        }
      }
    }
  }

  if (sig == SIGSEGV &&
      !MacroAssembler::needs_explicit_null_check((intptr_t)info->si_addr)) {
    // 2) The null pointer check is performed here
    stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_NULL);
  }

  // For a stack overflow or a null pointer, the function eventually returns true and
  // never reaches the final report_and_die, so the JVM does not exit
  if (stub != NULL) {
    // save all thread context in case we need to restore it
    if (thread != NULL) thread->set_saved_exception_pc(pc);
    uc->uc_mcontext.gregs[REG_PC] = (greg_t)stub;
    // 3) Returning true means the JVM process does not exit
    return true;
  }

  VMError err(t, sig, pc, info, ucVoid);
  // Generate the hs_err_pid_xxx.log file and exit
  err.report_and_die();

  ShouldNotReachHere();
  return true; // Mute compiler
}
From the code above (note the comments marked 1, 2, and 3), we learn the following:
When a stack overflow or a null pointer error occurs, SIGSEGV is indeed sent. However, the virtual machine does not exit; it does some extra internal handling, which in effect resumes the thread and throws StackOverflowError or NPE. That is why the JVM does not crash and why we can catch these two errors and exceptions;
For SIGSEGV and other signals that the JVM does not handle specially in the function above, execution finally reaches report_and_die. This method mainly generates the crash file hs_err_pid_xxx.log (which records stack information and error details) and then exits.
By now you should understand why the JVM does not crash on StackOverflowError and NPE, even though both are illegal memory accesses. The reason is that the virtual machine defines its own signal handler, and that handler does extra processing for these two cases so that the JVM does not crash.
On the other hand, we can also see that if the JVM does no extra processing for a signal, it eventually exits on its own and produces the crash file hs_err_pid_xxx.log (its location can be set with -XX:ErrorFile=/var/log/hs_err.log). This file records the important causes of the virtual machine crash. So you can tell whether the virtual machine crashed by checking whether it produced a crash log file.
Summary
Under normal circumstances, to keep the system safe, the operating system sends a SIGSEGV signal on an illegal memory access and usually runs the default signal handler, which generally crashes the offending process.
But if the process feels that "the offense does not deserve death", it can define its own signal handler. That lets it run custom logic, such as recording crash information and doing other meaningful work.
Looking back, why does the virtual machine do extra processing for StackOverflowError and NullPointerException so that the thread can recover?
For StackOverflow, it uses a form of stack backtracking so that the thread can keep running. Null pointer errors are caught mainly because they are so common: if such a common error crashed the JVM, who knows how many times production JVMs would go down. So, for the sake of engineering robustness, it is better to let the thread come back to life than to let the JVM crash, and to throw these two errors and exceptions to the user to handle.