What languages can be decompiled
2022-06-13 07:09:00 【guangsu.】
This article is from Zhihu. The pictures are omitted, since I feel the text is enough to convey the point. I have added some notes along the way; for details, please see the original post.
Introducing the concepts
To understand the question, first look at what the forward process, compilation, actually is.
- You have an idea, something that can be expressed in human natural language. Using your programming skills, you "translate" it into a programming language you know. This process is called programming.
- You use a compiler to translate that program into a language the machine can understand. This process is called compiling.
Both programming and compiling are processes that lose information. Say you have a set of integers and want to put them in order, so you casually write a bubble sort. To some extent, your original motivation is already lost from the code.
An experienced reader sees at a glance that the code is sorting, while a novice, Xiao Ming, sees only a bunch of for and if statements.
With a more complex function, even you may not understand for a while what you were trying to do. The step from programming language to machine language is the same.
Both steps convert "what you want to do" into "how to do it". Once the conversion is complete, the information about what you set out to do in the first place is lost.
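To make the point concrete, here is a small sketch of my own (not from the original answer): a bubble sort with every hint of intent stripped out. The name mystery is deliberately meaningless and purely hypothetical. The code says how the work is done, and the reader has to recover what it is for.

```python
def mystery(a):
    # Nothing here says "sort": there are only loops, a comparison and a swap.
    a = list(a)  # work on a copy so the caller's list is untouched
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


print(mystery([5, 2, 9, 1]))
```

Rename the function to sort_ascending and add one comment, and the "decompilation" into natural language becomes trivial. That is exactly the information that was lost.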
So-called decompiling is really the process of retrieving the lost information. Seen this way, reading a piece of code is itself "decompiling" it into natural language.
Perfect decompilation is possible only if no information was lost at all: for example, the code you are reading has sufficient comments, or it uses a pattern you already know (which is why people keep stressing the importance of comments and design patterns).
The same holds for decompiling from machine language to a programming language. For example, there is a lower-level (not meant pejoratively) process than decompiling, called disassembly.
Strictly speaking, assembly language is also a programming language, but here we distinguish it from the high-level programming languages we usually talk about (including C).
In the original diagram, assembly language and machine language are joined by a solid two-way arrow, because they can be converted into each other. No information is lost going from assembly language to machine language, because their instructions correspond one to one, so disassembly is easy. This is why many programming languages can only be disassembled and cannot (that is, can only with difficulty; the same below) be decompiled.
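The one-to-one correspondence can be sketched with a toy example. Everything below is hypothetical: the three-instruction "machine", its opcode numbers and the two-byte instruction format are invented for illustration and do not describe any real instruction set.

```python
# Hypothetical toy machine: each instruction is one opcode byte plus one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines):
    """Translate assembly text into machine code, one instruction at a time."""
    out = bytearray()
    for line in lines:
        mnemonic, operand = line.split()
        out += bytes([OPCODES[mnemonic], int(operand)])
    return bytes(out)


def disassemble(machine_code):
    """Translate machine code back into assembly text by reversing the table."""
    lines = []
    for i in range(0, len(machine_code), 2):
        opcode, operand = machine_code[i], machine_code[i + 1]
        lines.append(f"{MNEMONICS[opcode]} {operand}")
    return lines


program = ["LOAD 10", "ADD 32", "STORE 0"]
binary = assemble(program)
print(binary.hex())
print(disassemble(binary) == program)
```

Because the lookup table can be read in both directions, the round trip gives back the program exactly. A compiler for a high-level language has no such reversible table, which is where the difficulty of decompiling comes from.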
Languages that compile straight to machine code are usually called "compiled languages", also "native languages"; C and C++ are representative. So why can some languages be decompiled?
This starts with machine language. Just as people in different regions speak different languages, different machines speak different languages. In jargon, their instruction sets differ.
For example, your computer and your mobile phone usually have different instruction sets. For a program to run on different machines, it has to be translated (compiled) separately into each machine's language.
This is too cumbersome, so people came up with interpreted languages (I have not verified whether interpreted languages were actually invented for this reason; this is just to aid understanding).
Problem analysis
An interpreted language can be executed in two ways, depending on how the interpreter on the executing side works.
- One is to interpret and execute directly, with no machine language in between. This approach is inefficient, because it cannot take advantage of batch processing and instruction pipelining.
- The other is to first compile the source code into machine language with a JIT compiler and then execute it, which keeps execution efficient. JIT compilation can be roughly understood as "compile whatever is needed, when it is needed"; it usually happens alongside execution. (PHP 8 has this, which is worth noting.)
Modern interpreted languages basically all take the second approach. The English word interpreter is itself a noun meaning "translator".
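The difference between the two approaches can be sketched in a few lines. This is only an analogy under stated assumptions: the tuple-based expression format and the names interpret and translate are made up for this example, and the "translated" form is a Python closure standing in for machine code, not real JIT output.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# An expression is a number, a variable name, or a tuple (op, left, right).


def interpret(node, env):
    """Approach 1: walk the expression tree again on every evaluation."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))


def translate(node):
    """Approach 2: translate the tree once into a callable, then reuse it."""
    if isinstance(node, (int, float)):
        return lambda env: node
    if isinstance(node, str):
        return lambda env: env[node]
    op, left, right = node
    fn, lhs, rhs = OPS[op], translate(left), translate(right)
    return lambda env: fn(lhs(env), rhs(env))


expr = ("+", ("*", "x", 2), 1)  # x * 2 + 1
compiled = translate(expr)      # the translation cost is paid once
print(interpret(expr, {"x": 5}), compiled({"x": 5}))
```

Both give the same answer. The second form no longer inspects the tree at run time, which is the rough idea behind translating first and executing afterwards.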
An interpreter is like the staff of an embassy: the foreign ministry sends a document (the source code) to its embassies in various countries, and the staff at each embassy translate it into the local language and convey it to the relevant departments of the host country.
A representative interpreted language is JavaScript. It has to run correctly in browsers on different machines, so it works this way.
But this way, the program's source code must be delivered to every machine that runs it.
That is a leak. The most direct way to prevent it is encryption. (JS files in the browser can be scraped, so the implementation of some sensitive functionality needs encryption to keep the technique from leaking; one example is the H5 player code in live-streaming scenarios.)
Where there is a lock there is a key, and also lock-picking; where there is encryption and decryption, there are corresponding ways to crack it. In this setting, so-called "decompiling" really means cracking the encryption algorithm. We won't go into that here.
Later, people felt that interpreted languages were a bit slow, so they thought of another way: do up front whatever work can be done early, and leave only the work that depends on the target machine for later. So the program is first processed into an "intermediate language", also called "bytecode". This step is also generally called compilation.
An intermediate language has a small vocabulary and is more compact, so it also executes faster. These languages commonly use JIT techniques as well, further compiling the intermediate language into machine language (rather than interpreting it), so that execution efficiency is comparable to natively compiled languages. Java is a typical example. A programming language can be compiled into an intermediate language, and conversely the intermediate language can, to some extent, be decompiled back into the programming language. This is because languages that use this approach, in order to support their high-level features (such as reflection), retain most of the source program's information during compilation and lose very little.
It is precisely because of that small loss that intermediate languages usually cannot be decompiled perfectly. Most commonly, the names of local variables are lost and replaced by names the decompiler generates automatically.
But the decompiled program is complete in structure and function, and its readability is preserved too. Generally, when we talk about programs that can be decompiled, we mean programs written in such languages.
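As a rough illustration of how much a bytecode format keeps, here is a sketch that uses CPython's bytecode as a stand-in for Java bytecode (my substitution, not the original author's example). The function total_price is hypothetical. Note one difference: CPython keeps local variable names, whereas a Java class file compiled without debug information drops them, which is the loss described above.

```python
import dis


def total_price(unit_price, quantity):
    subtotal = unit_price * quantity
    return subtotal


code = total_price.__code__  # the compiled code object
print(code.co_name)          # the function name survives compilation
print(code.co_varnames)      # so do the argument and local variable names (in CPython)
dis.dis(total_price)         # prints the bytecode instructions; the listing varies by Python version
```

With names and structure still present in the compiled form, rebuilding readable source is mostly mechanical. That is why bytecode languages decompile so much better than native binaries.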
Intermediate languages can be decompiled; encryption can be cracked, and decrypting before execution adds performance overhead. Is there a way for the code to run efficiently and yet be of no use to anyone who intercepts it? Here people drew inspiration from certain programmers of poor professional quality and developed a tool called an "obfuscator".
Even if the code is decrypted and decompiled, others cannot make sense of it, and a careless reader is led into pits. After all, code is written for people to read and only incidentally run by machines, so code without readability is worthless.
Once developed, this approach was widely praised and became common practice. Obfuscation is often used together with intermediate code and JIT.
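The simplest kind of obfuscation, renaming identifiers, can be sketched with Python's standard ast module. This is a toy under stated assumptions, not how any real obfuscator works: the obfuscate function and the sample total_price source are hypothetical, ast.unparse needs Python 3.9 or later, and the sketch only handles simple functions (calls that pass arguments by keyword, for instance, would break).

```python
import ast


def obfuscate(source):
    """Rename every argument and assigned variable to a meaningless name."""
    tree = ast.parse(source)
    names = {}
    # Pass 1: collect the names the code itself defines (arguments and assignment targets).
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.setdefault(node.arg, f"_{len(names)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.setdefault(node.id, f"_{len(names)}")
    # Pass 2: rewrite every definition and use of those names.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            node.arg = names[node.arg]
        elif isinstance(node, ast.Name) and node.id in names:
            node.id = names[node.id]
    return ast.unparse(tree)


SOURCE = """
def total_price(unit_price, quantity):
    subtotal = unit_price * quantity
    return subtotal
"""

obfuscated = obfuscate(SOURCE)
print(obfuscated)

namespace = {}
exec(obfuscated, namespace)  # the renamed code still runs
print(namespace["total_price"](3, 4))
```

The machine does not care what the variables are called, so behaviour is unchanged, while a human reader loses the hints the names used to give. Real obfuscators go much further, for example by rewriting control flow.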
Conclusion
- A binary assembled from assembly language can be disassembled directly. No information is lost.
- A binary compiled from C or another compiled language is hard to decompile. Information is lost during compilation, which hinders the reverse process.
- Java and other languages compiled to an intermediate language (bytecode) are less difficult to decompile. Some information is still lost, but much less.
Therefore, many programming languages can only be disassembled and are difficult to decompile.
Original source
Author: hillin
Link: https://www.zhihu.com/question/21853681/answer/74134768
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.









