What languages can be decompiled

2022-06-13 07:09:00 【guangsu.】

This article is reposted from Zhihu. There are no pictures; I feel the text is enough to get the point across. I have added some notes along the way. For details, please see the original post.

Introducing the concepts

To understand the question, first look at what the "forward" compilation process is.

You have an idea, something that can be expressed in human natural language. Using your programming skills, you "translate" it into a programming language you are familiar with: this process is called programming.
You then use a compiler to translate it into a language the machine can understand: this process is called compiling.

Both programming and compiling are processes that lose information. Say you have a set of integers and want to put them in order, so you dash off a bubble sort with practiced ease. To some extent, though, your original motivation has already been lost from the code.
An experienced reader can see at a glance that the code is sorting, while a novice like Xiao Ming sees only a bunch of for and if statements.
With a more complex function, even you may not understand for a while what you were trying to do. The step from programming language to machine language is the same.
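
To make the bubble sort example concrete, here is a minimal sketch in Python. The function name bubble_sort and the early-exit flag are my own choices, not from the original answer. Notice that nothing in the loops says "put these numbers in order"; the reader has to infer that intent from the swaps.

```python
def bubble_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    items = list(values)  # copy so the caller's list is left untouched
    n = len(items)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i items are already in their final place.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps in a full pass: the list is sorted
            break
    return items


sorted_numbers = bubble_sort([5, 2, 9, 1])
```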

In fact, both processes convert "what to do" into "how to do it". Once the conversion is complete, the information about what you originally set out to do has been lost.
So-called "decompilation" is really the process of recovering that lost information. Seen this way, reading a piece of code is itself "decompiling" it into natural language.
Perfect decompilation is possible only if the information was never lost in the first place: for example, the code you are reading is well commented, or it uses a pattern you recognize (which is why people keep stressing the importance of comments and design patterns).
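
A small sketch of this information loss, using Python's built-in compile() as a stand-in for a compiler (the variable names and the comment text are made up for illustration). Two sources that differ only in a comment compile to the same instructions, so the "why" recorded in the comment cannot be recovered from the compiled form.

```python
def compile_body(source):
    """Compile source text and return the resulting code object."""
    return compile(source, "<example>", "exec")


# Same statement, once with the motivation written down and once without.
with_comment = compile_body("total = price * 2  # buy-one-get-one promo\n")
without_comment = compile_body("total = price * 2\n")

# The comment leaves no trace in the compiled instructions or constants.
same_instructions = with_comment.co_code == without_comment.co_code
same_constants = with_comment.co_consts == without_comment.co_consts
```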

The same goes for decompiling from machine language back to a programming language. There is also a lower-level process than decompilation ("lower" is not meant as a put-down), called disassembly.
Strictly speaking, assembly language is also a programming language, but here we distinguish it from the high-level programming languages we usually talk about (C included).

In this step (in the original post's diagram) I draw a two-way solid arrow between assembly language and machine language, because they can be converted into each other. Essentially no information is lost going from assembly language to machine language, because their instructions correspond one to one, so disassembly is easy. This is why programs written in many programming languages can only be disassembled and cannot be decompiled (or rather, are hard to decompile; the same applies below).
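
The one-to-one correspondence can be sketched with a toy machine. Everything here is hypothetical: the three mnemonics, their opcode values, and the two-byte instruction format are invented for illustration and do not match any real instruction set.

```python
# Hypothetical instruction set: each mnemonic maps to exactly one opcode byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines):
    """Turn 'MNEMONIC operand' lines into bytes: opcode byte, then operand byte."""
    out = bytearray()
    for line in lines:
        mnemonic, operand = line.split()
        out += bytes([OPCODES[mnemonic], int(operand)])
    return bytes(out)


def disassemble(machine_code):
    """Reverse of assemble(): read two bytes at a time and look up the mnemonic."""
    lines = []
    for i in range(0, len(machine_code), 2):
        opcode, operand = machine_code[i], machine_code[i + 1]
        lines.append(f"{MNEMONICS[opcode]} {operand}")
    return lines


program = ["LOAD 10", "ADD 32", "STORE 0"]
machine_code = assemble(program)
round_trip = disassemble(machine_code)
```

Because the lookup table can be inverted, the round trip gives back the same instruction listing. A real assembler would additionally drop labels and comments, which this sketch does not model.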

We usually call such languages "compiled languages", also known as "native languages"; C and C++ are representative. So why can some languages be decompiled?
That starts with machine language. Just as people in different regions speak different languages, different machines speak different languages too. In the jargon, they have "different instruction sets".

For example, your computer and your mobile phone usually have different instruction sets. For a program to run on different machines, it has to be translated (compiled) separately into each machine's language.
This is too cumbersome, so people came up with something called an interpreted language (I have not verified whether interpreted languages were actually invented for this reason; this is just to aid understanding).

Problem analysis

There are two ways to execute an interpreted language, depending on how the "interpreter" on the executing side works.

One is to interpret and execute the source directly, with no machine language in between, but this is inefficient. (It cannot take advantage of batch processing or instruction pipelining, so it is slower.)
The other is to first compile the source code into machine language with a JIT compiler and then execute it, which keeps execution efficient. JIT compilation can be roughly understood as "compile whatever is needed, when it is needed", and it usually happens while the program is running. (PHP 8 has this feature too; it is worth understanding.)
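
The "compile whatever is needed, when it is needed" idea can be sketched as a cache that translates each piece of source on first use. This is only an analogy under loud assumptions: Python's compile() produces Python bytecode rather than machine language, and the class name LazyCompiler and the snippet names are invented for this example.

```python
class LazyCompiler:
    """Compile each expression the first time it is needed, then reuse the result."""

    def __init__(self, sources):
        self.sources = sources   # name -> expression source text
        self.compiled = {}       # name -> code object, filled on demand
        self.compile_count = 0

    def run(self, name, variables):
        if name not in self.compiled:
            # First use: translate the source once and cache the result.
            self.compiled[name] = compile(self.sources[name], name, "eval")
            self.compile_count += 1
        # Later uses skip translation and execute the cached code directly.
        return eval(self.compiled[name], {"__builtins__": {}}, dict(variables))


engine = LazyCompiler({"square_plus_one": "x * x + 1", "never_called": "x - 1"})
results = [engine.run("square_plus_one", {"x": n}) for n in (1, 2, 3)]
```

The expression that is called three times is translated once, and the one that is never called is never translated at all.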

The second way is the one most modern interpreted languages adopt. The English word "interpreter" is, as a noun, literally a translator.
It is like your foreign ministry sending a document (the source code) to its embassies in various countries, where the embassy staff translate it into the local language and pass it on to the relevant departments of the host country.
A representative interpreted language is JavaScript: it has to run correctly in browsers on different machines, so it works this way.
But with this approach, the program's source code must be delivered to every machine that executes it.
That is a leak. The most direct way to prevent it is encryption. (JS files in the browser can be crawled, so the implementation of some sensitive features needs to be encrypted to keep the technology from leaking, for example the H5 player code in live-streaming scenarios.)

Where there is a lock there is a key, and also lock picking; where there is encryption and decryption, there are also ways to crack it. In this setting, so-called "decompilation" really means cracking the encryption algorithm. We won't go into that here.
Later, people felt that interpreted languages were a bit slow, so they thought of another way: do up front whatever work can be done early, and leave only the work that depends on the target machine for later. So the program is processed into what is called an "intermediate language", or "bytecode". This process is also generally called compilation.

An intermediate language has a small vocabulary and is more compact, so it executes faster. These languages also commonly use JIT technology to compile the intermediate language further into machine language (rather than interpreting it), giving execution efficiency comparable to natively compiled languages. Java is a typical example of this kind of language.
A programming language can be compiled into an intermediate language and, conversely, the intermediate language can to some extent be decompiled back into the programming language. This is because languages that use this compilation approach, in order to support their high-level features (such as reflection), retain most of the source program's information during compilation, so very little information is lost.
Precisely because that small part is lost, intermediate languages usually cannot be decompiled perfectly. Most commonly, the names of local variables are lost and replaced by names the decompiler generates automatically.
But the decompiled program is complete in structure and function, and still readable. Generally speaking, when we say a program is decompilable, we mean a program written in such a language.
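
As a rough illustration of how much a bytecode format can retain, here is a sketch that inspects a compiled Python function with the standard dis module. Python is only a stand-in for the intermediate languages discussed here, and the function describe_area is invented for the example. One difference worth flagging: Python's bytecode keeps local variable names, whereas a Java class file compiled without debug information typically does not, which is the loss described above.

```python
import dis


def describe_area(width, height):
    label = "area"
    return f"{label}: {width * height}"


code = describe_area.__code__

# Metadata that survives compilation and that a decompiler can read back.
retained = {
    "function_name": code.co_name,
    "argument_and_local_names": code.co_varnames,
    "constants": code.co_consts,
}

# The instruction stream itself, as a list of opcode names.
instruction_names = [ins.opname for ins in dis.get_instructions(describe_area)]
```

Calling dis.dis(describe_area) prints the same instructions in a readable listing, which is the "disassembly" of this bytecode.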

Intermediate languages can be decompiled, while encryption can be cracked and also requires decryption before execution, which adds performance overhead. Is there a way for the code to run efficiently and still be of no use to whoever intercepts it? Here people took inspiration from certain programmers of poor professional quality and developed a tool called an "obfuscator".
Even if the code is decompiled or decrypted, others cannot understand it, and a careless reader will be led into a trap. After all, code is written for people to read and only incidentally for machines to run, so code without readability is worthless.
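
A toy obfuscator can be sketched with Python's standard ast module (ast.unparse needs Python 3.9 or later). This is a minimal sketch, not how production obfuscators work: it only renames arguments and assigned local names in a single simple function, and it would break on keyword arguments, globals, or nested scopes. The sample function total_price and the names v0, v1, ... are invented for the example.

```python
import ast

SOURCE = '''
def total_price(unit_price, quantity):
    subtotal = unit_price * quantity
    discount = subtotal // 10
    return subtotal - discount
'''


def obfuscate(source):
    """Rename arguments and assigned locals to meaningless names v0, v1, ..."""
    tree = ast.parse(source)

    # Pass 1: collect every argument name and every name that is assigned to.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            mapping.setdefault(node.arg, f"v{len(mapping)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"v{len(mapping)}")

    # Pass 2: apply the renaming everywhere those names appear.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg in mapping:
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree)


obfuscated = obfuscate(SOURCE)
```

The function name is left alone so that callers keep working; the behavior is unchanged, but the names that told a reader what the code was for are gone.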

Once this approach appeared, it was widely praised and became very common practice. Obfuscation is often used together with the intermediate code and JIT steps.

Conclusion

Binaries assembled from assembly language can be disassembled directly, with no information lost.
Binaries of C and other compiled languages are hard to decompile, because information is lost that the reverse process cannot recover.
Binaries of Java and other languages that compile to an intermediate language are less difficult to decompile, though some information is still lost.

Therefore, many programming languages can only be disassembled and are hard to decompile.

Original address

Author: hillin
Link: https://www.zhihu.com/question/21853681/answer/74134768
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Copyright notice
This article was created by [guangsu.]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/02/202202270550304555.html