What is Base64?

2022-07-07 15:37:00 nsnsttn

What is Base64?

Base64 is a binary-to-text encoding. More specifically, it is a way of encoding a byte array as a string, where the encoded string contains only basic ASCII characters.

For example, the string ShuSheng007 corresponds to the Base64 text U2h1U2hlbmcwMDc=. The trailing = is special: it is a padding character, explained later.
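
The example above can be checked with the JDK's built-in java.util.Base64 class (Java 8 or later). This is a minimal sketch; the class name ShuShengDemo and its helper methods are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ShuShengDemo {
    // Encode the UTF-8 bytes of a string as Base64 text.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode Base64 text back into the original string. No key is involved.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("ShuSheng007");
        System.out.println(encoded);         // U2h1U2hlbmcwMDc=
        System.out.println(decode(encoded)); // ShuSheng007
    }
}
```

Because decoding needs nothing but the public algorithm, anyone who sees the encoded text can recover the original.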

Note that Base64 is not an encryption algorithm. It is only an encoding, and the algorithm is public, so you cannot rely on it for confidentiality.

Why is it called Base64?

Because it is an encoding based on 64 characters. The encoded text contains only 64 ASCII characters (plus the occasional padding character =), as shown below:

The 64 characters used by Base64:

  • A-Z: 26 characters
  • a-z: 26 characters
  • 0-9: 10 characters
  • +: 1 character
  • /: 1 character

The figure below is the Base64 index table. Each number from 0 to 63 corresponds to one of the characters above.

[Image: Base64 index table]
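
Since the table image is not reproduced here, the mapping from index to character can be rebuilt in a few lines. This is a sketch; the class name Base64Alphabet is hypothetical, and the ordering (A-Z, a-z, 0-9, +, /) is the standard Base64 alphabet.

```java
class Base64Alphabet {
    // Build the 64-character table in index order: A-Z, a-z, 0-9, +, /
    static String build() {
        StringBuilder sb = new StringBuilder(64);
        for (char c = 'A'; c <= 'Z'; c++) sb.append(c); // indexes 0-25
        for (char c = 'a'; c <= 'z'; c++) sb.append(c); // indexes 26-51
        for (char c = '0'; c <= '9'; c++) sb.append(c); // indexes 52-61
        sb.append('+').append('/');                     // indexes 62 and 63
        return sb.toString();
    }

    public static void main(String[] args) {
        String table = build();
        System.out.println(table.length());   // 64
        System.out.println(table.charAt(0));  // A
        System.out.println(table.charAt(26)); // a
        System.out.println(table.charAt(63)); // /
    }
}
```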

What problem does Base64 solve?

Base64 encoding maps binary values to a fixed set of characters. There are 64 of these characters, hence the name Base64.

Why not transmit the binary data directly? Images and text are all binary byte streams when they are actually transmitted. Even a Base64-encoded string ultimately travels over the network as binary (usually UTF-8, which is compatible with ASCII). So what is the point of spending 4/3 of the bandwidth to send data as Base64?

The real reason is binary incompatibility. Some byte values are interpreted and handled differently on different hardware, for example on different routers or on old computers. Some old software and network protocols have similar problems.

In practice, after a message has been compressed and encrypted, the last step is usually Base64 encoding, because Base64-encoded strings are better suited to transmission across different platforms and languages.

Characteristics of Base64 encoding:

  • It is an encoding, not compression. Encoding only increases the number of bytes (by about 1/3: every 3 bytes become 4 characters).
  • The method is simple and has little effect on performance.
  • The algorithm is reversible and decoding is easy, so it is not suitable for transmitting private data.
  • Because the data is encoded, the original content cannot be read directly at a glance.
  • The encoded string contains only the characters 0-9, a-z, A-Z, +, / and =, so data containing non-printable characters (such as control characters) can also be transmitted.

Base64 was created to solve the problem of binary incompatibility across systems and transmission protocols.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Binary data that contains non-printable bytes, or bytes that are not valid UTF-8,
// can be lost when it is treated as text: every character maps to a byte sequence,
// but not every byte sequence maps to a character.
public class Base64Demo {
    public static void main(String[] args) {
        // A byte array, commonly used to hold binary data
        byte[] bytes1 = new byte[]{31, -117, 8};
        // Convert the bytes straight to a String (decoded as UTF-8)
        String str = new String(bytes1, StandardCharsets.UTF_8);
        System.out.println(str);
        byte[] bytes2 = str.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes2)); // [31, -17, -65, -67, 8], not the original [31, -117, 8]
        // Why? UTF-8 is a variable-length encoding. The byte -117 (0x8B) is not valid
        // on its own, so it is replaced by U+FFFD, which encodes as -17, -65, -67.

        // ASCII data (a subset of UTF-8) maps cleanly and nothing is lost
        byte[] bytes3 = new byte[]{49, 50, 51};
        String str2 = new String(bytes3, StandardCharsets.UTF_8);
        System.out.println(str2); // 123
        byte[] bytes4 = str2.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes4)); // [49, 50, 51], same as bytes3

        // With most text encodings, arbitrary bytes do not survive a round trip.
        // Base64 avoids this: it reads 6 bits at a time and maps each value to one of
        // 64 visible ASCII characters (A-Z a-z 0-9 + /). A final partial group is
        // filled with 0 bits, and = is appended so the output length is a multiple of 4.
        // This version uses java.util.Base64 from the JDK; the original listing called
        // Base64.encode/Base64.decode from a library it did not name.
        String encode = Base64.getEncoder().encodeToString(bytes1);
        System.out.println(encode); // H4sI
        byte[] decode = Base64.getDecoder().decode(encode);
        System.out.println(Arrays.toString(decode)); // [31, -117, 8], same as bytes1
    }
}

Encoding Man

[Image: bit-level encoding of Man]

The three characters Man become the four characters TWFu after encoding.

One byte of the original data takes 8 bits, while one Base64 character carries 6 bits. The least common multiple of 8 and 6 is 24, so the data is processed in groups of 24 bits: 3 bytes in, 4 characters out. The encoding therefore works out evenly only when the number of input bytes is a multiple of 3.
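
The 3-bytes-to-4-characters step can be written out by hand. The following is a minimal sketch of that step only (no padding handling); the class ManEncoder and method encodeGroup are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

class ManEncoder {
    static final String TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encode exactly three bytes (24 bits) as four Base64 characters.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Pack the three bytes into one 24-bit value; & 0xFF undoes sign extension.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        char[] out = new char[4];
        out[0] = TABLE.charAt((bits >> 18) & 0x3F); // top 6 bits
        out[1] = TABLE.charAt((bits >> 12) & 0x3F);
        out[2] = TABLE.charAt((bits >> 6) & 0x3F);
        out[3] = TABLE.charAt(bits & 0x3F);         // bottom 6 bits
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] man = "Man".getBytes(StandardCharsets.US_ASCII); // 77, 97, 110
        System.out.println(encodeGroup(man[0], man[1], man[2])); // TWFu
    }
}
```

The bytes 77, 97, 110 are 01001101 01100001 01101110 in binary. Regrouped into 6-bit blocks they are 010011 010110 000101 101110, that is 19, 22, 5, 46, which index to T, W, F, u.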

If the number of bytes to encode is not a multiple of 3, 1 or 2 bytes are left over at the end. They are handled as follows: zero bits are appended at the end so that the data can be split into 6-bit blocks, Base64 encoding proceeds as usual, and one or two = signs are appended to the encoded text to show how many bytes the final group was short. That is, when the final group is two bytes short (only 1 byte remains), the last 6-bit block contains four zero bits (2*6-8=4) and two equal signs are appended; when the final group is one byte short (2 bytes remain), the last 6-bit block contains two zero bits (3*6-2*8=2) and one equal sign is appended. Refer to the following table:

[Image: table of Base64 padding cases]
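
The padding cases can be observed by shortening the Man example one byte at a time. This is a sketch using the JDK's java.util.Base64; the class name PaddingDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PaddingDemo {
    // Encode the ASCII bytes of a string with the standard (padded) encoder.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(encode("Man")); // TWFu  (full group of 3 bytes, no padding)
        System.out.println(encode("Ma"));  // TWE=  (2 bytes in the last group, one =)
        System.out.println(encode("M"));   // TQ==  (1 byte in the last group, two =)
    }
}
```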

As a result, Base64-encoded data is longer than the original data: about 4/3 of the original size.
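
With padding, the exact output length for n input bytes is 4 characters for every started group of 3 bytes. The sketch below compares that formula with what the JDK encoder produces; the class and method names are illustrative.

```java
import java.util.Base64;

class EncodedLength {
    // Padded Base64 length: 4 characters per started group of 3 input bytes.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n : new int[]{0, 1, 2, 3, 4, 300}) {
            int actual = Base64.getEncoder().encodeToString(new byte[n]).length();
            System.out.println(n + " bytes -> " + encodedLength(n)
                + " chars (JDK encoder: " + actual + ")");
        }
    }
}
```

For 300 bytes the result is 400 characters, which is exactly 4/3. For small inputs the padding makes the ratio larger: a single byte still costs 4 characters.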

Base64 in Data URI format

Sometimes the Base64 string that a web page sends you has a prefix like the following in front of it.

data:image/jpeg;base64,    /9j/4AA...

This is a Data URI. Most browsers can open binary data embedded this way directly. Note, however, that if you want only the actual Base64 content, you must take the part after the comma.
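
Extracting the payload can be sketched as follows. The helper decodeDataUri and the sample URI (the text Hello as text/plain) are made up for illustration, and the sketch handles only the simple `data:<type>;base64,<payload>` shape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUriDemo {
    // Decode a Base64 data URI: everything after the first comma is the payload.
    static byte[] decodeDataUri(String dataUri) {
        int comma = dataUri.indexOf(',');
        if (!dataUri.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("not a data URI");
        }
        String header = dataUri.substring(0, comma); // e.g. data:text/plain;base64
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("payload is not Base64");
        }
        return Base64.getDecoder().decode(dataUri.substring(comma + 1));
    }

    public static void main(String[] args) {
        // Hypothetical sample URI, not taken from the article
        byte[] data = decodeDataUri("data:text/plain;base64,SGVsbG8=");
        System.out.println(new String(data, StandardCharsets.UTF_8)); // Hello
    }
}
```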

Base64 variants

Base64 encoding can be used to pass longer identifiers in an HTTP environment. For example, the Java persistence framework Hibernate uses Base64 to encode a long unique identifier (usually a 128-bit UUID) as a string, which is then used as a parameter in HTTP forms and HTTP GET URLs. Other applications also often need to encode binary data into a form that fits in a URL (including hidden form fields). Base64 is convenient here because the result is compact and not directly readable: the encoded data cannot be understood at a glance.

However, standard Base64 is not suitable for putting directly into a URL, because a URL encoder turns the / and + characters of standard Base64 into the form %XX. These % signs then need further conversion when the value is stored in a database, because ANSI SQL uses % as a wildcard.
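
The expansion is easy to reproduce with the JDK's URLEncoder. A sketch, assuming Java 10 or later for the `URLEncoder.encode(String, Charset)` overload; the two input bytes are chosen only because they produce both + and / in the output.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UrlEscapingDemo {
    // Standard Base64 followed by URL (form) encoding.
    static String escape(byte[] raw) {
        String standard = Base64.getEncoder().encodeToString(raw);
        return URLEncoder.encode(standard, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(Base64.getEncoder().encodeToString(raw)); // +/8=
        System.out.println(escape(raw));                             // %2B%2F8%3D
    }
}
```

Four characters grow to ten, and the result now contains the % signs mentioned above.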

To solve this problem, there is a modified Base64 for URLs. It does not append = padding at the end, and it replaces the + and / of standard Base64 with - and _ respectively. This removes the conversions otherwise needed for URL encoding and decoding and for database storage, avoids the growth in length those conversions cause, and gives object identifiers one format across databases, forms, and so on.
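
The JDK exposes this variant as `Base64.getUrlEncoder()`, and `withoutPadding()` drops the trailing =. A minimal sketch with a hypothetical class name:

```java
import java.util.Arrays;
import java.util.Base64;

class UrlSafeDemo {
    // URL-safe Base64 (- and _ instead of + and /) with the = padding omitted.
    static String urlSafe(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(Base64.getEncoder().encodeToString(raw)); // +/8=
        String safe = urlSafe(raw);
        System.out.println(safe);                                    // -_8
        // The URL decoder accepts input without padding.
        byte[] back = Base64.getUrlDecoder().decode(safe);
        System.out.println(Arrays.equals(raw, back));                // true
    }
}
```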

There is also a modified Base64 variant for regular expressions. It replaces + and / with ! and -, because +, *, and the [ and ] used by the IRCu variant can all have special meaning in regular expressions.

Other variants replace +/ with _- or ._ (for identifier names in programming languages), .- (for Nmtoken in XML), or even _: (for Name in XML).

Uses

  • Certificates, especially root certificates, are usually Base64-encoded, since they are downloaded from the Internet by many people.
  • E-mail attachments are usually Base64-encoded, because attachments often contain non-printable characters.
  • If one XML file is embedded directly in another, the XML tags tend to get mixed up and become hard to parse. The XML string therefore needs to be converted to a byte array and then encoded as visible characters.
  • Small images in a web page can be embedded as Base64, so no extra request is needed to fetch them.
  • Older plain-text protocols such as SMTP occasionally need to carry a file, and Base64 is used for that.
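
For the e-mail case, the JDK offers a MIME flavour of the encoder that breaks the output into lines of at most 76 characters separated by CRLF, as MIME requires. A sketch; the class name MimeDemo and the 100-byte dummy attachment are illustrative.

```java
import java.util.Base64;

class MimeDemo {
    // MIME Base64: standard alphabet, lines of at most 76 characters joined by \r\n.
    static String encode(byte[] attachment) {
        return Base64.getMimeEncoder().encodeToString(attachment);
    }

    public static void main(String[] args) {
        String encoded = encode(new byte[100]); // 100 zero bytes stand in for a file
        for (String line : encoded.split("\r\n")) {
            System.out.println(line.length() + ": " + line);
        }
    }
}
```

The 100 bytes encode to 136 characters, so the output is one line of 76 characters followed by one of 60.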

Copyright notice
This article was written by [nsnsttn]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202130613363723.html