2022-07-07 19:08:00

In yesterday's "MIME Notes" I mentioned that MIME has two main encoding schemes, Quoted-printable and Base64, which convert 8-bit non-English characters into 7-bit ASCII characters.

Their original purpose was to satisfy the rule that non-ASCII characters cannot be used directly in e-mail, but they have other important uses as well:

a) Any binary file can be converted into printable text and edited with ordinary text software.

b) Text can be lightly obscured, a very simple form of "encryption".


First, a brief introduction to Quoted-printable. It is mainly used for ASCII text that contains a small number of non-ASCII characters, and it is not suitable for converting pure binary files.

It specifies that each 8-bit byte to be encoded is converted into 3 characters.

The first character is always the "=" sign.

The next two characters are two hexadecimal digits, giving the value of the byte's high four bits and low four bits, respectively.

For example, the ASCII "form feed" character is 12, which is 00001100 in binary and 0C in hexadecimal, so its encoded value is "=0C". The ASCII value of "=" is 61, which is 00111101 in binary and 3D in hexadecimal, so its encoded value is "=3D". Except for printable ASCII characters, every character must be converted this way.

All printable ASCII characters (decimal values 33 to 126) stay unchanged, with the exception of "=" (decimal value 61).
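The rule above can be sketched as a small JavaScript function over an array of byte values. This is a simplified illustration that follows only the rule stated here: the full standard also limits lines to 76 characters using "soft" line breaks and treats spaces and tabs specially, and both are deliberately left out. `qpEncodeBytes` is a hypothetical name.

```javascript
// Simplified Quoted-printable encoder following the rule described above:
// bytes 33..126 pass through unchanged, except "=" (61); every other byte
// becomes "=" followed by two uppercase hexadecimal digits.
// Assumption: soft line breaks and the space/tab rules are ignored.
function qpEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b >= 33 && b <= 126 && b !== 61) {
      out += String.fromCharCode(b);
    } else {
      // High four bits, then low four bits, as two hex digits.
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(qpEncodeBytes([12]));         // "=0C"
console.log(qpEncodeBytes([61]));         // "=3D"
console.log(qpEncodeBytes([65, 61, 66])); // "A=3DB"
```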


Next, a detailed look at Base64.

Base64 means choosing 64 characters as the basic character set: the lowercase letters a-z, the uppercase letters A-Z, the digits 0-9, and the symbols "+" and "/" (plus "=" used for padding, so 65 characters in practice). All other symbols are then converted into characters from this set.

Specifically, the conversion takes four steps.

Step 1: take every three bytes as a group, 24 bits in total. Step 2: split these 24 bits into four groups of 6 bits each. Step 3: prepend two 0s to each group, expanding the total to 32 bits, i.e. four bytes. Step 4: use the following table to find the symbol for each expanded byte; these symbols are the Base64 encoding.

   0 A   17 R   34 i   51 z
   1 B   18 S   35 j   52 0
   2 C   19 T   36 k   53 1
   3 D   20 U   37 l   54 2
   4 E   21 V   38 m   55 3
   5 F   22 W   39 n   56 4
   6 G   23 X   40 o   57 5
   7 H   24 Y   41 p   58 6
   8 I   25 Z   42 q   59 7
   9 J   26 a   43 r   60 8
  10 K   27 b   44 s   61 9
  11 L   28 c   45 t   62 +
  12 M   29 d   46 u   63 /
  13 N   30 e   47 v
  14 O   31 f   48 w
  15 P   32 g   49 x
  16 Q   33 h   50 y
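Steps 1 to 4 can be sketched in JavaScript for a single three-byte group. This is a minimal illustration: `encodeGroup` is a hypothetical helper name, and the alphabet string simply spells out the table above in index order.

```javascript
// The table above as one string: position i holds the symbol for value i.
const BASE64_TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode exactly three bytes (values 0..255).
function encodeGroup(b1, b2, b3) {
  // Step 1: join the three bytes into one 24-bit number.
  const bits24 = (b1 << 16) | (b2 << 8) | b3;
  // Steps 2 and 3: cut it into four 6-bit values. Each value lies in
  // 0..63, i.e. a byte whose top two bits are 00.
  const values = [
    (bits24 >> 18) & 0x3f,
    (bits24 >> 12) & 0x3f,
    (bits24 >> 6) & 0x3f,
    bits24 & 0x3f,
  ];
  // Step 4: look each value up in the table.
  return values.map((v) => BASE64_TABLE[v]).join("");
}

console.log(encodeGroup(77, 97, 110)); // "TWFu"
```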

Because Base64 turns every three bytes into four, Base64-encoded text is about one third larger than the original.
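Counting the "=" padding used for an incomplete last group, n input bytes always produce 4 × ⌈n/3⌉ output characters, which is where the "one third larger" figure comes from. A small sketch of that arithmetic (the function name is illustrative only, and line breaks that MIME may insert are not counted):

```javascript
// Number of Base64 characters, including "=" padding, for n input bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

console.log(base64Length(3));   // 4
console.log(base64Length(1));   // 4  (two symbols plus "==")
console.log(base64Length(300)); // 400
```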


Here is a concrete example showing how the English word Man is turned into Base64.

  Text content     M          a          n
  ASCII            77         97         110
  Bit pattern      010011010110000101101110
  Index            19     22     5      46
  Base64-encoded   T      W      F      u

Step 1: the ASCII values of "M", "a" and "n" are 77, 97 and 110, i.e. the binary values 01001101, 01100001 and 01101110; concatenated, they form the 24-bit string 010011010110000101101110. Step 2: split this 24-bit string into 4 groups of 6 bits each: 010011, 010110, 000101, 101110. Step 3: prepend two 0s to each group, expanding them to 32 bits, i.e. four bytes: 00010011, 00010110, 00000101, 00101110, whose decimal values are 19, 22, 5 and 46. Step 4: look up each value in the table to get its Base64 symbol: T, W, F and u.

Therefore, the Base64 encoding of Man is TWFu.
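The walkthrough can be reproduced in JavaScript by recording each intermediate value. This is only a tracing aid for the example above, not a general encoder: it assumes the input is exactly three ASCII characters, and `traceBase64` is a hypothetical name.

```javascript
const TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Trace the four steps for a three-character ASCII string.
function traceBase64(text) {
  // ASCII codes, e.g. 77, 97, 110 for "Man".
  const codes = [...text].map((ch) => ch.charCodeAt(0));
  // Step 1: 24-bit string.
  const bits = codes.map((c) => c.toString(2).padStart(8, "0")).join("");
  // Step 2: four 6-bit strings.
  const groups = bits.match(/.{6}/g);
  // Step 3: the value of each group (the two leading 0s do not change it).
  const values = groups.map((g) => parseInt(g, 2));
  // Step 4: table lookup.
  const encoded = values.map((v) => TABLE[v]).join("");
  return { codes, bits, groups, values, encoded };
}

console.log(traceBase64("Man"));
```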


If fewer than three bytes remain, they are handled as follows:

a) Two bytes: these two bytes hold 16 bits. Following the rules above, they are split into three groups; besides the two leading 0s, the last group gets two 0s appended at the end. This yields three Base64 symbols, and one "=" is appended. For example, the string "Ma" is two bytes; it converts into the three groups 00010011, 00010110 and 00000100, whose Base64 values are T, W and E, plus one "=", so the Base64 encoding of "Ma" is TWE=.

b) One byte: its 8 bits are split into two groups following the rules above; besides the two leading 0s, the last group gets four 0s appended. This yields two Base64 symbols, and two "=" signs are appended. For example, the letter "M" is one byte; it converts into the two groups 00010011 and 00010000, whose Base64 values are T and Q, plus two "=", so the Base64 encoding of "M" is TQ==.
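Putting the whole rule together, including the two padding cases, gives a compact encoder for an array of byte values. It is a sketch, not the code of any particular library; `encodeBytes` and `ascii` are illustrative names, and the input is assumed to be already-encoded bytes in the range 0 to 255.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0..255), padding the last group as above.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;          // 1, 2, or 3 or more
    const b1 = bytes[i];
    const b2 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes act as 0 bits
    const b3 = remaining > 2 ? bytes[i + 2] : 0;
    const n = (b1 << 16) | (b2 << 8) | b3;

    out += ALPHABET[(n >> 18) & 0x3f];
    out += ALPHABET[(n >> 12) & 0x3f];
    // Two bytes left -> one "=", one byte left -> two "=".
    out += remaining > 1 ? ALPHABET[(n >> 6) & 0x3f] : "=";
    out += remaining > 2 ? ALPHABET[n & 0x3f] : "=";
  }
  return out;
}

// Hypothetical helper: ASCII text to byte values.
const ascii = (s) => [...s].map((ch) => ch.charCodeAt(0));

console.log(encodeBytes(ascii("Man"))); // "TWFu"
console.log(encodeBytes(ascii("Ma")));  // "TWE="
console.log(encodeBytes(ascii("M")));   // "TQ=="
```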


Now a Chinese example: how is the Chinese character "严" (yán) converted into Base64?

Note that a Chinese character can have several encodings, such as GB2312, UTF-8 and GBK, and each one gives a different Base64 value. The example below uses UTF-8.

First, the UTF-8 encoding of "严" is E4B8A5, which in binary is the three bytes "11100100 10111000 10100101". Following the rules above, this 24-bit string is converted into four groups totalling 32 bits, "00111001 00001011 00100010 00100101", whose decimal values are 57, 11, 34 and 37; the corresponding Base64 values are 5, L, i and l.

Therefore, the Base64 value of the Chinese character "严" (UTF-8 encoded) is 5Lil.
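In a modern JavaScript runtime this result can be checked with built-ins, assuming `TextEncoder` and `btoa` are available (current browsers, and Node.js 16 or later). `btoa` only accepts strings whose characters are all below 256, so the UTF-8 bytes are first packed into such a string, one character per byte.

```javascript
// UTF-8 bytes of "严".
const utf8Bytes = new TextEncoder().encode("严");
console.log(Array.from(utf8Bytes, (b) => b.toString(16).toUpperCase()).join(" ")); // "E4 B8 A5"

// One character per byte, then Base64.
const binary = String.fromCharCode(...utf8Bytes);
console.log(btoa(binary)); // "5Lil"
```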


PHP provides a pair of dedicated functions for Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.

These functions do not care how the input text is encoded; they simply apply the Base64 rules to its bytes. So if you want the Base64 value of text in UTF-8, you must make sure yourself that the input text is UTF-8 encoded.
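The point that a Base64 encoder only sees bytes is easy to demonstrate: feeding the same character in as UTF-8 bytes or as UTF-16 (little-endian) bytes gives two different Base64 strings. The sketch below uses JavaScript rather than PHP and assumes `TextEncoder` and `btoa` are available; `utf16leBytes` and `toBase64` are hypothetical helpers.

```javascript
// Split each UTF-16 code unit into its low byte and high byte (little-endian).
function utf16leBytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Base64 of an array (or typed array) of byte values.
const toBase64 = (bytes) => btoa(String.fromCharCode(...bytes));

console.log(toBase64(new TextEncoder().encode("严"))); // "5Lil" (UTF-8)
console.log(toBase64(utf16leBytes("严")));             // "JU4=" (UTF-16LE)
```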


Next, let's look at how to do Base64 encoding in JavaScript.

First, assume the web page is encoded in UTF-8, and that we want PHP and JavaScript to produce the same Base64 encoding for the same string.

This raises a problem. JavaScript stores strings internally as UTF-16, so when encoding we must first convert the UTF-16 values to UTF-8 and then encode them; when decoding, the decoded UTF-8 values must be converted back to UTF-16.
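A quick way to see the mismatch is to compare what JavaScript reports for "严" with the character's UTF-8 bytes. This assumes a runtime with a global `TextEncoder` (current browsers, Node.js 11 or later).

```javascript
const s = "严";
// JavaScript stores this character as one UTF-16 code unit...
console.log(s.length);                     // 1
console.log(s.charCodeAt(0).toString(16)); // "4e25"
// ...while its UTF-8 form is three bytes.
console.log(new TextEncoder().encode(s).length); // 3
```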

Someone has published ready-made JavaScript functions online:

/* utf.js - UTF-8 <=> UTF-16 conversion
 *
 * Copyright (C) 1999 Masanao Izumo
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * utf8 = utf16to8(utf16);
 * utf16 = utf8to16(utf8);
 */

// Note: only characters in the Basic Multilingual Plane are handled;
// a surrogate pair is encoded as two separate 3-byte sequences.
function utf16to8(str) {
    var out, i, len, c;

    out = "";
    len = str.length;
    for (i = 0; i < len; i++) {
        c = str.charCodeAt(i);
        if (c <= 0x007F) {
            // 1 byte: 0xxxxxxx (U+0000 included, as in standard UTF-8)
            out += str.charAt(i);
        } else if (c > 0x07FF) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
            out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        } else {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        }
    }
    return out;
}

// Note: 4-byte sequences (first byte 1111xxxx) are skipped silently.
function utf8to16(str) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = str.length;
    i = 0;
    while (i < len) {
        c = str.charCodeAt(i++);
        switch (c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            out += str.charAt(i - 1);
            break;
        case 12: case 13:
            // 110x xxxx   10xx xxxx
            char2 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
            break;
        case 14:
            // 1110 xxxx  10xx xxxx  10xx xxxx
            char2 = str.charCodeAt(i++);
            char3 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x0F) << 12) |
                                       ((char2 & 0x3F) << 6) |
                                       ((char3 & 0x3F) << 0));
            break;
        }
    }
    return out;
}

The code above defines two functions: utf16to8() converts UTF-16 to UTF-8 (each character of the returned string holds one UTF-8 byte), and utf8to16() converts UTF-8 back to UTF-16.

The following functions do the actual Base64 encoding and decoding.

/* Copyright (C) 1999 Masanao Izumo
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * b64 = base64encode(data);
 * data = base64decode(b64);
 */

var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Maps an ASCII code (0-127) to its 6-bit Base64 value, or -1 if the
// character is not in the Base64 alphabet.
var base64DecodeChars = new Array(
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
    -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
    -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

function base64encode(str) {
    var out, i, len;
    var c1, c2, c3;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        c1 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            // One byte left: two symbols plus "=="
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            // Two bytes left: three symbols plus "="
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++) & 0xff;
        // Full group: three bytes -> four symbols
        out += base64EncodeChars.charAt(c1 >> 2);
        out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
        out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
        out += base64EncodeChars.charAt(c3 & 0x3F);
    }
    return out;
}

function base64decode(str) {
    var c1, c2, c3, c4;
    var i, len, out;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        /* c1: skip characters outside the alphabet */
        do {
            c1 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while (i < len && c1 == -1);
        if (c1 == -1)
            break;

        /* c2 */
        do {
            c2 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while (i < len && c2 == -1);
        if (c2 == -1)
            break;

        out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

        /* c3: "=" (code 61) means padding, so stop */
        do {
            c3 = str.charCodeAt(i++) & 0xff;
            if (c3 == 61)
                return out;
            c3 = base64DecodeChars[c3];
        } while (i < len && c3 == -1);
        if (c3 == -1)
            break;

        out += String.fromCharCode(((c2 & 0xF) << 4) | ((c3 & 0x3C) >> 2));

        /* c4 */
        do {
            c4 = str.charCodeAt(i++) & 0xff;
            if (c4 == 61)
                return out;
            c4 = base64DecodeChars[c4];
        } while (i < len && c4 == -1);
        if (c4 == -1)
            break;
        out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
    }
    return out;
}

In the code above, base64encode() encodes and base64decode() decodes.

Therefore, UTF-8 text should be encoded like this:

base64encode(utf16to8(str))
Then, decoding should be written like this:

utf8to16(base64decode(str))

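In current browsers and Node.js 16 or later, the same UTF-8 round trip can also be written with built-in APIs (`TextEncoder`, `TextDecoder`, `btoa`, `atob`) instead of the two helper libraries. This is an alternative sketch, not part of the code above; `encodeUtf8Base64` and `decodeUtf8Base64` are hypothetical names.

```javascript
// Text -> UTF-8 bytes -> Base64.
function encodeUtf8Base64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

// Base64 -> bytes -> UTF-8 text.
function decodeUtf8Base64(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(encodeUtf8Base64("严"));   // "5Lil"
console.log(decodeUtf8Base64("5Lil")); // "严"
console.log(encodeUtf8Base64("Man"));  // "TWFu"
```

Building the byte string in a loop, rather than spreading a large array into `String.fromCharCode`, avoids hitting argument-count limits on long inputs.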
(End)

