
Base64 notes

2022-07-07 19:08:00 Full stack programmer webmaster


1.

Yesterday's "MIME notes" mentioned that MIME has two main encoding schemes, Quoted-printable and Base64, which convert 8-bit non-English characters into 7-bit ASCII characters.

Although the original intent was to satisfy the rule that non-ASCII characters cannot be used directly in e-mail, this has other important implications:

a) Any binary file can be converted into printable text and edited with text software.
b) Text can be given a simple form of "encryption". Strictly speaking this is only obfuscation, since anyone can reverse the encoding without a key.

2.

First, a brief introduction to the Quoted-printable encoding. It is mainly used for ASCII text that contains a small number of non-ASCII characters, and it is not suitable for converting pure binary files.

It specifies that each 8-bit byte is converted into 3 characters.

The first character is "=", which is fixed.

The last two characters are two hexadecimal digits, representing the values of the high four bits and the low four bits of the byte respectively.

For instance, the form feed character in ASCII is 12, whose binary form is 00001100 and whose hexadecimal form is 0C, so its encoded value is "=0C". The ASCII value of "=" is 61, whose binary form is 00111101, so its encoded value is "=3D". Apart from printable ASCII characters, all other characters must be converted in this way.

All printable ASCII characters (decimal values 33 to 126) remain unchanged, with the exception of "=" (decimal value 61).
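The rules above can be sketched in a few lines of JavaScript. This is a simplified illustration, not a complete MIME implementation: the function name `quotedPrintableEncode` is made up for this example, and it applies only the rules described here (real Quoted-printable also has special handling for spaces, line breaks and line length, which is ignored).

```javascript
// Simplified Quoted-printable encoder that follows only the rules in this section.
// Assumption: `bytes` is an array (or Uint8Array) of integers in the range 0..255.
function quotedPrintableEncode(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b >= 33 && b <= 126 && b !== 61) {
      // Printable ASCII other than "=" is kept as-is.
      out += String.fromCharCode(b);
    } else {
      // Everything else becomes "=" followed by two uppercase hex digits.
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(quotedPrintableEncode([12])); // form feed -> "=0C"
console.log(quotedPrintableEncode([61])); // "=" -> "=3D"
```

Note that under this simplified rule a space (decimal 32) is also written as "=20", whereas real mail software usually leaves most spaces untouched.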

3.

Next, a detailed introduction to the Base64 encoding.

Base64 means choosing 64 characters (lowercase letters a-z, uppercase letters A-Z, digits 0-9, and the symbols "+" and "/", plus "=" as a padding character, which actually makes 65 characters) as a basic character set. All other symbols are then converted into characters from this set.

Specifically, the conversion can be divided into four steps.

Step one, take every three bytes as a group, 24 binary bits in total.
Step two, divide these 24 bits into four groups of 6 bits each.
Step three, add two 0 bits in front of each group, expanding it to 32 binary bits, that is, four bytes.
Step four, look up each expanded byte in the table below to get its corresponding symbol. This is the Base64 encoded value.

 0 A   17 R   34 i   51 z
 1 B   18 S   35 j   52 0
 2 C   19 T   36 k   53 1
 3 D   20 U   37 l   54 2
 4 E   21 V   38 m   55 3
 5 F   22 W   39 n   56 4
 6 G   23 X   40 o   57 5
 7 H   24 Y   41 p   58 6
 8 I   25 Z   42 q   59 7
 9 J   26 a   43 r   60 8
10 K   27 b   44 s   61 9
11 L   28 c   45 t   62 +
12 M   29 d   46 u   63 /
13 N   30 e   47 v
14 O   31 f   48 w
15 P   32 g   49 x
16 Q   33 h   50 y

Because Base64 converts every three bytes into four bytes, Base64-encoded text is about one third larger than the original.
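The four steps can be written down directly with bit operations. The following is a minimal sketch for exactly one group of three bytes; the names `BASE64_ALPHABET` and `encodeGroup` are chosen for this illustration and do not come from any library.

```javascript
// The 64 symbols in table order: index 0 is "A", index 63 is "/".
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one group of three bytes (each 0..255) into four Base64 characters.
function encodeGroup(b1, b2, b3) {
  // Step one: join the three bytes into a single 24-bit value.
  const bits24 = (b1 << 16) | (b2 << 8) | b3;
  // Steps two and three: cut out four 6-bit groups. Each result is 0..63,
  // which is the same as a byte whose two leading bits are 0.
  const indexes = [
    (bits24 >> 18) & 0x3f,
    (bits24 >> 12) & 0x3f,
    (bits24 >> 6) & 0x3f,
    bits24 & 0x3f,
  ];
  // Step four: look up each index in the table.
  return indexes.map((i) => BASE64_ALPHABET[i]).join("");
}

console.log(encodeGroup(0, 0, 0)); // "AAAA"
```

Three input bytes always give four output characters, which is where the roughly one-third size increase comes from.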

4.

As a concrete example, here is how the English word Man is converted into Base64.

Text content      M          a          n
ASCII             77         97         110
Bit pattern       01001101   01100001   01101110
Index             19         22         5          46
Base64-Encoded    T          W          F          u

Step one, the ASCII values of "M", "a", "n" are 77, 97, 110, and the corresponding binary values are 01001101, 01100001, 01101110. Join them into one 24-bit binary string: 010011010110000101101110.
Step two, divide this 24-bit string into 4 groups of 6 binary bits each: 010011, 010110, 000101, 101110.
Step three, add two 0 bits in front of each group, expanding them to 32 binary bits, that is, four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, 46.
Step four, look up each value in the table to get its Base64 character, namely T, W, F, u.

Therefore, the Base64 encoding of Man is TWFu.
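To make the intermediate values visible, the same walk-through can be reproduced with bit strings instead of bit operations. This is only an illustrative sketch: `traceGroup` and `BASE64_TABLE` are hypothetical names, and the function assumes its input is exactly three ASCII characters.

```javascript
const BASE64_TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Return every intermediate result of the four steps for a 3-character ASCII string.
function traceGroup(text) {
  // Step one: write each character code as 8 binary digits and join them.
  const bits = Array.from(text, (ch) =>
    ch.charCodeAt(0).toString(2).padStart(8, "0")
  ).join("");
  // Step two: split the 24 digits into groups of 6.
  const groups = bits.match(/.{6}/g);
  // Step three: put "00" in front of each group and read it as a number.
  const indexes = groups.map((g) => parseInt("00" + g, 2));
  // Step four: look up each number in the table.
  const encoded = indexes.map((i) => BASE64_TABLE[i]).join("");
  return { bits, groups, indexes, encoded };
}

const trace = traceGroup("Man");
console.log(trace.bits);    // 010011010110000101101110
console.log(trace.groups);  // [ '010011', '010110', '000101', '101110' ]
console.log(trace.indexes); // [ 19, 22, 5, 46 ]
console.log(trace.encoded); // TWFu
```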

5.

If fewer than three bytes remain, they are handled as follows:

a) Two bytes: the two bytes contain 16 binary bits in total. Following the rules above, they are split into three groups. The last group has only 4 bits, so in addition to the two 0s added in front, two 0s must also be appended at the end. This yields a three-character Base64 code, and one "=" is added at the end. For example, the string "Ma" is two bytes, which convert into the three groups 00010011, 00010110, 00000100. Their Base64 values are T, W, E, and with one "=" appended, the Base64 encoding of "Ma" is TWE=.

b) One byte: the 8 binary bits of this byte are split into two groups following the rules above. The last group has only 2 bits, so in addition to the two 0s added in front, four 0s must be appended at the end. This yields a two-character Base64 code, and two "=" are added at the end. For example, the letter "M" is one byte, which converts into the two groups 00010011, 00010000. Their Base64 values are T, Q, and with two "=" appended, the Base64 encoding of "M" is TQ==.
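Putting the grouping rule and the two padding cases together gives a complete encoder for a byte array. The sketch below is one possible way to write it, with the hypothetical names `B64_CHARS` and `base64EncodeBytes`; missing bytes in the last group are treated as zero bits, and the positions that carry no data are written as "=".

```javascript
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of bytes (each 0..255) as Base64, including "=" padding.
function base64EncodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3 bytes left
    const b1 = bytes[i];
    const b2 = remaining > 1 ? bytes[i + 1] : 0; // a missing byte contributes 0 bits
    const b3 = remaining > 2 ? bytes[i + 2] : 0;
    out += B64_CHARS[b1 >> 2];
    out += B64_CHARS[((b1 & 0x03) << 4) | (b2 >> 4)];
    // The third and fourth characters exist only if the group has real data for them.
    out += remaining > 1 ? B64_CHARS[((b2 & 0x0f) << 2) | (b3 >> 6)] : "=";
    out += remaining > 2 ? B64_CHARS[b3 & 0x3f] : "=";
  }
  return out;
}

console.log(base64EncodeBytes([77, 97, 110])); // "Man" -> TWFu
console.log(base64EncodeBytes([77, 97]));      // "Ma"  -> TWE=
console.log(base64EncodeBytes([77]));          // "M"   -> TQ==
```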

6.

Now a Chinese example: how is the Chinese character "严" converted into Base64?

Note that a Chinese character can have several encodings, such as gb2312, utf-8 and gbk, and each encoding gives a different Base64 value. The following example uses utf-8.

First, the utf-8 encoding of "严" is E4B8A5, which written in binary is the three bytes "11100100 10111000 10100101". Following the rules in section 3, this 24-bit binary string is converted into four groups with a total of 32 bits: "00111001 00001011 00100010 00100101". The corresponding decimal numbers are 57, 11, 34, 37, and their Base64 values are 5, L, i, l.

Therefore, the Base64 value of the Chinese character "严" (utf-8 encoding) is 5Lil.
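The byte values and the four indexes can be checked with a short script. This is a sketch under one assumption: the environment provides the standard `TextEncoder` class (modern browsers and recent Node.js versions do). The function name `utf8SixBitIndexes` is invented for this example, and the character is written as the escape `\u4E25`, the code point whose utf-8 form is E4 B8 A5.

```javascript
// For a character whose utf-8 form is exactly three bytes, return those bytes
// and the four 6-bit table indexes they produce.
function utf8SixBitIndexes(ch) {
  const bytes = Array.from(new TextEncoder().encode(ch));
  if (bytes.length !== 3) {
    throw new Error("this sketch only handles characters with a 3-byte utf-8 form");
  }
  const bits24 = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
  const indexes = [18, 12, 6, 0].map((shift) => (bits24 >> shift) & 0x3f);
  return { bytes, indexes };
}

const result = utf8SixBitIndexes("\u4E25");
console.log(result.bytes.map((b) => b.toString(16))); // [ 'e4', 'b8', 'a5' ]
console.log(result.indexes);                          // [ 57, 11, 34, 37 ]
```

Looking up 57, 11, 34 and 37 in the table from section 3 gives 5, L, i, l.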

7.

PHP has a pair of functions dedicated to Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.

The notable thing about this pair of functions is that they do not care what encoding the input text uses. They simply apply the Base64 rules to the bytes they receive. Therefore, if you want the Base64 value that corresponds to utf-8, you must make sure yourself that the input text is utf-8 encoded.

8.

This section describes how to do Base64 encoding in Javascript.

First, suppose the web page is encoded in utf-8, and we want PHP and Javascript to produce the same Base64 value for the same string.

A problem arises here. Javascript strings are stored internally as utf-16, so when encoding we must first convert the utf-16 value to utf-8 and then Base64-encode it. When decoding, the decoded utf-8 value must be converted back to utf-16.

Ready-made Javascript functions for this have been published online:

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * utf8 = utf16to8(utf16);
 * utf16 = utf8to16(utf8);
 */

function utf16to8(str) {
    var out, i, len, c;

    out = "";
    len = str.length;
    for(i = 0; i < len; i++) {
        c = str.charCodeAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            out += str.charAt(i);
        } else if (c > 0x07FF) {
            out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
            out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        } else {
            out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        }
    }
    return out;
}

function utf8to16(str) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = str.length;
    i = 0;
    while(i < len) {
        c = str.charCodeAt(i++);
        switch(c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            out += str.charAt(i-1);
            break;
        case 12: case 13:
            // 110x xxxx 10xx xxxx
            char2 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
            break;
        case 14:
            // 1110 xxxx 10xx xxxx 10xx xxxx
            char2 = str.charCodeAt(i++);
            char3 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x0F) << 12) |
                                       ((char2 & 0x3F) << 6) |
                                       ((char3 & 0x3F) << 0));
            break;
        }
    }
    return out;
}

The code above defines two functions: utf16to8() converts utf-16 to utf-8, and utf8to16() converts utf-8 to utf-16.

The following functions do the actual Base64 encoding and decoding.

/* Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * b64 = base64encode(data);
 * data = base64decode(b64);
 */

var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var base64DecodeChars = new Array(
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
    -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
    -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

function base64encode(str) {
    var out, i, len;
    var c1, c2, c3;

    len = str.length;
    i = 0;
    out = "";
    while(i < len) {
        c1 = str.charCodeAt(i++) & 0xff;
        if(i == len) {
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++);
        if(i == len) {
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++);
        out += base64EncodeChars.charAt(c1 >> 2);
        out += base64EncodeChars.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
        out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >>6));
        out += base64EncodeChars.charAt(c3 & 0x3F);
    }
    return out;
}

function base64decode(str) {
    var c1, c2, c3, c4;
    var i, len, out;

    len = str.length;
    i = 0;
    out = "";
    while(i < len) {
        /* c1 */
        do {
            c1 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while(i < len && c1 == -1);
        if(c1 == -1)
            break;

        /* c2 */
        do {
            c2 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while(i < len && c2 == -1);
        if(c2 == -1)
            break;

        out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

        /* c3 */
        do {
            c3 = str.charCodeAt(i++) & 0xff;
            if(c3 == 61)
                return out;
            c3 = base64DecodeChars[c3];
        } while(i < len && c3 == -1);
        if(c3 == -1)
            break;

        out += String.fromCharCode(((c2 & 0XF) << 4) | ((c3 & 0x3C) >> 2));

        /* c4 */
        do {
            c4 = str.charCodeAt(i++) & 0xff;
            if(c4 == 61)
                return out;
            c4 = base64DecodeChars[c4];
        } while(i < len && c4 == -1);
        if(c4 == -1)
            break;
        out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
    }
    return out;
}

In the code above, base64encode() is used for encoding and base64decode() for decoding.

Therefore, a string should be encoded to utf-8 based Base64 like this:

sEncoded=base64encode(utf16to8(str));

Decoding should then be written like this:

sDecoded=utf8to16(base64decode(sEncoded));
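For comparison, the same round trip can be sketched with facilities built into current environments instead of the hand-written conversion functions. This is not part of the 1999 library: it assumes the runtime provides the standard `TextEncoder`, `TextDecoder`, `btoa` and `atob` globals (modern browsers and recent Node.js versions), and the names `utf8ToBase64` and `base64ToUtf8` are made up for this illustration.

```javascript
// Encode a Javascript (utf-16) string as Base64 of its utf-8 bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // utf-16 string -> utf-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, as btoa expects
  }
  return btoa(binary);
}

// Decode Base64 back into a Javascript string, interpreting the bytes as utf-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // utf-8 bytes -> utf-16 string
}

console.log(utf8ToBase64("Man"));    // TWFu
console.log(utf8ToBase64("\u4E25")); // 5Lil
console.log(base64ToUtf8("5Lil") === "\u4E25"); // true
```

The intermediate "one character per byte" string plays the same role as the output of utf16to8() in the code above.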

(End)

Publisher: Full stack programmer webmaster. When reprinting, please cite the source: https://javaforall.cn/120845.html Link to the original text: https://javaforall.cn
