Base64 notes
2022-07-07 19:08:00 【Full stack programmer webmaster】
1.
Yesterday's "MIME notes" mentioned that MIME has two main transfer encodings, Quoted-printable and Base64, both of which convert 8-bit non-English characters into 7-bit ASCII characters.
The original purpose was to satisfy the rule that non-ASCII characters cannot be used directly in e-mail, but these encodings have other important uses:
a) Any binary file can be converted into printable text and then edited with text software.
b) Text can be lightly obfuscated (this hides it from casual reading, but it is not real encryption).
2.
First, a brief introduction to the Quoted-printable encoding. It is mainly used for ASCII text that contains a small number of non-ASCII characters, and it is not suitable for converting pure binary files.
It specifies that each 8-bit byte is converted into 3 characters.
The first character is always "=".
The next two characters are two hexadecimal digits, representing the high four bits and the low four bits of the byte respectively.
For example, the ASCII "form feed" character is 12, which is 00001100 in binary and 0C in hexadecimal, so its encoded value is "=0C". The "=" sign has the ASCII value 61, which is 00111101 in binary, so its encoded value is "=3D". All characters other than printable ASCII characters must be converted this way.
All printable ASCII characters (decimal values 33 to 126) remain unchanged, with the exception of "=" (decimal value 61).
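The byte rule above can be sketched in JavaScript as follows. This is only a minimal illustration: the function name `quotedPrintableEncodeBytes` is hypothetical, and the sketch covers just the "=XX" rule described in this section. It ignores everything else a real MIME implementation must handle, such as spaces, line breaks, and line-length limits.

```javascript
// Minimal sketch of the byte rule described above (not a full MIME implementation).
// Printable ASCII (33..126) passes through, except "=" (61); every other byte becomes "=XX".
function quotedPrintableEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    const printable = b >= 33 && b <= 126 && b !== 61;
    if (printable) {
      out += String.fromCharCode(b);
    } else {
      // Two uppercase hex digits: the high four bits, then the low four bits.
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(quotedPrintableEncodeBytes([12])); // form feed -> =0C
console.log(quotedPrintableEncodeBytes([61])); // "="       -> =3D
```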
3.
Next, a detailed look at the Base64 encoding.
Base64 means choosing 64 characters (the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the symbols "+" and "/") as a basic character set. Counting the pad character "=", there are actually 65 characters. All other data is then converted into characters from this set.
Specifically, the conversion takes four steps.
Step 1: take every three bytes as a group, 24 binary bits in total.
Step 2: divide these 24 bits into four groups of 6 bits each.
Step 3: prepend two 0 bits to each group, expanding the four groups to 32 bits, that is, four bytes.
Step 4: look up each expanded byte in the following table to get its symbol. This is the Base64 encoded value.

| Value | Char | Value | Char | Value | Char | Value | Char |
|---|---|---|---|---|---|---|---|
| 0 | A | 17 | R | 34 | i | 51 | z |
| 1 | B | 18 | S | 35 | j | 52 | 0 |
| 2 | C | 19 | T | 36 | k | 53 | 1 |
| 3 | D | 20 | U | 37 | l | 54 | 2 |
| 4 | E | 21 | V | 38 | m | 55 | 3 |
| 5 | F | 22 | W | 39 | n | 56 | 4 |
| 6 | G | 23 | X | 40 | o | 57 | 5 |
| 7 | H | 24 | Y | 41 | p | 58 | 6 |
| 8 | I | 25 | Z | 42 | q | 59 | 7 |
| 9 | J | 26 | a | 43 | r | 60 | 8 |
| 10 | K | 27 | b | 44 | s | 61 | 9 |
| 11 | L | 28 | c | 45 | t | 62 | + |
| 12 | M | 29 | d | 46 | u | 63 | / |
| 13 | N | 30 | e | 47 | v | | |
| 14 | O | 31 | f | 48 | w | | |
| 15 | P | 32 | g | 49 | x | | |
| 16 | Q | 33 | h | 50 | y | | |

Because Base64 converts every three bytes into four bytes, Base64-encoded text is about one third larger than the original.
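The four steps can be sketched for a single three-byte group as follows. The names `BASE64_ALPHABET` and `encodeThreeBytes` are hypothetical, chosen only for this illustration, and the sketch assumes exactly three input bytes (groups shorter than three bytes need the padding rule, which this function does not handle).

```javascript
// The 64 characters of the table above, indexed by value 0..63.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes (each 0..255) into four Base64 characters.
function encodeThreeBytes(b1, b2, b3) {
  // Step 1: join the three bytes into one 24-bit number.
  const bits24 = (b1 << 16) | (b2 << 8) | b3;
  // Step 2: cut it into four 6-bit groups.
  const groups = [
    (bits24 >> 18) & 0x3f,
    (bits24 >> 12) & 0x3f,
    (bits24 >> 6) & 0x3f,
    bits24 & 0x3f,
  ];
  // Steps 3 and 4: each group is now a value 0..63; look it up in the table.
  return groups.map((g) => BASE64_ALPHABET.charAt(g)).join("");
}

console.log(encodeThreeBytes(77, 97, 110)); // TWFu
```

Holding each 6-bit group in an ordinary number already gives it the two leading zero bits of step 3, so no separate padding operation is needed in code.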
4.
As a concrete example, here is how the English word Man is converted into Base64.

| Text content | M | a | n |
|---|---|---|---|
| ASCII | 77 | 97 | 110 |
| Bit pattern | 01001101 | 01100001 | 01101110 |

| 6-bit group | 010011 | 010110 | 000101 | 101110 |
|---|---|---|---|---|
| Index | 19 | 22 | 5 | 46 |
| Base64-encoded | T | W | F | u |

Step 1: the ASCII values of "M", "a", "n" are 77, 97, 110, which are 01001101, 01100001, 01101110 in binary. Joined together they form the 24-bit string 010011010110000101101110.
Step 2: divide this 24-bit string into 4 groups of 6 bits each: 010011, 010110, 000101, 101110.
Step 3: prepend two 0 bits to each group, expanding them to 32 bits, that is, four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, 46.
Step 4: look up each value in the table to get the corresponding Base64 characters, namely T, W, F, u.
Therefore, the Base64 encoding of Man is TWFu.
5.
If fewer than three bytes remain, they are handled as follows:
a) Two bytes: the two bytes contain 16 bits in total. Following the rules above, they are split into three groups. The last group holds only 4 bits, so besides the two leading 0 bits, two more 0 bits must be appended at the end. This gives a three-character Base64 code, followed by one "=" sign. For example, the string "Ma" is two bytes, which become the three groups 00010011, 00010110, 00000100. The corresponding Base64 values are T, W, E, and with one "=" sign appended, the Base64 encoding of "Ma" is TWE=.
b) One byte: the 8 bits of this byte are split into two groups following the rules above. The last group holds only 2 bits, so besides the two leading 0 bits, four 0 bits must be appended at the end. This gives a two-character Base64 code, followed by two "=" signs. For example, the letter "M" is one byte, which becomes the two groups 00010011, 00010000. The corresponding Base64 values are T, Q, and with two "=" signs appended, the Base64 encoding of "M" is TQ==.
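The padding rules can be sketched as a small encoder over an array of byte values. This is a minimal sketch, not a reference implementation: the names `ALPHABET` and `base64EncodeBytes` are hypothetical, and the input is assumed to be an array of integers in the range 0..255.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0..255), padding the final group with "=".
function base64EncodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    const b1 = bytes[i];
    // Missing bytes contribute zero bits, as described in cases a) and b).
    const b2 = remaining > 1 ? bytes[i + 1] : 0;
    const b3 = remaining > 2 ? bytes[i + 2] : 0;

    out += ALPHABET.charAt(b1 >> 2);
    out += ALPHABET.charAt(((b1 & 0x03) << 4) | (b2 >> 4));
    // A group that contains no real input bits is written as "=".
    out += remaining > 1 ? ALPHABET.charAt(((b2 & 0x0f) << 2) | (b3 >> 6)) : "=";
    out += remaining > 2 ? ALPHABET.charAt(b3 & 0x3f) : "=";
  }
  return out;
}

console.log(base64EncodeBytes([77, 97])); // "Ma" -> TWE=
console.log(base64EncodeBytes([77]));     // "M"  -> TQ==
```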
6.
Here is another example, this time in Chinese: how is the character "严" converted into Base64?
Note that a Chinese character can have several encodings, such as GB2312, UTF-8, and GBK, and the Base64 value differs for each one. The following example uses UTF-8.
First, the UTF-8 encoding of "严" is E4B8A5, which in binary is the three bytes "11100100 10111000 10100101". Following the rules in section 3, this 24-bit string is converted into four groups totalling 32 bits, "00111001 00001011 00100010 00100101". The corresponding decimal values are 57, 11, 34, 37, and their Base64 characters are 5, L, i, l.
Therefore, the Base64 value of the Chinese character "严" (UTF-8 encoded) is 5Lil.
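The intermediate values in this example can be reproduced with a short script. This is a sketch under stated assumptions: it relies on the standard `TextEncoder` (which always produces UTF-8) and on Node.js's `Buffer` for the final encoding, and the function name `walkThroughUtf8Base64` is hypothetical.

```javascript
// Show the UTF-8 bytes (as hex), the 6-bit table indexes, and the Base64 text.
function walkThroughUtf8Base64(text) {
  const utf8Bytes = new TextEncoder().encode(text); // UTF-8 bytes of the string
  const hex = Array.from(utf8Bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
  const bits = Array.from(utf8Bytes, (b) =>
    b.toString(2).padStart(8, "0")
  ).join("");
  const indexes = [];
  for (let i = 0; i < bits.length; i += 6) {
    // A short final group is padded with trailing zero bits.
    indexes.push(parseInt(bits.slice(i, i + 6).padEnd(6, "0"), 2));
  }
  const encoded = Buffer.from(utf8Bytes).toString("base64"); // Node.js built-in
  return { hex, indexes, encoded };
}

// "\u4E25" is the character 严, written as an escape to avoid source-encoding issues.
console.log(walkThroughUtf8Base64("\u4E25"));
// hex "E4B8A5", indexes [57, 11, 34, 37], encoded "5Lil"
```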
7.
PHP has a pair of dedicated functions for Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.
These functions do not care what character encoding the input text uses; they simply Base64-encode the bytes they are given. So if you want the Base64 value under UTF-8, you must ensure yourself that the input text is UTF-8 encoded.
8.
This section describes how to perform Base64 encoding in JavaScript.
First, suppose the web page is encoded in UTF-8, and we want PHP and JavaScript to produce the same Base64 code for the same string.
A problem arises here. JavaScript strings are stored internally as UTF-16, so when encoding we must first convert the UTF-16 string to UTF-8 and then Base64-encode it, and when decoding we must convert the decoded UTF-8 value back to UTF-16.
Ready-made JavaScript functions for this have been published online:
    /* utf.js - UTF-8 <=> UTF-16 conversion
     *
     * Copyright (C) 1999 Masanao Izumo <[email protected]>
     * Version: 1.0
     * LastModified: Dec 25 1999
     * This library is free. You can redistribute it and/or modify it.
     */

    /*
     * Interfaces:
     * utf8 = utf16to8(utf16);
     * utf16 = utf8to16(utf8);
     */

    function utf16to8(str) {
        var out, i, len, c;

        out = "";
        len = str.length;
        for(i = 0; i < len; i++) {
            c = str.charCodeAt(i);
            if ((c >= 0x0001) && (c <= 0x007F)) {
                out += str.charAt(i);
            } else if (c > 0x07FF) {
                out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
                out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
                out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
            } else {
                out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
                out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
            }
        }
        return out;
    }

    function utf8to16(str) {
        var out, i, len, c;
        var char2, char3;

        out = "";
        len = str.length;
        i = 0;
        while(i < len) {
            c = str.charCodeAt(i++);
            switch(c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += str.charAt(i-1);
                break;
            case 12: case 13:
                // 110x xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx 10xx xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                char3 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
            }
        }
        return out;
    }
The code above defines two functions: utf16to8() converts UTF-16 to UTF-8, and utf8to16() converts UTF-8 to UTF-16.
The functions that actually perform the Base64 encoding are as follows.
    /* Copyright (C) 1999 Masanao Izumo <[email protected]>
     * Version: 1.0
     * LastModified: Dec 25 1999
     * This library is free. You can redistribute it and/or modify it.
     */

    /*
     * Interfaces:
     * b64 = base64encode(data);
     * data = base64decode(b64);
     */

    var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var base64DecodeChars = new Array(
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
        52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
        -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
        -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

    function base64encode(str) {
        var out, i, len;
        var c1, c2, c3;

        len = str.length;
        i = 0;
        out = "";
        while(i < len) {
            c1 = str.charCodeAt(i++) & 0xff;
            if(i == len) {
                out += base64EncodeChars.charAt(c1 >> 2);
                out += base64EncodeChars.charAt((c1 & 0x3) << 4);
                out += "==";
                break;
            }
            c2 = str.charCodeAt(i++);
            if(i == len) {
                out += base64EncodeChars.charAt(c1 >> 2);
                out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
                out += base64EncodeChars.charAt((c2 & 0xF) << 2);
                out += "=";
                break;
            }
            c3 = str.charCodeAt(i++);
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
            out += base64EncodeChars.charAt(c3 & 0x3F);
        }
        return out;
    }

    function base64decode(str) {
        var c1, c2, c3, c4;
        var i, len, out;

        len = str.length;
        i = 0;
        out = "";
        while(i < len) {
            /* c1 */
            do {
                c1 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
            } while(i < len && c1 == -1);
            if(c1 == -1)
                break;

            /* c2 */
            do {
                c2 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
            } while(i < len && c2 == -1);
            if(c2 == -1)
                break;

            out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

            /* c3 */
            do {
                c3 = str.charCodeAt(i++) & 0xff;
                if(c3 == 61)
                    return out;
                c3 = base64DecodeChars[c3];
            } while(i < len && c3 == -1);
            if(c3 == -1)
                break;

            out += String.fromCharCode(((c2 & 0xF) << 4) | ((c3 & 0x3C) >> 2));

            /* c4 */
            do {
                c4 = str.charCodeAt(i++) & 0xff;
                if(c4 == 61)
                    return out;
                c4 = base64DecodeChars[c4];
            } while(i < len && c4 == -1);
            if(c4 == -1)
                break;

            out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
        }
        return out;
    }
In the code above, base64encode() is used for encoding and base64decode() for decoding.
Therefore, a string should be encoded as UTF-8 Base64 like this:

    sEncoded = base64encode(utf16to8(str));

Decoding should then be written like this:

    sDecoded = utf8to16(base64decode(sEncoded));
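Where the runtime provides built-in helpers, the same two-step idea (convert to UTF-8 bytes, then Base64) can be expressed without the hand-written functions. The following is a minimal sketch, assuming Node.js (for `Buffer`) and the standard `TextEncoder`/`TextDecoder` globals. The function names `encodeUtf8Base64` and `decodeUtf8Base64` are hypothetical and are not part of the library quoted above.

```javascript
// Encode: UTF-16 JavaScript string -> UTF-8 bytes -> Base64 text.
function encodeUtf8Base64(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return Buffer.from(utf8Bytes).toString("base64");
}

// Decode: Base64 text -> UTF-8 bytes -> UTF-16 JavaScript string.
function decodeUtf8Base64(b64) {
  const utf8Bytes = Buffer.from(b64, "base64");
  return new TextDecoder("utf-8").decode(utf8Bytes);
}

const sEncoded = encodeUtf8Base64("\u4E25"); // the character 严
const sDecoded = decodeUtf8Base64(sEncoded);
console.log(sEncoded, sDecoded === "\u4E25"); // 5Lil true
```

Unlike the 1999 utf16to8() function, `TextEncoder` also handles characters outside the Basic Multilingual Plane (surrogate pairs), so the two approaches can differ for such input.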
(End)
Publisher: Full stack programmer webmaster. When reprinting, please credit the source: https://javaforall.cn/120845.html Link to the original text: https://javaforall.cn