Base64 notes
2022-07-07 19:08:00 【Full stack programmer webmaster】
1.
Yesterday's "MIME notes" mentioned that MIME has two main transfer encodings, Quoted-printable and Base64, which convert 8-bit non-English characters into 7-bit ASCII characters.
The original purpose was to satisfy the rule that non-ASCII characters cannot be used directly in e-mail, but the encodings have other important uses:
a) Any binary file can be converted into printable text and edited with text software; b) text can be lightly obscured (Base64 is trivially reversible, so this is obfuscation rather than encryption in the security sense).
2.
First, a brief introduction to the Quoted-printable encoding. It is mainly used for ASCII text that contains a small number of non-ASCII characters, and it is not suitable for converting pure binary files.
It specifies that each 8-bit byte is converted into 3 characters.
The first character is "=", which is fixed.
The next two characters are two hexadecimal digits, representing the values of the high four bits and the low four bits of the byte respectively.
For example, the form feed character in ASCII is 12, which is 00001100 in binary and 0C in hexadecimal, so its encoded value is "=0C". The ASCII value of "=" is 61, which is 00111101 in binary, so its encoded value is "=3D". Apart from printable ASCII characters, all other characters must be converted this way.
All printable ASCII characters (decimal values 33 to 126) remain unchanged, with the exception of "=" (decimal value 61).
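The per-byte rule above can be sketched in JavaScript as follows. This is a simplified illustration, not a complete Quoted-printable implementation: the function name `quotedPrintableEncode` is made up for this example, and soft line breaks and the special treatment of spaces and tabs found in real mail software are deliberately left out.

```javascript
// Simplified Quoted-printable encoder (illustrative only).
// Applies just the per-byte rule: printable ASCII 33-126 stays as-is,
// except "=" (61); every other byte becomes "=" plus two hex digits.
function quotedPrintableEncode(bytes) {
  let out = "";
  for (const b of bytes) {
    const keepAsIs = b >= 33 && b <= 126 && b !== 61;
    if (keepAsIs) {
      out += String.fromCharCode(b);
    } else {
      // High four bits and low four bits as two uppercase hex digits.
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(quotedPrintableEncode([12]));         // form feed -> "=0C"
console.log(quotedPrintableEncode([61]));         // "="       -> "=3D"
console.log(quotedPrintableEncode([65, 61, 66])); // "A=B"     -> "A=3DB"
```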
3.
Next, a detailed introduction to the Base64 encoding.
Base64 means choosing 64 characters (lowercase a-z, uppercase A-Z, digits 0-9, and the symbols "+" and "/"; with "=" as the padding character there are actually 65) as a basic character set. All other symbols are then converted into characters from this set.
Specifically, the conversion takes four steps.
Step 1: take every three bytes as a group, 24 bits in total.
Step 2: split these 24 bits into four groups of 6 bits each.
Step 3: prefix each group with two 0 bits, expanding it to 32 bits, i.e. four bytes.
Step 4: look up each of the four expanded bytes in the following table to get its symbol. This is the Base64 encoded value.
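Index | Char | Index | Char | Index | Char | Index | Char |
---|---|---|---|---|---|---|---|
0 | A | 17 | R | 34 | i | 51 | z |
1 | B | 18 | S | 35 | j | 52 | 0 |
2 | C | 19 | T | 36 | k | 53 | 1 |
3 | D | 20 | U | 37 | l | 54 | 2 |
4 | E | 21 | V | 38 | m | 55 | 3 |
5 | F | 22 | W | 39 | n | 56 | 4 |
6 | G | 23 | X | 40 | o | 57 | 5 |
7 | H | 24 | Y | 41 | p | 58 | 6 |
8 | I | 25 | Z | 42 | q | 59 | 7 |
9 | J | 26 | a | 43 | r | 60 | 8 |
10 | K | 27 | b | 44 | s | 61 | 9 |
11 | L | 28 | c | 45 | t | 62 | + |
12 | M | 29 | d | 46 | u | 63 | / |
13 | N | 30 | e | 47 | v | | |
14 | O | 31 | f | 48 | w | | |
15 | P | 32 | g | 49 | x | | |
16 | Q | 33 | h | 50 | y | | |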
Because Base64 converts every three bytes into four bytes, Base64-encoded text is about one third larger than the original.
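The four steps can be sketched with bitwise operators for a single three-byte group. The names `BASE64_TABLE` and `encodeThreeBytes` are chosen for this illustration only, and the sketch assumes exactly three input bytes in the range 0-255.

```javascript
// The 64-character table, in index order 0..63.
const BASE64_TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes into four Base64 characters.
function encodeThreeBytes(b1, b2, b3) {
  // Step 1: join the three bytes into one 24-bit number.
  const bits24 = (b1 << 16) | (b2 << 8) | b3;
  // Steps 2 and 3: cut out four 6-bit groups. Each value is 0..63,
  // which is the same as a byte whose two high bits are 00.
  const g1 = (bits24 >> 18) & 0x3f;
  const g2 = (bits24 >> 12) & 0x3f;
  const g3 = (bits24 >> 6) & 0x3f;
  const g4 = bits24 & 0x3f;
  // Step 4: look up each value in the table.
  return BASE64_TABLE[g1] + BASE64_TABLE[g2] + BASE64_TABLE[g3] + BASE64_TABLE[g4];
}

console.log(encodeThreeBytes(0x00, 0x10, 0x83)); // groups 0,1,2,3 -> "ABCD"
console.log(encodeThreeBytes(0xff, 0xff, 0xff)); // groups 63 x 4  -> "////"
```

Three input bytes always yield four output characters, which is where the one-third size increase comes from.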
4.
As a concrete example, here is how the English word Man is converted into Base64.
Text content | M | a | n |
---|---|---|---|
ASCII | 77 | 97 | 110 |
Bit pattern | 01001101 | 01100001 | 01101110 |

6-bit group | 010011 | 010110 | 000101 | 101110 |
---|---|---|---|---|
Index | 19 | 22 | 5 | 46 |
Base64-encoded | T | W | F | u |
Step 1: the ASCII values of "M", "a" and "n" are 77, 97 and 110, and the corresponding binary values are 01001101, 01100001 and 01101110. Joined together they form the 24-bit string 010011010110000101101110.
Step 2: split this 24-bit string into 4 groups of 6 bits each: 010011, 010110, 000101, 101110.
Step 3: prefix each group with two 0 bits, expanding it to 32 bits, i.e. four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5 and 46.
Step 4: look up each value in the table to get the Base64 characters T, W, F and u.
Therefore, the Base64 encoding of Man is TWFu.
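To make the intermediate values easy to inspect, the same walk-through can be reproduced with binary strings instead of bit operations. The helper name `traceBase64` is hypothetical, and the sketch assumes the input is exactly three ASCII characters.

```javascript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Return every intermediate value of the conversion for a 3-character ASCII string.
function traceBase64(text) {
  // Step 1: byte values, then one 24-character string of 0s and 1s.
  const bytes = Array.from(text, (ch) => ch.charCodeAt(0));
  const bitString = bytes.map((b) => b.toString(2).padStart(8, "0")).join("");
  // Step 2: four groups of 6 bits.
  const groups = bitString.match(/.{6}/g);
  // Step 3: the numeric value of each group (leading 00 does not change it).
  const indexes = groups.map((g) => parseInt(g, 2));
  // Step 4: table lookup.
  const encoded = indexes.map((i) => BASE64_ALPHABET[i]).join("");
  return { bytes, bitString, groups, indexes, encoded };
}

const trace = traceBase64("Man");
console.log(trace.bytes);     // [ 77, 97, 110 ]
console.log(trace.bitString); // 010011010110000101101110
console.log(trace.groups);    // [ '010011', '010110', '000101', '101110' ]
console.log(trace.indexes);   // [ 19, 22, 5, 46 ]
console.log(trace.encoded);   // TWFu
```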
5.
If fewer than three bytes remain, they are handled as follows:
a) Two bytes: the two bytes contain 16 bits in total. Following the rules above, they are split into three groups. The last group has only four bits, so in addition to the two 0 bits in front, two more 0 bits are appended at the end. This gives three Base64 characters, and one "=" is added at the end. For example, the string "Ma" is two bytes, which convert into the three groups 00010011, 00010110 and 00000100, corresponding to the Base64 characters T, W and E. With one "=" added, the Base64 encoding of "Ma" is TWE=.
b) One byte: the 8 bits of this byte are split into two groups following the rules above. The last group has only two bits, so in addition to the two 0 bits in front, four 0 bits are appended at the end. This gives two Base64 characters, and two "=" are added at the end. For example, the letter "M" is one byte, which converts into the two groups 00010011 and 00010000, corresponding to the Base64 characters T and Q. With two "=" added, the Base64 encoding of "M" is TQ==.
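The padding rules can be folded into a small encoder for byte arrays of any length. This is a minimal sketch: `base64EncodeBytes` and `asciiBytes` are names invented for the example, and missing bytes in the final group are simply treated as zero bits before the "=" characters are substituted.

```javascript
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0..255), padding the last group with "=".
function base64EncodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or >= 3 bytes left
    const b1 = bytes[i];
    const b2 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes count as 0 bits
    const b3 = remaining > 2 ? bytes[i + 2] : 0;
    const bits24 = (b1 << 16) | (b2 << 8) | b3;
    out += B64_CHARS[(bits24 >> 18) & 0x3f];
    out += B64_CHARS[(bits24 >> 12) & 0x3f];
    // Groups that contain no real input bits are replaced by "=".
    out += remaining > 1 ? B64_CHARS[(bits24 >> 6) & 0x3f] : "=";
    out += remaining > 2 ? B64_CHARS[bits24 & 0x3f] : "=";
  }
  return out;
}

// Helper for the demo: ASCII string -> byte array.
const asciiBytes = (s) => Array.from(s, (ch) => ch.charCodeAt(0));

console.log(base64EncodeBytes(asciiBytes("Man"))); // TWFu
console.log(base64EncodeBytes(asciiBytes("Ma")));  // TWE=
console.log(base64EncodeBytes(asciiBytes("M")));   // TQ==
```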
6.
Here is another example, this time in Chinese: how is the character "严" converted into Base64?
Note that a Chinese character can be represented in several encodings, such as gb2312, utf-8 and gbk, and each encoding produces a different Base64 value. The following example uses utf-8.
First, the utf-8 encoding of "严" is E4B8A5, which written in binary is the three bytes "11100100 10111000 10100101". Following the rules in section 3, this 24-bit string is converted into four groups totalling 32 bits, "00111001 00001011 00100010 00100101", whose decimal values are 57, 11, 34 and 37. The corresponding Base64 characters are 5, L, i and l.
Therefore, the Base64 value of the character "严" (utf-8 encoded) is 5Lil.
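The numbers in this example can be checked with a short script. It assumes a runtime that provides the standard `TextEncoder` class (current browsers and Node.js do), which always produces utf-8 bytes. The character is written as the escape `\u4E25` so the script does not depend on the encoding of the source file, and the variable names are illustrative.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// "\u4E25" is the character 严; TextEncoder yields its utf-8 bytes.
const utf8Bytes = new TextEncoder().encode("\u4E25");

// Hex form of the bytes, for comparison with the text.
const hex = Array.from(utf8Bytes, (b) =>
  b.toString(16).toUpperCase().padStart(2, "0")
).join("");

// 24 bits -> four 6-bit groups -> table indexes -> characters.
const bits = Array.from(utf8Bytes, (b) => b.toString(2).padStart(8, "0")).join("");
const indexes = bits.match(/.{6}/g).map((g) => parseInt(g, 2));
const encoded = indexes.map((i) => ALPHABET[i]).join("");

console.log(hex);     // E4B8A5
console.log(indexes); // [ 57, 11, 34, 37 ]
console.log(encoded); // 5Lil
```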
7.
PHP has a pair of dedicated functions for Base64 conversion: base64_encode() encodes and base64_decode() decodes.
A characteristic of these two functions is that they do not care what the encoding of the input text is; they simply apply the Base64 rules to the bytes they receive. So if you want the Base64 value that corresponds to utf-8, you have to make sure yourself that the input text is utf-8 encoded.
8.
This section describes how to do Base64 encoding in JavaScript.
First, suppose the web page is encoded in utf-8, and we want PHP and JavaScript to produce the same Base64 value for the same string.
A problem arises here. JavaScript strings are stored internally as utf-16, so when encoding we must first convert the utf-16 value to utf-8 and then apply Base64; when decoding, we must convert the decoded utf-8 value back to utf-16.
Ready-made JavaScript functions for this conversion have already been published online:
/* utf.js - UTF-8 <=> UTF-16 conversion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * utf8 = utf16to8(utf16);
 * utf16 = utf8to16(utf8);
 */

function utf16to8(str) {
    var out, i, len, c;

    out = "";
    len = str.length;
    for (i = 0; i < len; i++) {
        c = str.charCodeAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            out += str.charAt(i);
        } else if (c > 0x07FF) {
            out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
            out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        } else {
            out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
            out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
        }
    }
    return out;
}

function utf8to16(str) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = str.length;
    i = 0;
    while (i < len) {
        c = str.charCodeAt(i++);
        switch (c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            out += str.charAt(i - 1);
            break;
        case 12: case 13:
            // 110x xxxx 10xx xxxx
            char2 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
            break;
        case 14:
            // 1110 xxxx 10xx xxxx 10xx xxxx
            char2 = str.charCodeAt(i++);
            char3 = str.charCodeAt(i++);
            out += String.fromCharCode(((c & 0x0F) << 12) |
                                       ((char2 & 0x3F) << 6) |
                                       ((char3 & 0x3F) << 0));
            break;
        }
    }
    return out;
}
The code above defines two functions: utf16to8() converts utf-16 to utf-8, and utf8to16() converts utf-8 to utf-16.
The following functions do the actual Base64 encoding and decoding.
/* Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free. You can redistribute it and/or modify it.
 */

/*
 * Interfaces:
 * b64 = base64encode(data);
 * data = base64decode(b64);
 */

var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var base64DecodeChars = new Array(
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
    -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
    -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

function base64encode(str) {
    var out, i, len;
    var c1, c2, c3;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        c1 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++);
        if (i == len) {
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++);
        out += base64EncodeChars.charAt(c1 >> 2);
        out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
        out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
        out += base64EncodeChars.charAt(c3 & 0x3F);
    }
    return out;
}

function base64decode(str) {
    var c1, c2, c3, c4;
    var i, len, out;

    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        /* c1 */
        do {
            c1 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while (i < len && c1 == -1);
        if (c1 == -1)
            break;

        /* c2 */
        do {
            c2 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
        } while (i < len && c2 == -1);
        if (c2 == -1)
            break;

        out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

        /* c3 */
        do {
            c3 = str.charCodeAt(i++) & 0xff;
            if (c3 == 61)
                return out;
            c3 = base64DecodeChars[c3];
        } while (i < len && c3 == -1);
        if (c3 == -1)
            break;

        out += String.fromCharCode(((c2 & 0XF) << 4) | ((c3 & 0x3C) >> 2));

        /* c4 */
        do {
            c4 = str.charCodeAt(i++) & 0xff;
            if (c4 == 61)
                return out;
            c4 = base64DecodeChars[c4];
        } while (i < len && c4 == -1);
        if (c4 == -1)
            break;

        out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
    }
    return out;
}
In the code above, base64encode() encodes and base64decode() decodes.
So a string should be encoded to the Base64 of its utf-8 form like this:
sEncoded = base64encode(utf16to8(str));
And decoding should be written like this:
sDecoded = utf8to16(base64decode(sEncoded));
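For comparison, the same round trip can be sketched with built-in APIs instead of hand-written conversion functions. This is not part of the library above. It assumes a runtime where `TextEncoder`, `TextDecoder`, `btoa` and `atob` are available as globals (current browsers, and recent Node.js releases), and the function names `encodeUtf8Base64` and `decodeUtf8Base64` are invented for this example.

```javascript
// utf-16 string -> utf-8 bytes -> Base64 text.
function encodeUtf8Base64(str) {
  const bytes = new TextEncoder().encode(str);
  // btoa expects a "binary string": one character per byte, codes 0..255.
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Base64 text -> utf-8 bytes -> utf-16 string.
function decodeUtf8Base64(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const sample = "\u4E25"; // the character 严
const sampleEncoded = encodeUtf8Base64(sample);
console.log(sampleEncoded);                              // 5Lil
console.log(decodeUtf8Base64(sampleEncoded) === sample); // true
```

Unlike utf16to8() above, `TextEncoder` also handles characters outside the Basic Multilingual Plane, which JavaScript stores as surrogate pairs.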
(End)
Publisher: Full stack programmer webmaster. When reprinting, please credit the source: https://javaforall.cn/120845.html Original link: https://javaforall.cn