Base64 notes
2022-07-07 19:08:00 【Full stack programmer webmaster】
1.
Yesterday's "MIME notes" mentioned that MIME has two main transfer encodings, Quoted-printable and Base64, both of which convert 8-bit non-English characters into 7-bit ASCII characters.
The original purpose was to satisfy the rule that e-mail may not contain non-ASCII characters directly, but the encodings have other important uses as well:
a) Any binary file can be converted into printable text and edited with text software; b) Text can be lightly obfuscated (this is not real encryption, since anyone can decode it).
2.
First, a brief introduction to the Quoted-printable encoding. It is mainly used for ASCII text that contains a small number of non-ASCII characters, and it is not suitable for converting pure binary files.
It specifies that each 8-bit byte is converted into 3 characters.
The first character is the "=" sign, which is fixed.
The next two characters are two hexadecimal digits, representing the first four bits and the last four bits of the byte respectively.
For instance, the ASCII "form feed" character is 12, which is 00001100 in binary and 0C in hexadecimal, so its encoded value is "=0C". The ASCII value of "=" is 61, which is 00111101 in binary, so its encoded value is "=3D". Apart from printable ASCII characters, all other characters must be converted this way.
All printable ASCII characters (decimal values 33 to 126) remain unchanged, with the exception of "=" (decimal value 61).
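The per-byte rule above can be sketched in a few lines of JavaScript. This is a minimal illustration only, not a complete Quoted-printable encoder: the function name `qpEncodeBytes` is made up for this example, and the sketch ignores the line-length limits, soft line breaks, and whitespace handling that a real MIME implementation needs.

```javascript
// Sketch of the per-byte Quoted-printable rule described above.
// Assumption: input is a plain array of byte values (0-255).
function qpEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    // Printable ASCII (33 to 126) passes through, except "=" (61).
    const passThrough = b >= 33 && b <= 126 && b !== 61;
    if (passThrough) {
      out += String.fromCharCode(b);
    } else {
      // "=" followed by two uppercase hex digits (high nibble, low nibble).
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(qpEncodeBytes([12]));      // =0C  (form feed)
console.log(qpEncodeBytes([61]));      // =3D  ("=" itself)
console.log(qpEncodeBytes([72, 105])); // Hi   (printable, unchanged)
```

Because only the occasional byte grows from one character to three, the output stays readable for mostly-ASCII text, which is why the scheme suits that case and not binary data.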
3.
Next, a detailed introduction to the Base64 encoding.
Base64 means choosing 64 characters as a basic character set: the lowercase letters a-z, the uppercase letters A-Z, the digits 0-9, and the symbols "+" and "/" (plus "=" as the padding character, so there are actually 65 characters). All other data is then converted into characters from this set.
Specifically, the conversion has four steps.
Step one: take every three bytes as a group, 24 binary bits in total. Step two: divide these 24 bits into four groups of 6 bits each. Step three: add two 0 bits in front of each group, expanding the total to 32 bits, that is, four bytes. Step four: look up each expanded byte in the table below to get its corresponding symbol. The result is the Base64 encoded value.
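Index | Char | Index | Char | Index | Char | Index | Char |
---|---|---|---|---|---|---|---|
0 | A | 17 | R | 34 | i | 51 | z |
1 | B | 18 | S | 35 | j | 52 | 0 |
2 | C | 19 | T | 36 | k | 53 | 1 |
3 | D | 20 | U | 37 | l | 54 | 2 |
4 | E | 21 | V | 38 | m | 55 | 3 |
5 | F | 22 | W | 39 | n | 56 | 4 |
6 | G | 23 | X | 40 | o | 57 | 5 |
7 | H | 24 | Y | 41 | p | 58 | 6 |
8 | I | 25 | Z | 42 | q | 59 | 7 |
9 | J | 26 | a | 43 | r | 60 | 8 |
10 | K | 27 | b | 44 | s | 61 | 9 |
11 | L | 28 | c | 45 | t | 62 | + |
12 | M | 29 | d | 46 | u | 63 | / |
13 | N | 30 | e | 47 | v | | |
14 | O | 31 | f | 48 | w | | |
15 | P | 32 | g | 49 | x | | |
16 | Q | 33 | h | 50 | y | | |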
Because Base64 converts every three bytes into four, Base64-encoded text is about one third larger than the original.
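As a small illustration of the index table and the size increase, the following JavaScript sketch writes the 64-character set as a string in index order and computes the encoded length for a given number of input bytes. The names `BASE64_ALPHABET` and `base64Length` are chosen for this example only.

```javascript
// The 64 characters in index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), "+" (62), "/" (63).
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "abcdefghijklmnopqrstuvwxyz" +
  "0123456789+/";

// Encoded length in characters, including any "=" padding, for n input bytes:
// every group of up to three bytes becomes four characters.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

console.log(BASE64_ALPHABET.length); // 64
console.log(BASE64_ALPHABET[19]);    // T
console.log(BASE64_ALPHABET[46]);    // u
console.log(base64Length(3));        // 4
console.log(base64Length(300));      // 400
```

300 bytes become 400 characters, which is the one-third growth mentioned above.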
4.
As a concrete example, here is how the English word Man is converted into Base64.
Text content | M | a | n |
---|---|---|---|
ASCII | 77 | 97 | 110 |
Bit pattern | 01001101 | 01100001 | 01101110 |

6-bit group | 010011 | 010110 | 000101 | 101110 |
---|---|---|---|---|
Index | 19 | 22 | 5 | 46 |
Base64-encoded | T | W | F | u |
Step one: the ASCII values of "M", "a", and "n" are 77, 97, and 110, whose binary values are 01001101, 01100001, and 01101110. Joined together they form the 24-bit string 010011010110000101101110. Step two: divide this 24-bit string into 4 groups of 6 bits each: 010011, 010110, 000101, 101110. Step three: add two 0 bits in front of each group, expanding the total to 32 bits, that is, four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, and 46. Step four: look up each value in the table to get its Base64 character, namely T, W, F, u.
Therefore, the Base64 encoding of Man is TWFu.
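The four steps can be followed literally in code. The sketch below works on bit strings so that each intermediate value matches the walkthrough; it is deliberately slow and only handles exactly three bytes. The names `TABLE` and `encodeTriple` are invented for this illustration.

```javascript
const TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes by following the four steps one at a time.
function encodeTriple(b1, b2, b3) {
  // Step 1: join the three bytes into one 24-bit string.
  const bits = [b1, b2, b3]
    .map((b) => b.toString(2).padStart(8, "0"))
    .join("");
  // Step 2: split the 24 bits into four 6-bit groups.
  const groups = bits.match(/.{6}/g);
  // Step 3: prefix each group with "00" to make a full byte
  // (the numeric value does not change).
  const indexes = groups.map((g) => parseInt("00" + g, 2));
  // Step 4: look up each value in the index table.
  const text = indexes.map((i) => TABLE[i]).join("");
  return { bits, groups, indexes, text };
}

const result = encodeTriple(77, 97, 110); // "M", "a", "n"
console.log(result.bits);    // 010011010110000101101110
console.log(result.groups);  // [ '010011', '010110', '000101', '101110' ]
console.log(result.indexes); // [ 19, 22, 5, 46 ]
console.log(result.text);    // TWFu
```

Practical encoders use shifts and masks on integers instead of strings, but the intermediate values are the same.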
5.
If fewer than three bytes remain at the end, they are handled as follows:
a) Two bytes: the two bytes contain 16 bits in total. Following the rules above, they are turned into three groups. In addition to the two leading 0 bits, the last group also needs two 0 bits appended at the end. This yields a three-character Base64 code, and one "=" sign is added at the end. For example, the string "Ma" is two bytes, which become the three groups 00010011, 00010110, 00010000. The corresponding Base64 values are T, W, E, plus one "=" sign, so the Base64 encoding of "Ma" is TWE=.
b) One byte: the 8 bits of this byte are turned into two groups following the rules above. In addition to the two leading 0 bits, the last group also needs four 0 bits appended at the end. This yields a two-character Base64 code, and two "=" signs are added at the end. For example, the letter "M" is one byte, which becomes the two groups 00010011, 00010000. The corresponding Base64 values are T, Q, plus two "=" signs, so the Base64 encoding of "M" is TQ==.
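The padding rules for one and two leftover bytes can be sketched as follows. This is an illustrative bit-string version, not an optimized encoder; `B64_CHARS` and `encodeBytes` are hypothetical names, and the function assumes a plain array of byte values (not a typed array).

```javascript
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a plain array of bytes, padding a final group of one or two bytes.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const chunk = bytes.slice(i, i + 3);
    // Bit string of this chunk: 8, 16, or 24 bits.
    let bits = chunk.map((b) => b.toString(2).padStart(8, "0")).join("");
    // Append 0 bits on the right until the length is a multiple of 6
    // (4 zeros for one byte, 2 zeros for two bytes, none for three).
    while (bits.length % 6 !== 0) bits += "0";
    const chars = bits.match(/.{6}/g).map((g) => B64_CHARS[parseInt(g, 2)]);
    // One missing byte -> one "=", two missing bytes -> two "=".
    out += chars.join("") + "=".repeat(3 - chunk.length);
  }
  return out;
}

console.log(encodeBytes([77, 97, 110])); // TWFu
console.log(encodeBytes([77, 97]));      // TWE=
console.log(encodeBytes([77]));          // TQ==
```

The number of "=" signs tells a decoder how many of the trailing bits were padding, so the output length is always a multiple of four.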
6.
Now a Chinese example: how is the Chinese character "严" converted into Base64?
Note that a Chinese character can be represented in several encodings, such as GB2312, UTF-8, and GBK, and each encoding yields a different Base64 value. The following example uses UTF-8.
First, the UTF-8 encoding of "严" is E4B8A5, which written in binary is the three bytes "11100100 10111000 10100101". Following the rules in section 3, this 24-bit string is converted into four groups, 32 bits in total: "00111001 00001011 00100010 00100101". The corresponding decimal numbers are 57, 11, 34, and 37, and their Base64 values are 5, L, i, l.
Therefore, the Base64 value of the Chinese character "严" (UTF-8 encoded) is 5Lil.
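The numbers in this example can be checked with a short sketch. It assumes an environment that provides the standard `TextEncoder` global (modern browsers and recent Node.js versions), which always produces UTF-8 bytes; the variable names are chosen for this illustration.

```javascript
const INDEX_TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// UTF-8 bytes of the character, as a plain array of numbers.
const utf8Bytes = Array.from(new TextEncoder().encode("严"));
console.log(utf8Bytes.map((b) => b.toString(16).toUpperCase()).join("")); // E4B8A5

// 24-bit string, then four 6-bit groups converted to decimal indexes.
const bitString = utf8Bytes.map((b) => b.toString(2).padStart(8, "0")).join("");
const sixBitValues = bitString.match(/.{6}/g).map((g) => parseInt(g, 2));
console.log(sixBitValues); // [ 57, 11, 34, 37 ]

// Look up each index in the table.
const encodedYan = sixBitValues.map((i) => INDEX_TABLE[i]).join("");
console.log(encodedYan); // 5Lil
```

A different character encoding would produce different bytes in the first step, and therefore a different Base64 result.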
7.
PHP has a pair of functions dedicated to Base64 conversion: base64_encode() encodes and base64_decode() decodes.
The notable feature of these functions is that they do not care what the character encoding of the input text is; they simply Base64-encode the bytes they are given. So if you want the Base64 value for the UTF-8 encoding, you must make sure yourself that the input text is UTF-8 encoded.
8.
This section describes how to do Base64 encoding in JavaScript.
First, assume the web page is encoded in UTF-8, and that we want PHP and JavaScript to produce the same Base64 code for the same string.
A problem arises here. JavaScript strings are stored internally as UTF-16, so when encoding we must first convert the UTF-16 value to UTF-8 and then Base64-encode it; when decoding, the decoded UTF-8 value must be converted back to UTF-16.
Ready-made JavaScript functions for this have been published online:
    /* utf.js - UTF-8 <=> UTF-16 conversion
     *
     * Copyright (C) 1999 Masanao Izumo <[email protected]>
     * Version: 1.0
     * LastModified: Dec 25 1999
     * This library is free. You can redistribute it and/or modify it.
     */

    /*
     * Interfaces:
     * utf8 = utf16to8(utf16);
     * utf16 = utf8to16(utf8);
     */

    function utf16to8(str) {
        var out, i, len, c;

        out = "";
        len = str.length;
        for(i = 0; i < len; i++) {
            c = str.charCodeAt(i);
            if ((c >= 0x0001) && (c <= 0x007F)) {
                out += str.charAt(i);
            } else if (c > 0x07FF) {
                out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
                out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
                out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
            } else {
                out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
                out += String.fromCharCode(0x80 | ((c >> 0) & 0x3F));
            }
        }
        return out;
    }

    function utf8to16(str) {
        var out, i, len, c;
        var char2, char3;

        out = "";
        len = str.length;
        i = 0;
        while(i < len) {
            c = str.charCodeAt(i++);
            switch(c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += str.charAt(i-1);
                break;
            case 12: case 13:
                // 110x xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx 10xx xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                char3 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
            }
        }
        return out;
    }
The code above defines two functions: utf16to8() converts UTF-16 to UTF-8, and utf8to16() converts UTF-8 to UTF-16.
The following functions do the actual Base64 encoding and decoding.
    /* Copyright (C) 1999 Masanao Izumo <[email protected]>
     * Version: 1.0
     * LastModified: Dec 25 1999
     * This library is free. You can redistribute it and/or modify it.
     */

    /*
     * Interfaces:
     * b64 = base64encode(data);
     * data = base64decode(b64);
     */

    var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var base64DecodeChars = new Array(
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
        52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
        -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
        -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);

    function base64encode(str) {
        var out, i, len;
        var c1, c2, c3;

        len = str.length;
        i = 0;
        out = "";
        while(i < len) {
            c1 = str.charCodeAt(i++) & 0xff;
            if(i == len) {
                out += base64EncodeChars.charAt(c1 >> 2);
                out += base64EncodeChars.charAt((c1 & 0x3) << 4);
                out += "==";
                break;
            }
            c2 = str.charCodeAt(i++);
            if(i == len) {
                out += base64EncodeChars.charAt(c1 >> 2);
                out += base64EncodeChars.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
                out += base64EncodeChars.charAt((c2 & 0xF) << 2);
                out += "=";
                break;
            }
            c3 = str.charCodeAt(i++);
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >>6));
            out += base64EncodeChars.charAt(c3 & 0x3F);
        }
        return out;
    }

    function base64decode(str) {
        var c1, c2, c3, c4;
        var i, len, out;

        len = str.length;
        i = 0;
        out = "";
        while(i < len) {
            /* c1 */
            do {
                c1 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
            } while(i < len && c1 == -1);
            if(c1 == -1)
                break;

            /* c2 */
            do {
                c2 = base64DecodeChars[str.charCodeAt(i++) & 0xff];
            } while(i < len && c2 == -1);
            if(c2 == -1)
                break;

            out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));

            /* c3 */
            do {
                c3 = str.charCodeAt(i++) & 0xff;
                if(c3 == 61)
                    return out;
                c3 = base64DecodeChars[c3];
            } while(i < len && c3 == -1);
            if(c3 == -1)
                break;

            out += String.fromCharCode(((c2 & 0XF) << 4) | ((c3 & 0x3C) >> 2));

            /* c4 */
            do {
                c4 = str.charCodeAt(i++) & 0xff;
                if(c4 == 61)
                    return out;
                c4 = base64DecodeChars[c4];
            } while(i < len && c4 == -1);
            if(c4 == -1)
                break;
            out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
        }
        return out;
    }
In the code above, base64encode() encodes and base64decode() decodes.
Therefore, to get the Base64 code of a string's UTF-8 encoding, write:
sEncoded=base64encode(utf16to8(str));
Decoding should then be written like this:
sDecoded=utf8to16(base64decode(sEncoded));
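For comparison, the same round trip can be sketched with built-in functions instead of the 1999 library. This is an alternative technique, not the method used above: it assumes an environment that provides the standard `TextEncoder`, `TextDecoder`, `btoa`, and `atob` globals (modern browsers and recent Node.js versions), and the function names `utf8ToBase64` and `base64ToUtf8` are invented for this example.

```javascript
// Encode a JavaScript string as the Base64 of its UTF-8 bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // UTF-16 string -> UTF-8 bytes
  // btoa expects a "binary string": one character per byte (code 0-255).
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Decode Base64 back to a JavaScript string, treating the bytes as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // UTF-8 bytes -> UTF-16 string
}

console.log(utf8ToBase64("Man"));  // TWFu
console.log(utf8ToBase64("严"));   // 5Lil
console.log(base64ToUtf8("5Lil")); // 严
```

The structure mirrors `base64encode(utf16to8(str))` and `utf8to16(base64decode(sEncoded))`: convert to UTF-8 first, then Base64-encode, and reverse the order when decoding. One difference worth noting is that the older `utf16to8()` appears to process each UTF-16 code unit separately, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) may not come out as standard UTF-8, whereas `TextEncoder` handles them.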
(End)
Publisher: Full stack programmer webmaster. When reprinting, please cite the source: https://javaforall.cn/120845.html Link to the original text: https://javaforall.cn