How to handle \ufeff
2022-06-21 08:59:00 【Break through】
An unexpected \ufeff appears when reading a file: how to fix it
Problem:
Language used: Python
Tool used: PyCharm
The problem showed up while reading a file: printing the rows of an existing, non-empty CSV file shows \ufeff at the start of the first value.
import csv

# Test: read the data from a CSV file.
# No encoding is passed, so the platform default is used; plain UTF-8 keeps the BOM.
with open('userinfo.csv', 'r') as file:
    table = csv.reader(file)
    for row in table:
        print(row)

Printed result:

['\ufeffZhang Ming', '123456', 'Login successful']
['Wang Ermazi', '123456', 'The username does not exist']
['Zhang Ming', '111111', 'Wrong password']
['', '', 'The username does not exist']

Process finished with exit code 0
Solution: change the encoding used to open the file from UTF-8 to UTF-8-sig.

import csv

# Test: read the data from a CSV file.
# utf-8-sig skips a leading BOM if the file has one.
with open('userinfo.csv', 'r', encoding='UTF-8-sig') as file:
    table = csv.reader(file)
    for row in table:
        print(row)

Printed result:

['Zhang Ming', '123456', 'Login successful']
['Wang Ermazi', '123456', 'The username does not exist']
['Zhang Ming', '111111', 'Wrong password']
['', '', 'The username does not exist']

Process finished with exit code 0
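The before/after behavior above can be reproduced without the original userinfo.csv. The following is a minimal sketch, assuming a throwaway file written with a BOM; the helper name read_rows, the file name userinfo_demo.csv and its single row are made up for illustration.

```python
import csv
import os
import tempfile


def read_rows(path, encoding):
    """Return all CSV rows from path, decoded with the given encoding."""
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, 'r', encoding=encoding, newline='') as f:
        return list(csv.reader(f))


# Build a small BOM-prefixed file (hypothetical name and contents).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'userinfo_demo.csv')
with open(demo_path, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f).writerows([['Zhang Ming', '123456', 'Login successful']])

rows_plain = read_rows(demo_path, 'utf-8')    # BOM survives as '\ufeff' in the first cell
rows_sig = read_rows(demo_path, 'utf-8-sig')  # leading BOM is skipped

print(rows_plain)
print(rows_sig)
```

Only the first cell of the first row differs between the two reads, which is why the symptom is easy to miss until that value is compared or used as a key.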
The following is taken from the Internet.
The difference between the utf-8 and utf-8-sig encodings:
As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in the decoded Unicode string (even if it's the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.
UTF-8 uses the byte as its encoding unit, so its byte order is the same on every system. There is no byte-order problem, and it therefore does not actually need a BOM ("Byte Order Mark"). UTF-8 with BOM, that is utf-8-sig, does carry one: in Python the utf-8-sig codec writes a BOM when encoding and skips a leading BOM, if present, when decoding.
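The difference between the two codecs is easiest to see on raw bytes. A short sketch using only Python's built-in str.encode and bytes.decode; the sample text 'abc' and the variable names are arbitrary.

```python
text = 'abc'

encoded_plain = text.encode('utf-8')      # no BOM is written
encoded_sig = text.encode('utf-8-sig')    # the bytes EF BB BF are prepended

decoded_keep = encoded_sig.decode('utf-8')          # BOM is kept as U+FEFF
decoded_strip = encoded_sig.decode('utf-8-sig')     # leading BOM is removed
decoded_nobom = encoded_plain.decode('utf-8-sig')   # input without a BOM also decodes

print(encoded_plain, encoded_sig)
print(repr(decoded_keep), repr(decoded_strip), repr(decoded_nobom))
```

Because utf-8-sig accepts input with or without a BOM when decoding, it is a reasonable choice for reading files whose origin is unknown.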
Some background on \ufeff (from Wikipedia):
The byte order mark (BOM) is the name of the Unicode character at code point U+FEFF. When a string of UCS/Unicode characters is encoded as UTF-16 or UTF-32, this character is used to indicate its byte order. It is also often used as a marker that a file is encoded in UTF-8, UTF-16 or UTF-32.
If the character U+FEFF appears at the beginning of a byte stream, it identifies the byte order of that stream: big-endian or little-endian. If it appears in the middle of the stream, it means a zero-width no-break space, which the user perceives as a space. Since Unicode 3.2, U+FEFF should appear only at the beginning of a byte stream and be used only to identify byte order, as the name "byte order mark" suggests; other uses are deprecated. In their place, U+2060 is used to represent the zero-width no-break space.
In UTF-16, the byte order mark is placed as the first character of a file or character stream to indicate the endianness (byte order) of all 16-bit code units in that file or stream.
If the 16-bit units are big-endian, the byte order mark appears in the byte sequence as 0xFE followed by 0xFF (where 0x indicates hexadecimal). If the 16-bit units are little-endian, the byte sequence is 0xFF followed by 0xFE.
In Unicode, the value U+FFFE is guaranteed never to be assigned as a character. This means that the bytes 0xFF, 0xFE can only be interpreted as U+FEFF in little-endian order (because they cannot be U+FFFE in big-endian order).
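The two byte sequences can be checked directly in Python. The sketch below relies on the standard codecs constants BOM_UTF16_BE and BOM_UTF16_LE; the helper utf16_byte_order is a hypothetical name, and it assumes the data is already known to be UTF-16 (a UTF-32 little-endian BOM also begins with FF FE).

```python
import codecs

bom = '\ufeff'
be_bytes = bom.encode('utf-16-be')   # FE FF
le_bytes = bom.encode('utf-16-le')   # FF FE


def utf16_byte_order(data):
    """Guess the byte order of UTF-16 data from its leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return 'big-endian'
    if data.startswith(codecs.BOM_UTF16_LE):
        return 'little-endian'
    return 'unknown'


# Reading FF FE with the wrong byte order yields U+FFFE, which is never a character.
swapped = le_bytes.decode('utf-16-be')

print(be_bytes, le_bytes, repr(swapped))
print(utf16_byte_order(b'\xff\xfeA\x00'))
```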
UTF-8 has no byte-order issue. A byte order mark in UTF-8 only signals that the file is UTF-8; it says nothing about byte order.[1] Many Windows programs (including Notepad) add a byte order mark to UTF-8 files. On Unix-like systems, however, which use plain text files heavily for file formats and for inter-process communication, this practice is not recommended, because it interferes with the shebang at the start of interpreted scripts and similar conventions. It also affects programming languages that do not recognize it: gcc, for example, reports an unrecognized character at the start of the source file. In PHP, if output buffering is not enabled, the BOM causes page content to start being sent to the browser (that is, the headers have already been submitted), so the PHP script can no longer set HTTP headers. The byte order mark is represented in UTF-8 as the sequence EF BB BF; most text editors and web browsers that are not prepared to handle UTF-8 display it, in an ISO-8859-1 environment, as ï»¿.
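For files that must not start with a BOM, such as scripts with a shebang line, the three bytes can be detected and dropped before the content is written out. A minimal sketch, assuming the content is available as bytes; strip_utf8_bom and the sample script are made up for illustration, while codecs.BOM_UTF8 is the standard-library constant for EF BB BF.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A hypothetical shell script saved by an editor that prepends a BOM.
script = codecs.BOM_UTF8 + b'#!/bin/sh\necho hi\n'
cleaned = strip_utf8_bom(script)

# What the three BOM bytes look like when misread as ISO-8859-1.
shown_as_latin1 = codecs.BOM_UTF8.decode('iso-8859-1')

print(script.startswith(b'#!'), cleaned.startswith(b'#!'))
print(shown_as_latin1)
```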
Byte order marks can also be used with UTF-32, but that encoding is rarely used for transmission; the rules are the same as for UTF-16. For the IANA-registered character sets UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE, a byte order mark must not be used: because the names of these character sets already determine the byte order, a U+FEFF at the beginning of the document is interpreted as a (deprecated) zero-width no-break space. For the registered character sets UTF-16 and UTF-32, a leading U+FEFF is used to indicate the byte order.
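Python's codecs follow the same split between the generic and the endianness-specific names. The sketch below is illustrative only; the sample bytes and variable names are arbitrary.

```python
data_be = b'\xfe\xff\x00A'   # big-endian BOM followed by 'A'

# The generic codec reads the BOM to pick the byte order, then drops it.
as_utf16 = data_be.decode('utf-16')

# The explicit codec already knows the byte order, so U+FEFF stays in the text.
as_utf16_be = data_be.decode('utf-16-be')

encoded_be = 'A'.encode('utf-16-be')     # no BOM is written
encoded_generic = 'A'.encode('utf-16')   # BOM plus the platform's native byte order

print(repr(as_utf16), repr(as_utf16_be))
print(encoded_be, encoded_generic)
```

The byte order of encoded_generic depends on the machine, so only its BOM prefix, not its exact bytes, should be relied on.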
