[Delphi] Determining a file's encoding (ANSI, Unicode, UTF-8, Unicode big-endian)
2022-06-11 23:03:00 【sensor_ WU】
During development you often run into files whose encoding is not what you expected, and sometimes you also need to convert between encodings. The encoding rules themselves are easy to look up, so here is how I handled it. With a little modification, this approach supports both automatic detection and encoding conversion.
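As a quick reference for the byte-order-mark (BOM) part of those rules, here is a minimal sketch that lets the Delphi RTL recognise the BOM via `TEncoding.GetBufferEncoding`. It is a different technique from the manual byte comparisons in the unit below, and it only covers files that actually start with a BOM. `DescribeBOM` and the sample buffers are hypothetical names and data used for illustration only.

```pascal
program BomSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// DescribeBOM is a hypothetical helper, not part of the unit in this article.
function DescribeBOM(const Buffer: TBytes): string;
var
  Enc: TEncoding;
  BomLen: Integer;
begin
  Enc := nil; // nil asks the RTL to choose the encoding from the BOM
  BomLen := TEncoding.GetBufferEncoding(Buffer, Enc);
  if BomLen = 0 then
    Result := 'no BOM found' // Enc now holds the RTL default encoding
  else if Enc = TEncoding.UTF8 then
    Result := 'UTF-8 (EF BB BF)'
  else if Enc = TEncoding.Unicode then
    Result := 'UTF-16 little-endian (FF FE)'
  else if Enc = TEncoding.BigEndianUnicode then
    Result := 'UTF-16 big-endian (FE FF)'
  else
    Result := 'other BOM, ' + IntToStr(BomLen) + ' bytes';
end;

begin
  // Sample buffers: one starting with the UTF-8 BOM, one with plain ASCII bytes
  Writeln(DescribeBOM(TBytes.Create($EF, $BB, $BF, $41)));
  Writeln(DescribeBOM(TBytes.Create($41, $42)));
end.
```

Files without a BOM are where the heuristics in the unit below come in, since the RTL call simply falls back to its default encoding in that case.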
{ Determines the encoding of a text file; can be used as the basis for encoding conversion
  sensor
  2018-08-02
}
unit uCODE_Convert;

interface

uses
  Winapi.Windows,
  System.SysUtils,
  System.Variants,
  System.Classes;

type
  TCODE_TYPE = (ctANSI, ctUnicode, ctUTF8, ctUnicodeBIG);

// Determine the encoding of a file
// Input : FileName - name of the file to inspect
// Output: the detected encoding type
function Get_FileCode_TYPE(FileName : string) : TCODE_TYPE;

// Check whether a stream is ANSI-encoded. The criteria are: bytes >= $80 must
// occur in runs of even length, and $00 must not occur at all.
// If both hold, the content is treated as ANSI.
// Input : the file's byte stream
// Output: True if the content is ANSI, False otherwise
function is_ANSI_CODE(M : TMemoryStream) : Boolean;

// Check whether a byte array is valid UTF-8
function is_UTF8_CODE(BB : TBytes) : Boolean;

function GetEncodingType(code: TCODE_TYPE): TEncoding;
function GetFileEncoding(FileName : string) : TEncoding;

implementation

function GetFileEncoding(FileName : string) : TEncoding;
var
  code: TCODE_TYPE;
begin
  code := Get_FileCode_TYPE(FileName);
  Result := GetEncodingType(code);
end;

function GetEncodingType(code: TCODE_TYPE): TEncoding;
begin
  case code of
    ctANSI:
      Result := TEncoding.ANSI;
    ctUnicode:
      Result := TEncoding.Unicode;
    ctUTF8:
      Result := TEncoding.UTF8;
    ctUnicodeBIG:
      Result := TEncoding.BigEndianUnicode;
  else
    Result := TEncoding.ANSI;
  end;
end;

// Determine the encoding of a file
function Get_FileCode_TYPE(FileName : string) : TCODE_TYPE;
var
  MF : TMemoryStream;
  MB : TBytes;
  Len, i : Integer;
begin
  // The enum has no "unknown" member, so ctANSI is the fallback,
  // as it is in GetEncodingType
  Result := ctANSI;
  if not FileExists(FileName) then Exit;
  // Load the whole file into a byte array
  MF := TMemoryStream.Create;
  try
    MF.LoadFromFile(FileName);
    MF.Position := 0;
    SetLength(MB, MF.Size);
    Len := Length(MB);
    if Len > 0 then
      MF.ReadBuffer(MB[0], Len);
  finally
    MF.Free;
  end;
  // 1. BOM FF FE: Unicode (UTF-16 little-endian)
  if (Len >= 2) and (MB[0] = $FF) and (MB[1] = $FE) then Exit(ctUnicode);
  // 2. BOM FE FF: Unicode big-endian
  if (Len >= 2) and (MB[0] = $FE) and (MB[1] = $FF) then Exit(ctUnicodeBIG);
  // 3. BOM EF BB BF: UTF-8
  if (Len >= 3) and (MB[0] = $EF) and (MB[1] = $BB) and (MB[2] = $BF) then Exit(ctUTF8);
  // 4. No BOM: check whether the bytes form valid UTF-8
  if is_UTF8_CODE(MB) then Exit(ctUTF8);
  // 5. Otherwise use the first $00 byte to tell Unicode from ANSI
  for i := 0 to Len - 1 do
    if MB[i] = 0 then
    begin
      // A $00 byte in a file of odd length is neither Unicode nor ANSI;
      // keep the ctANSI fallback
      if (Len mod 2) <> 0 then Exit;
      // Even length: treat as Unicode and guess the byte order
      if i = 0 then Exit(ctUnicodeBIG);  // leading $00: the high byte comes first
      if MB[i - 1] < $80 then
        Exit(ctUnicode)                  // $00 follows an ASCII byte: little-endian
      else
        Exit(ctUnicodeBIG);
    end;
  // No $00 byte found: treat as ANSI (Result is already ctANSI)
end;

function is_ANSI_CODE(M : TMemoryStream) : Boolean;
var
  MB : TBytes;
  B : Byte;
  Len, i : Integer;
  D80 : Integer;
begin
  M.Position := 0;                 // start from the first byte
  SetLength(MB, M.Size);
  Len := Length(MB);               // file length
  if Len > 0 then
    M.ReadBuffer(MB[0], Len);      // read into memory
  D80 := 0;                        // length of the current run of bytes >= $80
  for i := 0 to Len - 1 do
  begin
    B := MB[i];
    if B = 0 then Exit(False);     // a $00 byte means the content is not ANSI
    if B >= $80 then
      D80 := D80 + 1
    else if (D80 mod 2) = 0 then
      D80 := 0
    else
      Exit(False);                 // a run of high bytes had odd length
  end;
  // A run that reaches the end of the data must have even length too
  Result := (D80 mod 2) = 0;
end;

// Check whether a byte array is valid UTF-8.
// This is a structural check only: overlong forms and surrogates are not rejected.
function is_UTF8_CODE(BB : TBytes) : Boolean;
var
  B : Byte;
  Len, i, k, Extra : Integer;
begin
  Len := Length(BB);
  i := 0;
  while i < Len do
  begin
    B := BB[i];
    // $00 is legal UTF-8 but practically never occurs in text files; rejecting it
    // lets Get_FileCode_TYPE apply its Unicode check to such files
    if B = 0 then Exit(False);
    if B < $80 then          // 0xxxxxxx: single-byte (ASCII) character
      Extra := 0
    else if B < $C0 then     // 10xxxxxx: a continuation byte cannot start a character
      Exit(False)
    else if B < $E0 then     // 110xxxxx: 2-byte character
      Extra := 1
    else if B < $F0 then     // 1110xxxx: 3-byte character
      Extra := 2
    else if B < $F5 then     // 11110xxx: 4-byte character ($F0..$F4)
      Extra := 3
    else
      Exit(False);           // $F5..$FF never appear in UTF-8
    // The character must not run past the end of the data
    if i + Extra >= Len then Exit(False);
    // Every continuation byte must have the form 10xxxxxx
    for k := 1 to Extra do
      if (BB[i + k] and $C0) <> $80 then Exit(False);
    i := i + Extra + 1;
  end;
  Result := True;
end;

end.
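
For the conversion side, here is a minimal sketch of how the unit could be used to re-save a file as UTF-8. It assumes the unit above is on the search path as `uCODE_Convert`; `ConvertFileToUTF8` and the program name are hypothetical and not part of the original code.

```pascal
program ConvertToUtf8Demo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  uCODE_Convert; // the unit listed above

// Hypothetical helper: decode SrcFile with the detected encoding
// and write it back out to DstFile as UTF-8.
procedure ConvertFileToUTF8(const SrcFile, DstFile: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Decode using whichever encoding the heuristic picked
    Lines.LoadFromFile(SrcFile, GetFileEncoding(SrcFile));
    // Re-encode on save; the string list handles the conversion
    Lines.SaveToFile(DstFile, TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end;

begin
  if ParamCount = 2 then
    ConvertFileToUTF8(ParamStr(1), ParamStr(2))
  else
    Writeln('Usage: ConvertToUtf8Demo <source> <destination>');
end.
```

Because the detection is heuristic, it is worth keeping the source file until the converted output has been checked, especially for files without a BOM.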