What should ABAP do when it calls a third-party API and encounters garbled code?

2022-07-04 15:58:00 JerryWang_ Wang Zixi

This is Jerry's second original article of 2022, and the 370th original article on this official account.

A friend once asked me this question on Zhihu. I think it is a very typical problem, so I am devoting a whole article to the background knowledge behind it.

Let's first look at the specific problem this friend ran into.

When the third-party interface is called from Postman, the Chinese characters in the response are displayed correctly.

However, when the response is read in ABAP through the HTTP utility class CL_HTTP_CLIENT with response->get_cdata( ), the Chinese characters in it, for example "访问成功" ("access successful"), come out garbled.

First of all, let's be clear: since Postman displays the Chinese content of the response correctly, there is nothing wrong on the API provider's side. The garbling happens on the receiving side, which means the ABAP code needs to be adjusted.

Once we find the cause of the garbled text, we can fix it in a targeted way.

In the 1960s, the United States developed a character code called ASCII, which defines a one-to-one mapping between English characters and binary values. Associating the graphical form of a symbol with the binary value used to store it is called character encoding. ASCII is the simplest character set and character encoding.

One byte has 8 bits, and 2 to the power of 8 is 256, so one byte can represent at most 256 symbols. Chinese characters number in the tens of thousands at the very least, so they obviously cannot be stored in a single byte each.

Besides the familiar English and Chinese characters, there are many scripts with a much longer history, such as Egyptian hieroglyphs and the cuneiform mentioned in Jay Chou's song 《爱在西元前》.

Is there a character encoding that can include all of these unusual symbols as well? Yes: Unicode. As its name suggests, Unicode assigns a unique code to every character of the world's languages, so that text can be exchanged and converted across languages and platforms.

Using the Unicode code charts, we can look up the Unicode code of any character. For example, the Unicode code of the Chinese character "汪" is 00006C6A, usually written U+6C6A.

The binary representation of 6C6A is 0110 1100 0110 1010, which needs two bytes of storage. Other symbols may need three or even four bytes.
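The lookup described above can be checked in a few lines. This is a small Python sketch for illustration only (the article's own code is ABAP); the variable names are made up for the example.

```python
# Look up the Unicode code point of a character and show it in hex and binary.
ch = "汪"

code_point = ord(ch)                      # integer value of the code point
code_point_hex = f"U+{code_point:04X}"    # conventional U+XXXX notation
bits = f"{code_point:016b}"               # 16-bit binary representation

print(code_point_hex, bits)
```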

On the other hand, the English characters that already exist in the ASCII table need only one byte each. If Unicode required every character to be stored in the maximum space any character needs, that is, 4 bytes, it would waste a great deal of space for English text.

Therefore, Unicode only defines the mapping between characters and their codes. How many bytes are used to store those codes is decided by the concrete Unicode encoding forms, such as UTF-8 and UTF-16.

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character; different symbols take a different number of bytes. For example, the UTF-8 encoding of "汪" is E6B1AA, which takes three bytes.
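To make the variable length concrete, here is a short Python sketch (not the author's code) that encodes four sample characters to UTF-8 and counts the bytes. The sample characters other than "汪" are chosen for this example.

```python
# UTF-8 uses between 1 and 4 bytes depending on the character.
samples = {
    "A": "A",                     # ASCII letter
    "e-acute": "\u00e9",          # Latin letter with accent
    "wang": "汪",                 # CJK character from the article
    "emoji": "\U0001F600",        # character outside the Basic Multilingual Plane
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
wang_utf8 = "汪".encode("utf-8").hex().upper()

print(utf8_lengths)
print(wang_utf8)
```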

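According to the SAP Help documentation, ABAP uses the UCS-2 encoding, which can be regarded as a subset of UTF-16, because UCS-2 does not support the characters that UTF-16 represents through its surrogate range.

UTF-16 stores every character of the Basic Multilingual Plane in a single two-byte code unit, and every other character as a surrogate pair of two code units (four bytes). UCS-2 only has the fixed two-byte form.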
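The difference between a single code unit and a surrogate pair can be seen with a small Python sketch. It is only an illustration of UTF-16 itself and says nothing about how a given ABAP system handles such characters; the emoji is an arbitrary example of a character outside the Basic Multilingual Plane.

```python
import struct

bmp_char = "汪"              # inside the Basic Multilingual Plane
astral_char = "\U0001F600"   # outside it, needs a surrogate pair in UTF-16

bmp_bytes = bmp_char.encode("utf-16-le")
astral_bytes = astral_char.encode("utf-16-le")

# Split the 4 bytes into two little-endian 16-bit code units.
high, low = struct.unpack("<2H", astral_bytes)

print(len(bmp_bytes), len(astral_bytes))
print(hex(high), hex(low))
```

Both code units of the pair fall in the surrogate range D800 to DFFF, which is exactly the range UCS-2 cannot interpret as characters.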

UTF-16 is further divided into two byte orders, UTF-16BE and UTF-16LE. Take the Unicode code 6C6A of the Chinese character "汪" as an example: if 6C is stored at the lower memory address and 6A at the higher address, the storage is big-endian (Big Endian); the reverse order is little-endian (Little Endian).
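The two byte orders can be compared directly. A minimal Python sketch, for illustration only:

```python
# Encode the same character with both UTF-16 byte orders.
ch = "汪"  # U+6C6A

be_hex = ch.encode("utf-16-be").hex().upper()  # high byte first
le_hex = ch.encode("utf-16-le").hex().upper()  # low byte first

print(be_hex, le_hex)
```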

The two names come from Gulliver's Travels by the English-language satirist Jonathan Swift. In the book, the Lilliputians go to war over whether an egg should be cracked open at the big end (Big Endian) or at the small end (Little Endian).

So is ABAP's UCS-2 (the UTF-16 subset) stored as BE or LE? Just try it and you will know.

In my system, the answer is UTF-16LE.

Another way is to check the attribute ENDIAN of the system class CL_ABAP_CHAR_UTILITIES directly. In Jerry's system the value of this attribute is L, which stands for Little Endian.

With this background, we can now fix the garbled text described at the beginning of the article.

Looking carefully at the result Postman shows for the API call, I found another important piece of information: charset=GB18030, which means the API response data is encoded in the GB18030 character set.
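Reading the declared charset out of a response header can be sketched as follows. This is a Python illustration using only the standard library; the full header value is an assumption, since the article only shows the `charset=GB18030` part.

```python
from email.message import Message

# Hypothetical Content-Type value; only "charset=GB18030" is confirmed by the article.
header_value = "application/json; charset=GB18030"

msg = Message()
msg["Content-Type"] = header_value

# get_content_charset() returns the charset parameter in lower case.
charset = msg.get_content_charset()

print(charset)
```

Knowing the declared charset up front tells the receiver which decoder to apply to the raw bytes instead of relying on a default.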

The GB18030 code of the Chinese character "访" is B7C3, which is completely different from BF8B, its byte sequence in UTF-16LE.
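The two byte sequences quoted above can be reproduced with a short Python sketch (an illustration, not part of the original ABAP program):

```python
# Same character, two different encodings, two different byte sequences.
ch = "访"

gb18030_hex = ch.encode("gb18030").hex().upper()
utf16le_hex = ch.encode("utf-16-le").hex().upper()

print(gb18030_hex, utf16le_hex)
```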

If the ABAP code reads GB18030-encoded symbols as if they were UTF-16LE, which is the default here, it naturally cannot produce the expected result. This is how the get_cdata method decodes the response (line 55 in the author's original screenshot), and the outcome is garbled text.
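The effect of decoding with the wrong character set can be reproduced outside ABAP. The Python sketch below assumes, as the article describes, that the receiver interprets the bytes as UTF-16LE; the response text is simulated by encoding the string locally rather than calling a real API.

```python
expected = "访问成功"

# Simulated response body: the text as the API provider sends it, in GB18030.
raw = expected.encode("gb18030")

# Wrong decoder: every two bytes are read as one UTF-16LE code unit.
garbled = raw.decode("utf-16-le")

print(raw.hex().upper())
print(garbled)
```

The bytes are intact; only the interpretation is wrong, which is why the fix belongs entirely on the receiving side.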

The right way is to call get_data instead (line 57 in the same screenshot), which returns the response as a raw byte stream displayed in hexadecimal, with type xstring.

In this hexadecimal byte stream we can see the GB18030 codes of the Chinese characters "访" and "问".

The rest is easy: decode this byte stream with the GB18030 character set.

First open the database table TCP00 and search the field CPCOMMENT for the keyword 18030.

This shows that the SAP code page corresponding to GB18030 is 8401.

In the author's code, 8401 is passed in as the code page; the variable lv_binary holds the hexadecimal byte stream, and the variable lv_text receives the API response content decoded from GB18030.
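The author's ABAP code is not reproduced in this text. The same decoding step can be sketched in Python under stated assumptions: the response body is simulated locally, the helper `decode_body` is hypothetical, and the variable names only mirror the ones mentioned above. In Python the character set is named `gb18030`; 8401 is the SAP code page number and has no meaning here.

```python
def decode_body(body: bytes, charset: str) -> str:
    """Decode raw response bytes with the charset the server declared."""
    # errors="strict" raises on bytes that are invalid for the charset
    # instead of silently producing replacement characters.
    return body.decode(charset, errors="strict")


# Stand-in for the byte stream returned by get_data (assumed content).
lv_binary = "访问成功".encode("gb18030")

lv_text = decode_body(lv_binary, "gb18030")

print(lv_text)
```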

The garbled text is gone: the content displayed in the ABAP program is now exactly the same as what Postman shows.

I hope the example in this article helps you deal with garbled Chinese text in ABAP. Thank you for reading.


Copyright notice
This article was written by [JerryWang_ Wang Zixi]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202141211408841.html