Reproducing and solving long-jump problems in a data warehouse
2022-06-09 05:07:00 【Huawei cloud developer community】
Abstract: This article abstracts an error caused by long jumps in GaussDB(DWS) into a small example, reviews the problems that C can run into with long jumps, and finally gives a simple solution and verification.
This article is shared from the Huawei Cloud community post 《Possible problems with long jumps in GaussDB(DWS)》, author: Lightning and showers.
Problem description: in GaussDB(DWS) coding practice, we found that the debug build, compiled without compiler optimization, had no problem, but in the release build some variables did not take effect after assignment and still held their old values. This article gives a simple analysis from two angles.
What is a long jump ?
In C, the goto statement implements short-range jumps within a function (local jump), while the setjmp() and longjmp() functions implement long-range jumps across functions (nonlocal jump, also called far jump).
Two function signatures are involved:

    int setjmp(jmp_buf env);
    void longjmp(jmp_buf env, int value);

The usual understanding is this: setjmp saves the context at the point where it is called into a jmp_buf, mainly the current stack position and the register state. longjmp jumps back into the context (snapshot) saved in the env buffer. Some sources also describe the two functions as cooperating in an implementation-specific way.

I find the following description more convincing:
The setjmp() function saves the contents of most of the general purpose registers, in the same way as they would be saved on any function entry. It also saves the stack pointer and the return address. All these are placed in the buffer. It then arranges for the function to return zero.
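The control flow described in the quote can be sketched as follows. This is a minimal illustration, not code from GaussDB(DWS); the names `recover_point`, `raise_error`, and `run_with_recovery` are hypothetical. setjmp() returns 0 when it is called directly and a non-zero value when control arrives through longjmp().

```c
#include <setjmp.h>

static jmp_buf recover_point;   /* hypothetical name: holds the saved context */

/* Jumps back to the setjmp() call site; never returns to its caller. */
static void raise_error(void)
{
    longjmp(recover_point, 1);
}

/* Returns 0 on the normal path and 1 if control came back via longjmp(). */
int run_with_recovery(int fail)
{
    if (setjmp(recover_point) == 0) {
        /* Direct return from setjmp(): the context has just been saved. */
        if (fail) {
            raise_error();      /* control reappears at setjmp() above */
        }
        return 0;
    }
    /* Second return from setjmp(), caused by longjmp(recover_point, 1). */
    return 1;
}
```

Calling `run_with_recovery(0)` takes the normal path, while `run_with_recovery(1)` comes back through the jump. The parameter `fail` is not modified between setjmp() and longjmp(), so the problem discussed in this article does not affect it.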
Compiler optimization problems
The debug and release builds produce different results, and the main difference between them is the optimization the compiler performs during the build. One common compiler optimization is caching memory variables in registers.
Accessing a register is much faster than accessing memory, so to speed up variable access the compiler sometimes reads a variable into a register first and then takes its value directly from the register on later reads. In some situations, however, this means stale data is read, which seriously affects how the program behaves.
Solution: the C++ volatile keyword
The dictionary defines volatile as changeable or unstable. My understanding is that every assignment to a volatile variable must be written to memory rather than kept only in a register. This avoids assignments that appear to fail (the variable still holds the old value) because a jump or function call happened before the value was written back, or because the compiler chose to keep the value in a register (the value may be used several times, and a register avoids repeated round trips to memory).
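A minimal sketch of this fix, assuming a single local variable; `vol_env`, `bail_out`, and `value_after_jump` are hypothetical names. The C standard says that a non-volatile automatic variable modified between setjmp() and longjmp() has an indeterminate value after the jump, while a volatile one keeps the last value stored.

```c
#include <setjmp.h>

static jmp_buf vol_env;         /* hypothetical name */

/* Never returns: transfers control back to setjmp(vol_env). */
static void bail_out(void)
{
    longjmp(vol_env, 1);
}

/* Returns the value of a volatile local after a long jump. */
int value_after_jump(void)
{
    volatile int vvar = 333;    /* every access is a real load or store */

    if (setjmp(vol_env) == 0) {
        vvar = 999;             /* modified between setjmp() and longjmp() */
        bail_out();
    }
    /* Reached only through longjmp(); vvar still holds the last store. */
    return vvar;
}
```

If `vvar` were declared as a plain `int`, the function would be allowed to return either 333 or 999, depending on the compiler and the optimization level.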
Reproducing the problem
Example without optimization (debug build)
    #include <stdio.h>
    #include <stdlib.h>
    #include <setjmp.h>

    static jmp_buf env;

    static void
    doJump(int nvar, int rvar, int vvar)
    {
        printf("Inside doJump(): nvar=%d, rvar=%d, vvar=%d\n", nvar, rvar, vvar);

        /* Dead code block */
        int nvar0 = nvar;
        int rvar0 = rvar;
        int vvar0 = vvar;

        longjmp(env, 1);
    }

    int
    main(int argc, char** argv)
    {
        int nvar;
        register int rvar;
        volatile int vvar;

        nvar = 111;
        rvar = 222;
        vvar = 333;

        if (setjmp(env) == 0) {
            nvar = 777;
            rvar = 888;
            vvar = 999;
            doJump(nvar, rvar, vvar);
        } else {
            int nvar1 = nvar;
            int rvar1 = rvar;
            int vvar1 = vvar;
            printf("After longjmp(): nvar =%d, rvar=%d, vvar=%d\n", nvar, rvar, vvar);
        }

        exit(EXIT_SUCCESS);
    }

Program run results
Compile the program with gcc without any optimization and run the resulting binary. The output is as follows:

As you can see, the value of the register variable rvar does not reflect the later assignment. It is still the old value 222, contrary to expectation, whereas the plain int variable and the volatile variable hold the correct values. After a long jump, an assignment made to a register variable between setjmp() and longjmp() is easily lost.
Assembly Perspective
As the figure below shows, during the assignments rvar lives directly in the ESI register, so the 222 previously stored in memory is not overwritten: 888 is assigned to the register while memory still holds 222. The other two values, 777 and 999, are written to memory.
When the custom function doJump is entered, all three values are placed in registers and passed by value.

The figure below shows that when the jump returns, the real value of rvar (the 888 in the register) has been lost. The register was overwritten with the value saved in the jump buffer, so when the variable is printed later, the old value read from memory is shown.

Memory perspective


The figure above shows the state after 777, 888, and 999 have been assigned. The 888 was assigned to a register (as the assembly shows), and here we can see that 222 has not been overwritten.
Finally, after returning through the jump, the values are read back, this time from memory. The program reads 777, 222, 999, which is not what was expected. The figure below shows the values at the memory addresses; 222 is at address -0x28 + 0x7fffffffe160.

Example with -O2 optimization (release build)
Program run results
Add -O2 compiler optimization when compiling, then run the program. This time the values of both nvar and rvar are wrong: they do not hold the expected 777 and 888 but still hold the old, unchanged values.
This is caused by compiler optimization. The values written to nvar and rvar before the jump are kept in registers, and after the jump those registers are overwritten, which causes the problem. The value of vvar is written to memory, so after the jump it can still be reached through a pointer held in a register.

Next, we trace the program's execution to analyze this result.
Assembly Perspective

Running objdump -d volatile_og shows the disassembly of the compiled file. We mainly look at the main function, which starts at 10c0. As in the figure above, it is divided into 3 blocks, using whether setjmp(env) returns 0 as the boundary, to make it easier to read.
The assembly contains no call to doJump (doJump does not appear after any callq instruction), so presumably the compiler inlined the function. Likewise, the initialization of nvar0, rvar0, and vvar0 in that function is dead code and was removed during optimization.
The figure below shows that only vvar, which is declared with the volatile keyword, has its value in stack memory. The other variables have no memory location and are not lvalues in the generated code.

Memory perspective
By looking at the values in memory before and after the jump, we can see exactly what happens during the jump:
Figure 1 below shows the state before the jump: the values are in registers, and only 333 is in memory. Figure 2 confirms that rvar and nvar cannot be accessed through a memory address.


After the jump, the value at memory address e15c has changed to 999.
After the jump, the stack memory looks as shown in the figure below:

As the figure below shows, at this point only vvar can have its address taken.

Appendix
References
- What is a memory barrier? Why memory barriers?
- Why do we use the volatile keyword?
- intro.races-13
- Linux assembly language development guide: Intel format and AT&T format
- setjmp() and longjmp(): a detailed analysis
- Using setjmp and longjmp in C to implement exception handling and coroutines
- Exactly what "program state" does setjmp save?
Specific optimization options that may be involved
- -fforce-mem: Forces memory operands to be copied into registers before arithmetic is done on them. This makes every memory reference a potential common subexpression and so produces more efficient code; when they are not common subexpressions, instruction combination should eliminate the separate register load. For a variable used by only a single instruction the gain may be small, but for variables used by many instructions (for example in mathematical operations) it is a significant optimization, because the processor accesses a value in a register much faster than a value in memory.
- -fregmove: The compiler attempts to reassign register numbers in move instructions and as operands of other simple instructions, in order to maximize the amount of register tying. This is especially helpful on machines with two-operand instructions.
- -fschedule-insns: The compiler attempts to reorder instructions to eliminate stalls caused by waiting for data that is not yet available. This helps machines with slow floating-point or memory load instructions, because other instructions can be issued until the result of the load or floating-point instruction is required.
Summary:
-fforce-mem may cause inconsistencies between memory and registers that resemble dirty data. Logic that depends on the order of memory operations must be handled strictly before optimization is enabled, for example by using the volatile keyword to restrict how variables are accessed, or by using a barrier to force the CPU to execute strictly in instruction order.
Memory barriers
The root cause of cache consistency problems is the existence of caches that are private to each of several processors, not the existence of several processors as such. Several conditions must hold together: multiple cores, private caches, and the cache write policy.
If any one of these conditions is not met, there is no cache consistency problem.
Regarding the CPU's multi-level caches and the consistency of memory reads and writes:
To speed up instruction execution, the CPU adds two buffers: the store buffer and the invalidate queue.

Store buffer:
Benefit: when CPU0 and CPU1 read and write the same data, a store does not have to wait for data from the other CPU's cache (this increases speed).
Drawback (problem description): CPU0 modifies a value, but its "read invalidate" message goes out later than CPU1's actual read, so CPU1 is one step behind and reads the wrong data.
Solutions:
- Hardware: store forwarding. If the local store buffer holds the data, the CPU reads it from the store buffer first.
- Software: hardware designers provide memory barrier instructions so that software can tell the CPU about this kind of ordering relationship.
Invalidate queue:
The store buffer is usually very small, so it fills up after the CPU performs just a few store operations. The CPU must then wait for invalidation ACK messages (once an invalidation ACK arrives, the data in the store buffer is written to the cache and removed from the store buffer) before store buffer space is freed.
Benefit: CPU1 may be under heavy load, and processing a large number of invalidation messages immediately would add to that load. Queuing them speeds things up.
Drawback (problem description): a value may already be invalid while the queue has not yet processed the corresponding entry (late again).
Solution: adding a barrier solves this as well.
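How software expresses these barriers can be sketched with the fences in C11's <stdatomic.h>. This is a minimal illustration under stated assumptions, not GaussDB(DWS) code: `payload`, `ready`, `publish`, and `try_consume` are hypothetical names, and the two functions are meant to run on different threads (for example on CPU0 and CPU1).

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;             /* plain data handed from writer to reader */
static atomic_bool ready;       /* zero-initialized: nothing published yet */

/* Writer side: store the data, then publish the flag. */
void publish(int value)
{
    payload = value;
    /* Write barrier: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: returns true and fills *out once the data is published. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    /* Read barrier: the payload load is ordered after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

The release fence plays the role of the barrier on the store buffer side, and the acquire fence plays the role of the barrier on the invalidate queue side. Which machine instructions they become depends on the target architecture.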