Hexagon_V65_Programmers_Reference_Manual(8)
2022-07-27 21:01:00 【weixin_38498…】
7. Program flow
The Hexagon processor supports the following program flow features:
- Conditional instructions
- Hardware loops
- Software branches
- Pauses
- Exceptions
Software branches include jumps, calls, and returns. Several types of jumps are supported:
- Speculative jumps
- Compare jumps
- Register transfer jumps
- Dual jumps
7.1 Conditional instructions
Many Hexagon processor instructions can be executed conditionally. For example:
if (P0) R0 = memw(R2) // conditionally load word if P0
if (!P1) jump label // conditionally jump if not P1
The following instructions can be specified as conditional:
- Jumps and calls
- Many load and store instructions
- Logical instructions (including AND/OR/XOR)
- Halfword shifts
- 32-bit add/subtract by register or short immediate
- Sign and zero extension
- 32-bit register transfer and 64-bit combine word
- Register transfer immediate
- Deallocate frame and return
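As a rough software model, a predicated instruction such as if (P0) R0 = memw(R2) can be read as a guarded assignment: when the predicate is false, the destination register simply keeps its old value. The C sketch below illustrates that reading. The helper name cond_load_word and the use of a plain bool for the predicate are simplifications assumed for illustration, not how predicate registers are actually encoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "if (P) Rd = memw(Rs)": the destination register
   is only written when the predicate is true. */
uint32_t cond_load_word(bool p, uint32_t rd_old, const uint32_t *rs)
{
    return p ? *rs : rd_old;  /* false predicate: Rd keeps its old value */
}
```

For example, cond_load_word(false, 7, &mem) leaves the modeled R0 at 7, while passing true returns the loaded word.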
7.2 Hardware loops
The Hexagon processor includes hardware loop instructions that perform loop branches with zero overhead. For example:
loop0(start,#3) // loop 3 times
start:
{ R0 = mpyi(R0,R0) } :endloop0
Two sets of hardware loop instructions are provided (loop0 and loop1) so that hardware loops can be nested one level deep. For example:
// Sum the rows of a 100x200 matrix.
loop1(outer_start,#100)
outer_start:
R0 = #0
loop0(inner_start,#200)
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0
{ memw(R2++#4) = R0 }:endloop1
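For reference, the nested hardware loop above computes the same result as the following C function. This is a sketch that assumes the matrix is stored row-major as 32-bit words (R1 walks the matrix, R2 walks the output); the function name sum_rows is hypothetical. loop1 corresponds to the outer for statement and loop0 to the inner one.

```c
#include <stdint.h>

#define ROWS 100
#define COLS 200

/* C equivalent of the loop1/loop0 example: the outer loop (loop1) runs once
   per row, the inner loop (loop0) accumulates the 200 words of that row. */
void sum_rows(const int32_t *matrix, int32_t *row_sums)
{
    for (int row = 0; row < ROWS; row++) {      /* loop1(outer_start,#100) */
        int32_t acc = 0;                        /* R0 = #0 */
        for (int col = 0; col < COLS; col++) {  /* loop0(inner_start,#200) */
            acc += *matrix++;                   /* R3 = memw(R1++#4); R0 = add(R0,R3) */
        }
        *row_sums++ = acc;                      /* memw(R2++#4) = R0 */
    }
}
```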
The hardware loop instructions are used as follows:
- For non-nested loops, use loop0.
- For nested loops, use loop0 for the inner loop and loop1 for the outer loop.
Note: If a program needs loops nested more than one level deep, the two innermost loops can be implemented as hardware loops, with the remaining outer loops implemented as software branches.
Each hardware loop is associated with a pair of dedicated loop registers:
- The loop start address register SAn is set to the address of the first instruction in the loop (typically expressed as a label in assembly language).
- The loop count register LCn is set to a 32-bit unsigned value that specifies the number of loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to determine whether the loop should repeat or exit.
The hardware loop setup instructions set both of these registers at once, so it is usually not necessary to set them individually. However, because the loop registers completely specify the hardware loop state, they can be saved and restored (either automatically by a processor interrupt or manually by the programmer). A suspended hardware loop resumes normally once its loop registers are reloaded with the saved values.
The Hexagon processor provides two sets of loop registers for the two hardware loops:
- SA0 and LC0 are used by loop0
- SA1 and LC1 are used by loop1
Table 7-1 lists the hardware loop instructions.
| Syntax | Description |
|---|---|
| loopN(start, Rs) | Hardware loop with register loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the value of general register Rs. Note: the loop start operand is encoded as a PC-relative immediate value. |
| loopN(start, #count) | Hardware loop with immediate loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the specified immediate value (0-1023). Note: the loop start operand is encoded as a PC-relative immediate value. |
| :endloopN | Hardware loop end instruction. Performs the following: if (LCn > 1) {PC = SAn; LCn = LCn-1}. Note: this instruction appears in assembly as a suffix appended to the last packet in the loop. It is encoded in the last packet. |
| SAn = Rs | Set the loop start address to general register Rs. |
| LCn = Rs | Set the loop count to general register Rs. |
Note: The loop instructions are assigned to instruction class CR.
7.2.1 Loop setup
To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values.
This can be done in two ways:
- A loopN instruction
- Register transfers to SAn and LCn
The loopN instruction performs all the work of setting SAn and LCn. For example:
loop0(start,#3) // SA0=&start, LC0=3
start:
{ R0 = mpyi(R0,R0) } :endloop0
In this example, the hardware loop (consisting of a single multiply instruction) is executed 3 times. The loop0 instruction sets register SA0 to the address of label start, and sets LC0 to 3.
When the loop count is expressed as an immediate in loopN, it is limited to the range 0-1023. If the desired loop count falls outside this range, it must be specified as a register value. For example:
Using loopN:
R1 = #20000;
loop0(start,R1) // LC0=20000, SA0=&start
start:
{ R0 = mpyi(R0,R0) } :endloop0
Using register transfers:
R1 = #20000
LC0 = R1 // LC0=20000
R1 = #start
SA0 = R1 // SA0=&start
start:
{ R0 = mpyi(R0,R0) } :endloop0
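Because the immediate form only encodes counts from 0 to 1023, a code generator has to choose between the two forms. The hypothetical helper below sketches that decision in C: it reports whether a count fits loopN(start,#count); otherwise the count has to be placed in a register first, as in the R1 = #20000 examples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest loop count encodable in loopN(start,#count). */
#define LOOP_IMM_MAX 1023u

/* Hypothetical helper: true if the count can use the immediate form,
   false if it must come from a register (loopN(start,Rs) or LCn = Rs). */
bool loop_count_fits_immediate(uint32_t count)
{
    return count <= LOOP_IMM_MAX;
}
```

For the 20000-iteration loop above, the helper returns false, which is why that count is loaded into R1.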
If a loopN instruction is located too far from its loop start address, the PC-relative offset used to specify the start address may exceed the maximum range of the instruction's start-address operand. If this happens, either move the loopN instruction closer to the start of the loop, or specify the loop start address as a 32-bit constant (Section 10.9). For example:
Using a 32-bit constant:
R1 = #20000;
loop0(##start,R1) // LC0=20000, SA0=&start
...
7.2.2 Loop end
The loop end instruction marks the last packet in a hardware loop. In assembly language it is expressed by appending the symbol ":endloopN" to the packet, where N specifies the hardware loop (0 or 1). For example:
loop0(start,#3)
start:
{ R0 = mpyi(R0,R0) } :endloop0 // last packet in loop
The last instruction in the loop must always be expressed in assembly language as a packet (using curly braces), even if it is the only instruction in the packet.
Nested hardware loops can specify the same instruction as the end of both the inner and outer loops. For example:
// Sum the rows of a 100x200 matrix.
// Software pipeline the outer loop.
p0 = cmp.gt(R0,R0) // p0 = false
loop1(outer_start,#100)
outer_start:
{ if (p0) memw(R2++#4) = R0
p0 = cmp.eq(R0,R0) // p0 = true
R0 = #0
loop0(inner_start,#200) }
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0:endloop1
memw(R2++#4) = R0
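The effect of the shared loop end is easier to follow in a C transcription of this software-pipelined version. In the sketch below (the function name sum_rows_pipelined is hypothetical, and a row-major 32-bit matrix is assumed), the store at the top of the outer loop writes the previous row's sum, guarded by p0 so that nothing is stored on the first pass, and the last row's sum is stored after both loops exit. Because every instruction in the outer_start packet reads the old values of p0 and R0, the C code tests p0 and stores acc before updating either.

```c
#include <stdbool.h>
#include <stdint.h>

/* C transcription of the pipelined outer loop sharing :endloop0:endloop1. */
void sum_rows_pipelined(const int32_t *matrix, int32_t *row_sums)
{
    bool p0 = false;                   /* p0 = cmp.gt(R0,R0) -> false */
    int32_t acc = 0;                   /* R0 */

    for (int row = 0; row < 100; row++) {   /* loop1(outer_start,#100) */
        if (p0)                        /* if (p0) memw(R2++#4) = R0 */
            *row_sums++ = acc;         /* stores the PREVIOUS row's sum */
        p0 = true;                     /* p0 = cmp.eq(R0,R0) -> true */
        acc = 0;                       /* R0 = #0 */
        for (int col = 0; col < 200; col++)  /* loop0(inner_start,#200) */
            acc += *matrix++;          /* R3 = memw(R1++#4); R0 = add(R0,R3) */
    }
    *row_sums = acc;                   /* final store after both loops end */
}
```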
Although endloopN behaves like a regular instruction (it implements the loop test and branch), note that it does not execute in any instruction slot and does not count as an instruction in the packet. Therefore, a single instruction packet marked as a loop end can perform up to six operations:
- Four regular instructions (the normal limit for an instruction packet)
- The endloop0 test and branch
- The endloop1 test and branch
Note: The endloopN instruction is encoded in the instruction packet (Section 10.6).
7.2.3 Loop execution
After a hardware loop is set up, the loop body always executes at least once, regardless of the specified loop count, because the loop count is not examined until the last instruction in the loop. Therefore, if a loop may need to execute zero times, it must be preceded by an explicit conditional branch. For example:
loop0(start,R1)
P0 = cmp.eq(R1,#0)
if (P0) jump skip
start:
{ R0 = mpyi(R0,R0) } :endloop0
skip:
In this example, a hardware loop is set up with the loop count in R1, but if the value in R1 is zero, a software branch skips over the loop body.
After the loop end instruction of a hardware loop is executed, the Hexagon processor examines the value in the corresponding loop count register:
- If the value is greater than 1, the processor decrements the loop count register and performs a zero-cycle branch to the loop start address.
- If the value is less than or equal to 1, the processor resumes program execution at the instruction immediately following the loop end instruction.
Note: Because nested hardware loops can share the same loop end instruction, the processor may examine both loop count registers in a single operation.
7.2.4 Pipelined hardware loops
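The loop-end rule from Table 7-1, if (LCn > 1) {PC = SAn; LCn = LCn-1}, can be modeled in a few lines of C. The sketch below is a hypothetical software model rather than processor code: it counts how many times the loop body runs for a given initial LCn, which also shows why a count of 0 or 1 still executes the body once.

```c
#include <stdint.h>

/* Hypothetical model of one hardware loop's state. SAn is implicit:
   "branching to SAn" simply means running the body again. */
typedef struct {
    uint32_t lc;  /* loop count register LCn */
} hw_loop;

/* Simulates loopN(start,#count) followed by a body ending in :endloopN,
   and returns how many times the body executed. */
uint32_t simulate_hw_loop(uint32_t count)
{
    hw_loop loop = { .lc = count };  /* loopN sets LCn (and SAn) */
    uint32_t body_runs = 0;

    for (;;) {
        body_runs++;                 /* body runs before LCn is examined */
        if (loop.lc > 1) {           /* :endloopN test */
            loop.lc--;               /* LCn = LCn - 1; PC = SAn */
        } else {
            break;                   /* fall through past the loop end */
        }
    }
    return body_runs;
}
```

Under this model, simulate_hw_loop(3) matches the loop0(start,#3) example, and simulate_hw_loop(0) shows the body running once, which is what the explicit skip branch guards against.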
Software-pipelined loops are common on VLIW architectures such as the Hexagon processor. They improve loop performance by overlapping multiple loop iterations.
A software pipeline has three sections:
- A prologue, in which the loop is primed
- A kernel (or steady-state) section
- An epilogue, in which the pipeline is drained
This is best illustrated with a simple example, as shown below.
void foo(int *A, int *result)
{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}
foo:
{ R3 = R1
loop0(.kernel,#98) // Decrease loop count by 2
}
R1 = memw(R0++#4) // 1st prologue stage
{ R1 = memw(R0++#4) // 2nd prologue stage
R2 = mpyi(R1,R1)
}
.falign
.kernel:
{ R1 = memw(R0++#4) // kernel
R2 = mpyi(R1,R1)
memw(R3++#4) = R2
}:endloop0
{ R2 = mpyi(R1,R1) // 1st epilogue stage
memw(R3++#4) = R2
}
memw(R3++#4) = R2 // 2nd epilogue stage
jumpr lr
In the code above, the kernel section of the pipelined loop executes three iterations of the loop in parallel:
- The load for iteration N+2
- The multiply for iteration N+1
- The store for iteration N
One drawback of software pipelining is the extra code needed for the prologue and epilogue sections of a pipelined loop.
To address this, the Hexagon processor provides the spNloop0 instruction, where "N" represents a number in the range 1-3. For example:
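To check that the prologue/kernel/epilogue split preserves the original semantics, the schedule above can be transcribed into C. In this sketch, r1 and r2 play the roles of R1 and R2; the function name foo_pipelined and the n >= 3 requirement are assumptions of the illustration (the assembly hard-codes n = 100 and a kernel count of 98). All instructions in a packet read their operands before any of them writes, so each kernel iteration is written in the order store, multiply, load.

```c
#include <stddef.h>

/* C transcription of the software-pipelined foo() schedule.
   Requires n >= 3 (the assembly uses n = 100, kernel count n - 2 = 98). */
void foo_pipelined(const int *A, int *result, size_t n)
{
    const int *src = A;      /* R0 */
    int *dst = result;       /* R3 */
    int r1, r2;              /* R1 (loaded value), R2 (product) */

    r1 = *src++;             /* 1st prologue stage: load A[0] */
    r2 = r1 * r1;            /* 2nd prologue stage: square A[0] ... */
    r1 = *src++;             /* ... while loading A[1] */

    for (size_t k = 0; k < n - 2; k++) {  /* kernel: loop0(.kernel,#98) */
        *dst++ = r2;         /* store for iteration N */
        r2 = r1 * r1;        /* multiply for iteration N+1 */
        r1 = *src++;         /* load for iteration N+2 */
    }

    *dst++ = r2;             /* 1st epilogue stage: store ... */
    r2 = r1 * r1;            /* ... and square the last element */
    *dst = r2;               /* 2nd epilogue stage: final store */
}
```

Note that this version never reads past A[n-1]; the epilogue exists precisely so that no extra loads are needed.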
P3 = sp2loop0(start,#10) // Set up pipelined loop
spNloop0 is a variant of the loop0 instruction: it sets up a normal hardware loop using SA0 and LC0, but also performs the following additional operations:
- When the spNloop0 instruction is executed, it assigns the truth value false to the predicate register P3.
- After the associated loop has executed N times, P3 is automatically set to true.
This feature (known as automatic predicate control) allows the store instructions in the kernel section of a pipelined loop to be conditionally executed by P3, so that, because of the way spNloop0 controls P3, they are not executed during the pipeline warm-up. This can reduce the code size of many software-pipelined loops by eliminating the need for prologue code.
spNloop0 cannot be used to eliminate the epilogue code from a pipelined loop; however, in some cases this can be achieved through programming techniques.
Typically, the issue that prevents removing the epilogue code is load safety. If the kernel section of a pipelined loop can safely access past the end of its arrays (either because this is known to be safe, or because the arrays have been padded at the end), the epilogue code is unnecessary. However, if load safety cannot be ensured, explicit epilogue code is required to drain the software pipeline.
Software-pipelined loop (using spNloop0)
void foo(int *A, int *result)
{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}
foo:
{ // load safety assumed
P3 = sp2loop0(.kernel,#102) // set up pipelined loop
R3 = R1
}
.falign
.kernel:
{ R1 = memw(R0++#4) // kernel
R2 = mpyi(R1,R1)
if (P3) memw(R3++#4) = R2
}:endloop0
jumpr lr
Note: The count value that spNloop0 uses to control P3 is stored in the user status register field USR.LPCFG.
7.2.5 Loop restrictions
Hardware loops have the following restrictions:
- The loop setup packet in loopN or spNloop0 (Section 7.2.4) cannot contain a speculative indirect jump, a new-value compare jump, or dealloc_return.
- The last packet in a hardware loop cannot contain any program flow instructions (including jumps or calls).
- The loop end packet in loop0 cannot contain any instruction that changes SA0 or LC0. Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or LC1.
- The loop end packet in spNloop0 cannot contain any instruction that changes P3.
Note: SA1 and LC1 can be changed at the end of loop0, and SA0 and LC0 can be changed at the end of loop1.
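The effect of automatic predicate control can be checked with a small C model of the sp2loop0 kernel above. This is a hypothetical sketch, not the processor's implementation: p3 is cleared at loop setup and becomes true once the loop has executed N times, and the kernel runs n + N iterations (102 for n = 100, N = 2). As the text notes, this relies on load safety: the kernel loads N elements past the end of the data, so the sketch assumes the caller has padded A with at least N readable elements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "P3 = spNloop0(.kernel,#count)" with the one-packet
   kernel from the example (N is expected to be 1-3). A must have n + N
   readable elements (padding provides load safety). Returns the number of
   stores performed. */
size_t foo_spNloop0(const int *A, int *result, size_t n, unsigned N)
{
    const int *src = A;                 /* R0 */
    int *dst = result;                  /* R3 */
    int r1 = 0, r2 = 0;                 /* warm-up values are never stored */
    bool p3 = false;                    /* spNloop0 sets P3 = false */
    size_t stores = 0;

    for (size_t iter = 1; iter <= n + N; iter++) {
        /* One kernel packet: every instruction reads the old registers. */
        if (p3) {                       /* if (P3) memw(R3++#4) = R2 */
            *dst++ = r2;
            stores++;
        }
        r2 = r1 * r1;                   /* R2 = mpyi(R1,R1) */
        r1 = *src++;                    /* R1 = memw(R0++#4) */

        if (iter == N)                  /* loop has executed N times, */
            p3 = true;                  /* so P3 becomes true */
    }
    return stores;
}
```

With n = 100 and N = 2, the first two kernel iterations only fill the pipeline, and the remaining 100 iterations each store one result, which is why the assembly uses a count of 102 and needs no prologue.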