Hexagon_V65_Programmers_Reference_Manual(8)
2022-07-27 21:01:00 【weixin_ thirty-eight million four hundred and ninety-eight thou】
7. Program flow
The Hexagon processor supports the following program flow facilities:
- Conditional instructions
- Hardware loops
- Software branches
- Pauses
- Exceptions
Software branches include jumps, calls, and returns. Several types of jumps are supported:
- Speculative jumps
- Compare jumps
- Register transfer jumps
- Dual jumps
7.1 Conditional instructions
Many Hexagon processor instructions can be conditionally executed. For example:
if (P0) R0 = memw(R2) // conditionally load word if P0
if (!P1) jump label // conditionally jump if not P1
The following instructions can be specified as conditional:
- Jumps and calls
- Many load and store instructions
- Logical instructions (including AND/OR/XOR)
- Shift halfword
- 32-bit add/subtract by register or short immediate
- Sign and zero extension
- 32-bit register transfer and 64-bit combine word
- Register transfer immediate
- Deallocate frame and return
7.2 Hardware loops
The Hexagon processor includes hardware loop instructions, which perform loop branches with zero overhead. For example:
loop0(start,#3) // loop 3 times
start:
{ R0 = mpyi(R0,R0) } :endloop0
Two sets of hardware loop instructions are provided, loop0 and loop1, so that hardware loops can be nested one level deep. For example:
// Sum the rows of a 100x200 matrix.
loop1(outer_start,#100)
outer_start:
R0 = #0
loop0(inner_start,#200)
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0
{ memw(R2++#4) = R0 }:endloop1
The hardware loop instructions are used as follows:
- For non-nested loops, use loop0.
- For nested loops, use loop0 for the inner loop and loop1 for the outer loop.
NOTE If a program needs loops nested more than one level deep, the two innermost loops can be implemented as hardware loops, and the remaining outer loops are implemented as software branches.
Each hardware loop is associated with a pair of dedicated loop registers:
- The loop start address register SAn is set to the address of the first instruction in the loop (usually expressed in assembly language as a label).
- The loop count register LCn is set to a 32-bit unsigned value that specifies the number of loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to determine whether the loop should repeat or exit.
The hardware loop setup instructions set both of these registers at once; it is usually not necessary to set them individually. However, because the loop registers completely specify the hardware loop state, they can be saved and restored (automatically by a processor interrupt or manually by the programmer). A suspended hardware loop resumes normally once its loop registers are reloaded with the saved values.
The Hexagon processor provides two sets of loop registers for the two hardware loops:
- SA0 and LC0 are used by loop0
- SA1 and LC1 are used by loop1
Table 7-1 lists the hardware loop instructions.
| Syntax | Description |
|---|---|
| loopN(start, Rs) | Hardware loop with register loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the value of general register Rs. NOTE: The loop start operand is encoded as a PC-relative immediate value. |
| loopN(start, #count) | Hardware loop with immediate loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the specified immediate value (0-1023). NOTE: The loop start operand is encoded as a PC-relative immediate value. |
| :endloopN | Hardware loop end instruction. Performs the following operation: if (LCn > 1) {PC = SAn; LCn = LCn-1}. NOTE: This instruction appears in assembly as a suffix appended to the last packet in the loop. It is encoded in the last packet. |
| SAn = Rs | Set the loop start address to the value of general register Rs |
| LCn = Rs | Set the loop count to the value of general register Rs |
NOTE The loop instructions are assigned to instruction class CR.
7.2.1 Loop setup
To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values.
This can be done in two ways:
- A loopN instruction
- Register transfers to SAn and LCn
The loopN instruction performs all the work of setting SAn and LCn. For example:
loop0(start,#3) // SA0=&start, LC0=3
start:
{ R0 = mpyi(R0,R0) } :endloop0
In this example the hardware loop (which consists of a single multiply instruction) is executed three times. The loop0 instruction sets register SA0 to the address of the label start, and sets LC0 to 3.
When the loop count is expressed as an immediate in loopN, it is limited to the range 0-1023. If the required loop count is outside this range, it must be specified as a register value. For example:
Using loopN:
R1 = #20000;
loop0(start,R1) // LC0=20000, SA0=&start
start:
{ R0 = mpyi(R0,R0) } :endloop0
Using register transfers:
R1 = #20000
LC0 = R1 // LC0=20000
R1 = #start
SA0 = R1 // SA0=&start
start:
{ R0 = mpyi(R0,R0) } :endloop0
If a loopN instruction is located too far from its loop start address, the PC-relative offset used to specify the start address may exceed the maximum range of the instruction's start-address operand. If this happens, either move the loopN instruction closer to the loop start, or specify the loop start address as a 32-bit constant (Section 10.9). For example:
Using a 32-bit constant:
R1 = #20000;
loop0(##start,R1) // LC0=20000, SA0=&start
...
7.2.2 Loop end
The loop end instruction marks the last packet in a hardware loop. It is expressed in assembly language by appending the symbol ":endloopN" to the packet, where N specifies the hardware loop (0 or 1). For example:
loop0(start,#3)
start:
{ R0 = mpyi(R0,R0) } :endloop0 // last packet in loop
The last instruction in the loop must always be expressed in assembly language as a packet (using curly braces), even if it is the only instruction in the packet.
Nested hardware loops can specify the same instruction as the end of both the inner and outer loops. For example:
// Sum the rows of a 100x200 matrix.
// Software pipeline the outer loop.
p0 = cmp.gt(R0,R0) // p0 = false
loop1(outer_start,#100)
outer_start:
{ if (p0) memw(R2++#4) = R0
p0 = cmp.eq(R0,R0) // p0 = true
R0 = #0
loop0(inner_start,#200) }
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0:endloop1
memw(R2++#4) = R0
Although endloopN behaves like a regular instruction (it implements the loop test and branch), it does not execute in any instruction slot and does not count as an instruction in the packet. Therefore a single instruction packet that is marked as a loop end can perform up to six operations:
- Four regular instructions (the normal limit for an instruction packet)
- The endloop0 test and branch
- The endloop1 test and branch
NOTE The endloopN instruction is encoded in the instruction packet (Section 10.6).
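As a rough illustration of what the nested-loop example in this section computes, the following C sketch mirrors its structure: the store of each row sum is deferred to the first packet of the next outer iteration and guarded by a flag that plays the role of p0. The function name `sum_rows` and its parameters are hypothetical, and arithmetic overflow is not modelled; only the control structure follows the assembly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the software-pipelined row-sum example.
 * matrix: rows*cols words in row-major order (R1 in the assembly)
 * sums:   one word per row (R2 in the assembly)
 * Assumes rows >= 1 and cols >= 1, because a hardware loop body
 * always runs at least once. Sums are assumed not to overflow. */
void sum_rows(const int32_t *matrix, int32_t *sums, size_t rows, size_t cols)
{
    bool p0 = false;            /* p0 = cmp.gt(R0,R0): always false */
    int32_t r0 = 0;             /* running row sum */

    for (size_t lc1 = rows; ; ) {            /* loop1 */
        /* First packet of the outer loop. Instructions in a packet read
         * the register values from before the packet, so the store sees
         * the previous row's sum and the old value of p0. */
        if (p0)
            *sums++ = r0;
        p0 = true;              /* p0 = cmp.eq(R0,R0): always true */
        r0 = 0;

        for (size_t lc0 = cols; ; ) {        /* loop0 */
            int32_t r3 = *matrix++;
            r0 += r3;
            /* :endloop0 test and branch */
            if (lc0 > 1) lc0--; else break;
        }
        /* :endloop1 test and branch (same packet as :endloop0) */
        if (lc1 > 1) lc1--; else break;
    }
    *sums = r0;                 /* final store after the loop nest */
}
```

Calling `sum_rows(m, s, 100, 200)` on a 100x200 matrix would correspond to the loop counts used in the assembly listing.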
7.2.3 Loop execution
Once a hardware loop is set up, the loop body is always executed at least once regardless of the specified loop count (because the loop count is not examined until the last instruction in the loop). Therefore, if a loop needs to be optionally executed zero times, it must be preceded by an explicit conditional branch. For example:
loop0(start,R1)
P0 = cmp.eq(R1,#0)
if (P0) jump skip
start:
{ R0 = mpyi(R0,R0) } :endloop0
skip:
In this example a hardware loop is set up with the loop count in R1, but if the value in R1 is zero, a software branch skips over the loop body.
After the loop end instruction of a hardware loop is executed, the Hexagon processor examines the value in the corresponding loop count register:
- If the value is greater than 1, the processor decrements the loop count register and performs a zero-cycle branch to the loop start address.
- If the value is less than or equal to 1, the processor resumes program execution at the instruction immediately following the loop end instruction.
NOTE Because nested hardware loops can share the same loop end instruction, the processor may examine both loop count registers in a single operation.
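The loop-end rule described in this section can be sketched as a small C model. It counts how many times the body of a single hardware loop runs for a given initial LCn, with and without the explicit zero-count guard. The function names are hypothetical and the model ignores everything except the loop count register.

```c
#include <stdint.h>

/* Hypothetical model: returns how many times the body of a hardware
 * loop runs when LCn is initialised to `count` and nothing skips it. */
uint32_t hw_loop_iterations(uint32_t count)
{
    uint32_t lc = count;        /* LCn as set by loopN */
    uint32_t executed = 0;

    for (;;) {
        executed++;             /* loop body, up to its last packet */
        /* :endloopN  ->  if (LCn > 1) { PC = SAn; LCn = LCn - 1 } */
        if (lc > 1)
            lc--;               /* branch back to SAn */
        else
            break;              /* fall through to the next packet */
    }
    return executed;
}

/* The same loop preceded by the explicit guard from the example:
 * P0 = cmp.eq(R1,#0); if (P0) jump skip */
uint32_t guarded_loop_iterations(uint32_t count)
{
    if (count == 0)
        return 0;               /* jump skip */
    return hw_loop_iterations(count);
}
```

In this model a count of 0 and a count of 1 both run the body once, which is why the guard is needed when zero iterations must be possible.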
7.2.4 Pipelined hardware loops
Software pipelined loops are common on VLIW architectures such as the Hexagon processor. They improve the performance of loop code by overlapping multiple loop iterations.
A software pipeline has three sections:
- A prologue, in which the loop is primed
- A kernel (or steady-state) section
- An epilogue, in which the pipeline is drained
This is best illustrated with a simple example, as shown below.
void foo(int *A, int *result)
{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}
foo:
{ R3 = R1
loop0(.kernel,#98) // Decrease loop count by 2
}
R1 = memw(R0++#4) // 1st prologue stage
{ R1 = memw(R0++#4) // 2nd prologue stage
R2 = mpyi(R1,R1)
}
.falign
.kernel:
{ R1 = memw(R0++#4) // kernel
R2 = mpyi(R1,R1)
memw(R3++#4) = R2
}:endloop0
{ R2 = mpyi(R1,R1) // 1st epilogue stage
memw(R3++#4) = R2
}
memw(R3++#4) = R2 // 2nd epilogue stage
jumpr lr
In the above code, the kernel section of the pipelined loop performs three iterations of the loop in parallel:
- The load for iteration N+2
- The multiply for iteration N+1
- The store for iteration N
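To make the overlap concrete, here is a hedged C sketch of the same three-stage schedule, with the prologue, kernel, and epilogue written out in the order the assembly executes them. The names `mpyi` and `square_pipelined` are hypothetical helpers; statements within each step are ordered so that every operation sees the register values from before its packet.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit multiply with wraparound, computed without signed overflow. */
static int32_t mpyi(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Hypothetical model of the pipelined loop; requires n >= 3 because the
 * kernel (a hardware loop) always runs at least once. */
void square_pipelined(const int32_t *a, int32_t *result, size_t n)
{
    int32_t r1, r2;

    /* 1st prologue stage: load for iteration 0 */
    r1 = *a++;
    /* 2nd prologue stage: multiply for iteration 0, load for iteration 1 */
    r2 = mpyi(r1, r1);
    r1 = *a++;

    /* Kernel: n - 2 passes, matching loop0(.kernel,#98) for n = 100 */
    for (size_t i = 0; i < n - 2; i++) {
        *result++ = r2;         /* store for iteration N     */
        r2 = mpyi(r1, r1);      /* multiply for iteration N+1 */
        r1 = *a++;              /* load for iteration N+2     */
    }

    /* 1st epilogue stage: store, then the last multiply */
    *result++ = r2;
    r2 = mpyi(r1, r1);
    /* 2nd epilogue stage: final store */
    *result = r2;
}
```

The model performs exactly n loads and n stores, so it never reads past the end of the input array.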
One drawback of software pipelining is the extra code required for the prologue and epilogue sections of a pipelined loop.
To address this, the Hexagon processor provides the spNloop0 instruction, where the "N" in the name is a digit in the range 1-3. For example:
P3 = sp2loop0(start,#10) // Set up pipelined loop
spNloop0 is a variant of the loop0 instruction: it sets up a normal hardware loop using SA0 and LC0, but also performs the following additional operations:
- When the spNloop0 instruction is executed, it assigns the truth value false to predicate register P3.
- After the associated loop has executed N times, P3 is automatically set to true.
This feature (known as automatic predicate control) allows the store instructions in the kernel section of a pipelined loop to be conditionally executed on P3. Because of the way spNloop0 controls P3, those stores are not executed during the pipeline warm-up. This can reduce the code size of many software pipelined loops by eliminating the need for prologue code.
spNloop0 cannot be used to eliminate the epilogue code from a pipelined loop; however, in some cases this can be done with programming techniques.
Typically, the issue that affects the removal of epilogue code is load safety. If the kernel section of a pipelined loop can safely access memory past the end of its arrays, either because this is known to be safe or because the arrays have been padded at the end, then epilogue code is unnecessary. However, if load safety cannot be ensured, explicit epilogue code is required to drain the software pipeline.
Software pipelined loop (using spNloop0)
void foo(int *A, int *result)
{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}
foo:
{ // load safety assumed
P3 = sp2loop0(.kernel,#102) // set up pipelined loop
R3 = R1
}
.falign
.kernel:
{ R1 = memw(R0++#4) // kernel
R2 = mpyi(R1,R1)
if (P3) memw(R3++#4) = R2
}:endloop0
jumpr lr
NOTE The count value that spNloop0 uses to control the setting of P3 is stored in the user status register field USR.LPCFG.
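The effect of automatic predicate control in the sp2loop0 listing can be approximated in C as follows. This is only a sketch: the countdown variable `lpcfg` stands in for USR.LPCFG and its exact update rule is an assumption chosen to match the stated behavior (P3 becomes true after the loop has executed N = 2 times). The function name is hypothetical, and the input array is assumed to be padded with two readable words past its logical end, which is the load-safety condition described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the sp2loop0 version of the loop.
 * a_padded must have n + 2 readable elements (load safety assumed);
 * result receives exactly n squares. */
void square_sp2loop0(const int32_t *a_padded, int32_t *result, size_t n)
{
    int32_t r1 = 0, r2 = 0;     /* pipeline registers; initial values never stored */
    bool p3 = false;            /* P3 = sp2loop0(...): starts false */
    unsigned lpcfg = 2;         /* assumed countdown held in USR.LPCFG */

    for (size_t lc0 = n + 2; ; ) {          /* e.g. #102 for n = 100 */
        /* Kernel packet: all operations read pre-packet register values */
        int32_t old_r1 = r1, old_r2 = r2;
        r1 = *a_padded++;                               /* load     */
        r2 = (int32_t)((uint32_t)old_r1 * (uint32_t)old_r1); /* multiply */
        if (p3)
            *result++ = old_r2;                         /* store    */

        /* :endloop0, including the assumed P3 countdown */
        if (lpcfg != 0) {
            if (lpcfg == 1)
                p3 = true;
            lpcfg--;
        }
        if (lc0 > 1) lc0--; else break;
    }
}
```

With n + 2 passes and the first two stores suppressed, the model writes exactly n results while reading two words beyond the logical end of the input.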
7.2.5 Loop restrictions
Hardware loops have the following restrictions:
- The loop setup packet in loopN or spNloop0 (Section 7.2.4) cannot contain a speculative indirect jump, a new-value compare jump, or dealloc_return.
- The last packet in a hardware loop cannot contain any program flow instructions (including jumps or calls).
- The loop end packet in loop0 cannot contain any instruction that changes SA0 or LC0. Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or LC1.
- The loop end packet in spNloop0 cannot contain any instruction that changes P3.
NOTE SA1 and LC1 can be changed at the end of loop0, and SA0 and LC0 can be changed at the end of loop1.