Hexagon_V65_Programmers_Reference_Manual (8)

2022-07-27 21:01:00 【weixin_ thirty-eight million four hundred and ninety-eight thou】

7. Program flow

The Hexagon processor supports the following program flow facilities:

Conditional instructions
Hardware loops
Software branches
Pauses
Exceptions

Software branches include jumps, calls, and returns. Several types of jumps are supported:
Speculative jumps
Compare jumps
Register transfer jumps
Dual jumps

7.1 Conditional instructions

Many Hexagon processor instructions can be executed conditionally. For example:

if (P0) R0 = memw(R2) // conditionally load word if P0
if (!P1) jump label // conditionally jump if not P1

The following instructions can be specified as conditional:

Jumps and calls
Many load and store instructions
Logical instructions (including AND/OR/XOR)
Shift halfword
32-bit add/subtract by register or short immediate
Sign and zero extend
32-bit register transfer and 64-bit combine word
Register transfer immediate
Deallocate frame and return

7.2 Hardware loops

The Hexagon processor includes hardware loop instructions that perform loop branches with zero overhead. For example:

    loop0(start,#3) // loop 3 times
start:
    { R0 = mpyi(R0,R0) } :endloop0

Two sets of hardware loop instructions are provided, loop0 and loop1, so that hardware loops can be nested one level deep. For example:

// Sum the rows of a 100x200 matrix.
    loop1(outer_start,#100)
outer_start:
    R0 = #0
    loop0(inner_start,#200)
inner_start:
        R3 = memw(R1++#4)
        { R0 = add(R0,R3) }:endloop0
    { memw(R2++#4) = R0 }:endloop1
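For readers who think in C, the nested loop above computes the same result as the following sketch. It is not taken from the manual: the function name sum_rows, the int32_t element type, and the row-major layout are assumptions, and each C loop is annotated with the hardware loop instruction it stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* Loop bounds taken from the loop1/loop0 counts in the example. */
enum { ROWS = 100, COLS = 200 };

/* Hypothetical C equivalent of the nested hardware loop.
 * matrix: ROWS*COLS words read sequentially (R1); out: ROWS sums (R2).
 * Assumes each row sum fits in int32_t: the hardware add wraps,
 * but signed overflow is undefined in C. */
void sum_rows(const int32_t *matrix, int32_t *out)
{
    assert(matrix && out);
    const int32_t *r1 = matrix;
    int32_t *r2 = out;
    for (int row = 0; row < ROWS; row++) {      /* loop1(outer_start,#100) */
        int32_t r0 = 0;                         /* R0 = #0 */
        for (int col = 0; col < COLS; col++) {  /* loop0(inner_start,#200) */
            int32_t r3 = *r1++;                 /* R3 = memw(R1++#4) */
            r0 += r3;                           /* R0 = add(R0,R3), :endloop0 */
        }
        *r2++ = r0;                             /* memw(R2++#4) = R0, :endloop1 */
    }
}
```

The inner C loop maps to loop0 and the outer one to loop1, matching the usage rule for nested hardware loops.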

The hardware loop instructions are used as follows:

For non-nested loops, loop0 is used.
For nested loops, loop0 is used for the inner loop and loop1 for the outer loop.

Note: If a program needs loops nested more than one level deep, the two innermost loops can be implemented as hardware loops, with the remaining outer loops implemented as software branches.

Each hardware loop is associated with a pair of dedicated loop registers:

The loop start address register SAn is set to the address of the first instruction in the loop (typically expressed in assembly language as a label).
The loop count register LCn is set to a 32-bit unsigned value that specifies the number of loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to determine whether the loop should repeat or exit.

The hardware loop setup instruction sets both of these registers at once, so there is typically no need to set them individually. However, because the loop registers completely specify the hardware loop state, they can be saved and restored (either automatically by a processor interrupt or manually by the programmer). A suspended hardware loop can then resume normally once its loop registers are reloaded with the saved values.
The Hexagon processor provides two sets of loop registers for the two hardware loops:

SA0 and LC0 are used by loop0
SA1 and LC1 are used by loop1

Table 7-1 lists the hardware loop instructions.

Syntax	Description
loopN(start, Rs)	Hardware loop with register loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the value of general register Rs. Note: the loop start operand is encoded as a PC-relative immediate value.
loopN(start, #count)	Hardware loop with immediate loop count. Sets registers SAn and LCn for hardware loop N: SAn is assigned the specified loop start address; LCn is assigned the specified immediate value (0-1023). Note: the loop start operand is encoded as a PC-relative immediate value.
:endloopN	Hardware loop end instruction. Performs the following operation: if (LCn > 1) {PC = SAn; LCn = LCn-1}. Note: this instruction appears in assembly as a suffix appended to the last packet in the loop, and it is encoded in that last packet.
SAn = Rs	Set the loop start address to general register Rs.
LCn = Rs	Set the loop count to general register Rs.

Note: The loop instructions are assigned to instruction class CR.

7.2.1 Loop setup

To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values.
This can be done in two ways:

A loopN instruction
Register transfers to SAn and LCn

The loopN instruction does all the work of setting SAn and LCn. For example:

    loop0(start,#3) // SA0=&start, LC0=3
start:
    { R0 = mpyi(R0,R0) } :endloop0

In this example, the hardware loop (consisting of a single multiply instruction) is executed three times. The loop0 instruction sets register SA0 to the address of label start, and sets LC0 to 3.

Loop counts are limited to the range 0-1023 when they are expressed as immediate values in loopN. If the desired loop count falls outside this range, it must be specified as a register value. For example:

Using loopN:

    R1 = #20000;
    loop0(start,R1) // LC0=20000, SA0=&start
start:
    { R0 = mpyi(R0,R0) } :endloop0

Using register transfers:

    R1 = #20000
    LC0 = R1 // LC0=20000
    R1 = #start
    SA0 = R1 // SA0=&start
start:
    { R0 = mpyi(R0,R0) } :endloop0
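Both setup styles end with the same register state; only the immediate form is bounded. The C model below is a sketch under stated assumptions: the struct hw_loop_regs and the functions loop_setup_immediate and loop_setup_register are hypothetical names that stand for one SAn/LCn pair and for the two setup paths described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Immediate loop counts in loopN(start,#count) are limited to 0-1023. */
#define LOOP_IMM_MAX 1023u

/* Hypothetical model of one pair of loop registers (SAn, LCn). */
typedef struct {
    uint32_t sa;  /* loop start address */
    uint32_t lc;  /* loop count */
} hw_loop_regs;

/* Models loopN(start,#count): returns false when the count cannot be
 * encoded as an immediate, leaving the registers untouched. */
bool loop_setup_immediate(hw_loop_regs *loop, uint32_t start, uint32_t count)
{
    assert(loop);
    if (count > LOOP_IMM_MAX)
        return false;
    loop->sa = start;
    loop->lc = count;
    return true;
}

/* Models loopN(start,Rs), or the pair SAn = Rs / LCn = Rs:
 * any 32-bit unsigned count is accepted. */
void loop_setup_register(hw_loop_regs *loop, uint32_t start, uint32_t rs)
{
    assert(loop);
    loop->sa = start;
    loop->lc = rs;
}
```

A count of 20000, as in the examples above, is rejected by the immediate path and must go through a register.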

If a loopN instruction is located too far from its loop start address, the PC-relative offset used to specify the start address can exceed the maximum range of the instruction's start-address operand. If this occurs, either move the loopN instruction closer to the loop start, or specify the loop start address as a 32-bit constant (Section 10.9). For example:
Using 32-bit constants:

    R1 = #20000;
    loop0(##start,R1) // LC0=20000, SA0=&start
    ...

7.2.2 Loop end

The loop end instruction marks the last packet in a hardware loop. In assembly language it is expressed by appending the symbol ":endloopN" to the packet, where N specifies the hardware loop (0 or 1). For example:

    loop0(start,#3)
start:
    { R0 = mpyi(R0,R0) } :endloop0 // last packet in loop

The last instruction in the loop must always be expressed in assembly language as a packet (using curly braces), even if it is the only instruction in the packet.

Nested hardware loops can specify the same instruction as the end of both the inner and outer loops. For example:

// Sum the rows of a 100x200 matrix.
// Software pipeline the outer loop.
    p0 = cmp.gt(R0,R0) // p0 = false
    loop1(outer_start,#100)
outer_start:
    { if (p0) memw(R2++#4) = R0
    p0 = cmp.eq(R0,R0) // p0 = true
    R0 = #0
    loop0(inner_start,#200) }
inner_start:
    R3 = memw(R1++#4)
    { R0 = add(R0,R3) }:endloop0:endloop1
    memw(R2++#4) = R0
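The outer loop above is software pipelined: each row's store is deferred to the first packet of the next outer iteration, guarded by p0, and the last row is stored after both loops exit. The C sketch below models that control flow; the function name sum_rows_pipelined and the int32_t data are assumptions. Because a packet reads all operands before writing any results, the store in the outer_start packet sees the old p0 and R0, which the C statement order reproduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ROWS = 100, COLS = 200 };

/* Hypothetical C model of the software-pipelined outer loop. */
void sum_rows_pipelined(const int32_t *r1, int32_t *r2)
{
    assert(r1 && r2);
    bool p0 = false;                        /* p0 = cmp.gt(R0,R0) */
    int32_t r0 = 0;
    for (int row = 0; row < ROWS; row++) {  /* loop1(outer_start,#100) */
        if (p0)
            *r2++ = r0;                     /* store previous row's sum */
        p0 = true;                          /* p0 = cmp.eq(R0,R0) */
        r0 = 0;                             /* R0 = #0 */
        for (int col = 0; col < COLS; col++)    /* loop0(inner_start,#200) */
            r0 += *r1++;                    /* R3 = memw(...); R0 = add(R0,R3) */
        /* :endloop0:endloop1 are both tested on this same last packet */
    }
    *r2 = r0;                               /* final memw(R2++#4) = R0 */
}
```

Exactly one store per row occurs: 99 inside the outer loop and one after it.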

Although endloopN behaves like a regular instruction (it implements the loop test and branch), it does not execute in any instruction slot and does not count as an instruction in the packet. Therefore, a single instruction packet marked as a loop end can perform up to six operations:

Four regular instructions (the normal limit for an instruction packet)
The endloop0 test and branch
The endloop1 test and branch

Note: The endloopN instruction is encoded in the instruction packet (Section 10.6).

7.2.3 Loop execution

After a hardware loop is set up, the loop body always executes at least once, regardless of the specified loop count, because the loop count is not examined until the last instruction in the loop. Therefore, if a loop may need to execute zero times, it must be preceded by an explicit conditional branch. For example:

    loop0(start,R1)
    P0 = cmp.eq(R1,#0)
    if (P0) jump skip
start:
    { R0 = mpyi(R0,R0) } :endloop0
skip:

In this example, a hardware loop is set up with the loop count in R1, but if the value in R1 is zero, a software branch skips over the loop body.

After the loop end instruction of a hardware loop is executed, the Hexagon processor examines the value in the corresponding loop count register:

If the value is greater than 1, the processor decrements the loop count register and performs a zero-cycle branch to the loop start address.
If the value is less than or equal to 1, the processor resumes program execution at the instruction immediately following the loop end instruction.

Note: Because nested hardware loops can share the same loop end instruction, the processor may examine both loop count registers in a single operation.
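The rules above, together with the :endloopN operation in Table 7-1, determine how many times a loop body runs for a given initial LCn. The C sketch below simulates that check; the function names are hypothetical, and the guarded variant mirrors the P0 = cmp.eq(R1,#0) / if (P0) jump skip pattern from the example earlier in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The :endloopN test from Table 7-1:
 * if (LCn > 1) { PC = SAn; LCn = LCn - 1 }  else fall through. */
static bool endloop_taken(uint32_t *lc)
{
    if (*lc > 1) {
        (*lc)--;
        return true;   /* zero-cycle branch back to SAn */
    }
    return false;      /* continue after the loop end packet */
}

/* Number of times the loop body runs for a given initial LCn.
 * The body runs before the first test, so the result is never 0. */
uint32_t hw_loop_trips(uint32_t initial_lc)
{
    uint32_t lc = initial_lc;
    uint32_t trips = 0;
    do {
        trips++;                       /* loop body */
    } while (endloop_taken(&lc));      /* test in the last packet */
    return trips;
}

/* Adds the explicit zero-count guard: P0 = cmp.eq(R1,#0); if (P0) jump skip */
uint32_t guarded_hw_loop_trips(uint32_t r1)
{
    return (r1 == 0) ? 0 : hw_loop_trips(r1);
}
```

With LC0 = 0 the unguarded loop still runs once, which is exactly why the guard is needed.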

7.2.4 Pipelined hardware loops

Software pipelined loops are common on VLIW architectures such as the Hexagon processor. They improve loop performance by overlapping multiple loop iterations.

A software pipeline has three sections:

A prologue in which the loop is primed
A kernel (or steady-state) portion
An epilogue which drains the pipeline

This is best illustrated with a simple example, as shown below.

void foo(int *A, int *result)
{
    int i;
    for (i=0;i<100;i++) {
        result[i]= A[i]*A[i];
    }
}

foo:
{   R3 = R1
    loop0(.kernel,#98) // Decrease loop count by 2
}
    R1 = memw(R0++#4) // 1st prologue stage
{   R1 = memw(R0++#4) // 2nd prologue stage
    R2 = mpyi(R1,R1)
}
    .falign
.kernel:
{   R1 = memw(R0++#4) // kernel
    R2 = mpyi(R1,R1)
    memw(R3++#4) = R2
}:endloop0
{   R2 = mpyi(R1,R1) // 1st epilogue stage
    memw(R3++#4) = R2
}
    memw(R3++#4) = R2 // 2nd epilogue stage
    jumpr lr

In the code above, the kernel section of the pipelined loop performs three iterations of the loop in parallel:

The load for iteration N+2
The multiply for iteration N+1
The store for iteration N
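To make the overlap concrete, here is a hypothetical C rendering of the pipelined assembly version of foo above, with each commented group standing for one packet. Since a packet reads all registers before writing any, new values go to temporaries first. The function name foo_pipelined and the parameter n are assumptions; the example in the manual fixes n at 100, giving the kernel count of 98.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes n >= 3 (the kernel count n-2 must be at least 1, because a
 * hardware loop body always runs once) and that A[i]*A[i] fits in int32_t. */
void foo_pipelined(const int32_t *r0, int32_t *r3, int n)
{
    int32_t r1, r2, t1, t2;
    assert(r0 && r3 && n >= 3);

    r1 = *r0++;                        /* 1st prologue: load A[0] */

    t1 = *r0++;                        /* 2nd prologue: load A[1] ... */
    r2 = r1 * r1;                      /* ... and multiply A[0] */
    r1 = t1;

    for (int i = 0; i < n - 2; i++) {  /* loop0(.kernel,#98) for n == 100 */
        t1 = *r0++;                    /* load for iteration N+2 */
        t2 = r1 * r1;                  /* multiply for iteration N+1 */
        *r3++ = r2;                    /* store for iteration N */
        r1 = t1;
        r2 = t2;
    }

    t2 = r1 * r1;                      /* 1st epilogue: multiply A[n-1] ... */
    *r3++ = r2;                        /* ... and store result[n-2] */
    r2 = t2;

    *r3 = r2;                          /* 2nd epilogue: store result[n-1] */
}
```

The prologue and epilogue together account for the two iterations removed from the kernel count, so no element past A[n-1] is ever read.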

One drawback of software pipelining is the extra code needed for the prologue and epilogue sections of a pipelined loop.
To address this, the Hexagon processor provides the spNloop0 instruction, where the "N" in the instruction name is a digit in the range 1-3. For example:

    P3 = sp2loop0(start,#10) // Set up pipelined loop

spNloop0 is a variant of the loop0 instruction: it sets up a normal hardware loop using SA0 and LC0, but also performs the following additional operations:

When the spNloop0 instruction is executed, it assigns the truth value false to the predicate register P3.
After the associated loop has executed N times, P3 is automatically set to true.

This feature (known as automatic predicate control) allows the store instructions in the kernel section of a pipelined loop to be predicated on P3. Because of the way spNloop0 controls P3, those stores are not executed while the pipeline is warming up. This can reduce the code size of many software pipelined loops by eliminating the need for prologue code.

spNloop0 cannot be used to eliminate the epilogue code from a pipelined loop; however, in some cases this can be done through programming techniques.

Typically, the issue that governs whether epilogue code can be removed is load safety. If the kernel section of a pipelined loop can safely read past the end of its arrays, either because this is known to be safe or because the arrays have been padded at the end, then epilogue code is unnecessary. However, if load safety cannot be ensured, explicit epilogue code is required to drain the software pipeline.

Software pipelined loop (using spNloop0):

void foo(int *A, int *result)
{
    int i;
    for (i=0;i<100;i++) {
        result[i]= A[i]*A[i];
    }
}

foo:
{ // load safety assumed
    P3 = sp2loop0(.kernel,#102) // set up pipelined loop
    R3 = R1
}
.falign
.kernel:
{   R1 = memw(R0++#4) // kernel
    R2 = mpyi(R1,R1)
    if (P3) memw(R3++#4) = R2
}:endloop0
    jumpr lr

Note: The count value that spNloop0 uses to control the setting of P3 is stored in the user status register field USR.LPCFG.
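The effect of automatic predicate control can be simulated in C. The sketch below models P3 = sp2loop0(.kernel,#102) for the foo example: P3 starts false, becomes true once the loop has executed N = 2 times, and the kernel runs n + 2 times with no prologue or epilogue. The function name foo_sp2loop0, the parameter n, and the done counter are assumptions; done only loosely stands in for the count kept in USR.LPCFG, whose real encoding is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SP_N 2   /* the N in sp2loop0 */

/* The kernel loads A[n] and A[n+1], so A must be padded with SP_N
 * readable words (the "load safety assumed" case). */
void foo_sp2loop0(const int32_t *r0, int32_t *r3, int n)
{
    assert(r0 && r3 && n >= 1);
    int32_t r1 = 0, r2 = 0;            /* stale values, never stored */
    bool p3 = false;                   /* spNloop0 clears P3 */
    uint32_t lc = (uint32_t)n + SP_N;  /* LC0: #102 when n == 100 */
    uint32_t done = 0;                 /* completed iterations */

    for (;;) {
        int32_t t1 = *r0++;            /* R1 = memw(R0++#4) */
        int32_t t2 = r1 * r1;          /* R2 = mpyi(R1,R1) */
        if (p3)
            *r3++ = r2;                /* if (P3) memw(R3++#4) = R2 */
        r1 = t1;
        r2 = t2;
        if (++done == SP_N)
            p3 = true;                 /* automatic predicate control */
        if (lc > 1)
            lc--;                      /* :endloop0 branches back */
        else
            break;
    }
}
```

Of the 102 kernel iterations, the first two only fill the pipeline, so exactly 100 stores occur; the two extra loads are what make padding (or otherwise guaranteed load safety) necessary.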

7.2.5 Loop restrictions

Hardware loops have the following restrictions:

The loop setup packet in loopN or spNloop0 (Section 7.2.4) cannot contain a speculative indirect jump, a new-value compare jump, or dealloc_return.
The last packet in a hardware loop cannot contain any program flow instructions (including jumps or calls).
The loop end packet in loop0 cannot contain any instruction that changes SA0 or LC0. Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or LC1.
The loop end packet in spNloop0 cannot contain any instruction that changes P3.

Note: SA1 and LC1 can be changed at the end of loop0, while SA0 and LC0 can be changed at the end of loop1.
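The loop-end rules above can be expressed as a simple check, for example in a custom lint step. The C sketch below is purely illustrative: the structs packet_effects and loop_end_marks and the function loop_end_packet_ok are hypothetical and not part of any Hexagon toolchain API, and the loop setup packet rule is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a loop-end packet does. */
typedef struct {
    bool has_program_flow;           /* jump, call, etc. */
    bool writes_sa0, writes_lc0;
    bool writes_sa1, writes_lc1;
    bool writes_p3;
} packet_effects;

typedef struct {
    bool ends_loop0;                 /* packet carries :endloop0 */
    bool ends_loop1;                 /* packet carries :endloop1 */
    bool loop0_from_sploop;          /* loop0 was set up by spNloop0 */
} loop_end_marks;

/* Returns true if a packet satisfies the loop-end restrictions. */
bool loop_end_packet_ok(const packet_effects *p, const loop_end_marks *m)
{
    assert(p && m);
    if (!m->ends_loop0 && !m->ends_loop1)
        return true;                 /* not a loop end packet */
    if (p->has_program_flow)
        return false;
    if (m->ends_loop0 && (p->writes_sa0 || p->writes_lc0))
        return false;
    if (m->ends_loop1 && (p->writes_sa1 || p->writes_lc1))
        return false;
    if (m->ends_loop0 && m->loop0_from_sploop && p->writes_p3)
        return false;
    return true;                     /* e.g. SA1/LC1 may change at end of loop0 */
}
```

A packet ending only loop0 may write LC1, but the same packet is rejected once it also carries :endloop1, matching the note above.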

Copyright notice
This article was created by [weixin_ thirty-eight million four hundred and ninety-eight thou]. Please include a link to the original when reprinting. Thanks.
https://yzsam.com/2022/208/202207271822302626.html
