Dynamic Saving and Restoring of the FPU Context on the RISC-V Architecture

2022-06-24 09:11:00 RT-Thread IoT Operating System

This article was written by RT-Thread forum user @blta. Original post: https://club.rt-thread.org/ask/article/248051628070d52e.html

My earlier article, "Those Things About Porting to RISC-V", mentioned optimizing the FPU part of the RISC-V port. I have been job-hunting recently, so the work went on and off, but it is finally done.

Development environment

Hardware

This time I chose the CM32M433R-START RISC-V ecosystem development board, a collaboration between Nuclei and China Mobile's Xinsheng Technology. The board came out only this year, so it is fairly new. It uses Nuclei's N308 core (RV32IMACFP), which fits our FPU testing needs, and at 99 yuan it is also very cheap, so I bought one for testing right away.

[Image: https://oss-club.rt-thread.org/uploads/20220620/41ca1bc4c2cd198af3df26cbf4d87a80.png.webp]

For more information, see https://www.rvmcu.com/quickstart-show-id-13.html

Software

Because RT-Thread and RT-Thread Studio do not support this development board yet, I tested with the official Nuclei Studio first:

Nuclei Studio 2022.04

CM32M4xxR-Support-Pack-v1.0.2-win32-x32.zip

New project

1) Create a new project based on the CM32M433R_START development board

[Image: https://oss-club.rt-thread.org/uploads/20220620/d6c5eb3e46bb229776aba52218402777.png.webp]

2) Base the new project on RT-Thread

[Image: https://oss-club.rt-thread.org/uploads/20220620/b574e837f0b8024bf33fcb21a97cbb7a.png.webp]

3) Build and test

[Image: https://oss-club.rt-thread.org/uploads/20220620/e13e559558691e66d33cf4ba0415a985.png.webp]

This time I used CMlink-OpenOCD. It is impressively fast, and the download completes in no time.

Create a new test thread

The official context-switch routine does not handle the FPU part of the context, so as soon as more than one thread operates on the floating-point registers, the FPU context can become inconsistent. The official handler saves and restores only the integer registers and mstatus:

.align 2
.global eclic_msip_handler
eclic_msip_handler:
    addi sp, sp, -RT_CONTEXT_SIZE
    STORE x1,  1  * REGBYTES(sp)    /* RA */
    STORE x5,  2  * REGBYTES(sp)
    STORE x6,  3  * REGBYTES(sp)
    STORE x7,  4  * REGBYTES(sp)
    STORE x8,  5  * REGBYTES(sp)
    STORE x9,  6  * REGBYTES(sp)
    STORE x10, 7  * REGBYTES(sp)
    STORE x11, 8  * REGBYTES(sp)
    STORE x12, 9  * REGBYTES(sp)
    STORE x13, 10 * REGBYTES(sp)
    STORE x14, 11 * REGBYTES(sp)
    STORE x15, 12 * REGBYTES(sp)
#ifndef __riscv_32e
    STORE x16, 13 * REGBYTES(sp)
    STORE x17, 14 * REGBYTES(sp)
    STORE x18, 15 * REGBYTES(sp)
    STORE x19, 16 * REGBYTES(sp)
    STORE x20, 17 * REGBYTES(sp)
    STORE x21, 18 * REGBYTES(sp)
    STORE x22, 19 * REGBYTES(sp)
    STORE x23, 20 * REGBYTES(sp)
    STORE x24, 21 * REGBYTES(sp)
    STORE x25, 22 * REGBYTES(sp)
    STORE x26, 23 * REGBYTES(sp)
    STORE x27, 24 * REGBYTES(sp)
    STORE x28, 25 * REGBYTES(sp)
    STORE x29, 26 * REGBYTES(sp)
    STORE x30, 27 * REGBYTES(sp)
    STORE x31, 28 * REGBYTES(sp)
#endif
    /* Push mstatus to stack */
    csrr t0, CSR_MSTATUS
    STORE t0,  (RT_SAVED_REGNUM - 1)  * REGBYTES(sp)
    
    ...
    
        /* Restore Registers from Stack */
    LOAD x1,  1  * REGBYTES(sp)    /* RA */
    LOAD x5,  2  * REGBYTES(sp)
    LOAD x6,  3  * REGBYTES(sp)
    LOAD x7,  4  * REGBYTES(sp)
    LOAD x8,  5  * REGBYTES(sp)
    LOAD x9,  6  * REGBYTES(sp)
    LOAD x10, 7  * REGBYTES(sp)
    LOAD x11, 8  * REGBYTES(sp)
    LOAD x12, 9  * REGBYTES(sp)
    LOAD x13, 10 * REGBYTES(sp)
    LOAD x14, 11 * REGBYTES(sp)
    LOAD x15, 12 * REGBYTES(sp)
#ifndef __riscv_32e
    LOAD x16, 13 * REGBYTES(sp)
    LOAD x17, 14 * REGBYTES(sp)
    LOAD x18, 15 * REGBYTES(sp)
    LOAD x19, 16 * REGBYTES(sp)
    LOAD x20, 17 * REGBYTES(sp)
    LOAD x21, 18 * REGBYTES(sp)
    LOAD x22, 19 * REGBYTES(sp)
    LOAD x23, 20 * REGBYTES(sp)
    LOAD x24, 21 * REGBYTES(sp)
    LOAD x25, 22 * REGBYTES(sp)
    LOAD x26, 23 * REGBYTES(sp)
    LOAD x27, 24 * REGBYTES(sp)
    LOAD x28, 25 * REGBYTES(sp)
    LOAD x29, 26 * REGBYTES(sp)
    LOAD x30, 27 * REGBYTES(sp)
    LOAD x31, 28 * REGBYTES(sp)
#endif

    addi sp, sp, RT_CONTEXT_SIZE
    mret
    

Floating-point thread 1

thread1 performs a slightly more complex floating-point computation.

[Image: https://oss-club.rt-thread.org/uploads/20220620/724f625649d04e77e92284514d459188.png.webp]

Floating-point thread 2

thread2 is simple.

[Image: https://oss-club.rt-thread.org/uploads/20220620/ad36e2f662fce56f68451497988c474c.png.webp]

Thread priority

thread1 and thread2 have the same priority and run round-robin by time slice.

thread1 = rt_thread_create("thread1",thread1_entry, NULL, 1024, 11, 1);
if(thread1 != NULL)
{
	rt_thread_startup(thread1);
}
else
{
	printf("\r\n create thread1 failed\r\n");
}

thread2 = rt_thread_create("thread2",thread2_entry, NULL, 1024, 11, 1);
if(thread2 != NULL)
{
	rt_thread_startup(thread2);
}
else
{
	printf("\r\n create thread2 failed\r\n");
}

Running results

[Image: https://oss-club.rt-thread.org/uploads/20220620/6bf93670b928e9a323af4a8fe780964d.png]

After the first switch, thread 1's res shows some abnormal values. Let's see whether improving the FPU handling solves this problem.
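One way to see how res can go wrong is a host-side toy model. The sketch below is not RT-Thread code: `fpu_regs`, `toy_thread`, `toy_switch` and `run_scenario` are hypothetical names, and a plain array stands in for the single shared f0~f31 register file. It only illustrates the failure mode, under the assumption that thread 1 is preempted while it still holds a partial result in f0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_FREGS 32

/* Hypothetical stand-in for the one hardware FPU register file. */
static float fpu_regs[NUM_FREGS];

/* Hypothetical per-thread save area for f0..f31. */
struct toy_thread {
    float saved[NUM_FREGS];
};

/* Switch from one toy thread to another.
 * With save_fpu == false the register file is left untouched, which
 * mirrors a port that only saves the integer registers. */
static void toy_switch(struct toy_thread *from, struct toy_thread *to,
                       bool save_fpu)
{
    assert(from != NULL && to != NULL);
    if (save_fpu) {
        memcpy(from->saved, fpu_regs, sizeof(fpu_regs));
        memcpy(fpu_regs, to->saved, sizeof(fpu_regs));
    }
}

/* Thread 1 keeps a partial result in f0, is preempted by thread 2
 * (which overwrites f0), then resumes and reads f0 back. */
static float run_scenario(bool save_fpu)
{
    struct toy_thread t1 = {0}, t2 = {0};

    memset(fpu_regs, 0, sizeof(fpu_regs));
    fpu_regs[0] = 1.5f;               /* thread 1: partial result in f0 */
    toy_switch(&t1, &t2, save_fpu);
    fpu_regs[0] = 99.0f;              /* thread 2: clobbers f0 */
    toy_switch(&t2, &t1, save_fpu);
    return fpu_regs[0];               /* thread 1: reads f0 again */
}
```

Calling `run_scenario(false)` hands thread 1 the value written by thread 2, while `run_scenario(true)` gives it back its own value.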

Static FPU save

Following the RT-Thread ch32v307 BSP, use static FPU saving: save all FPU registers unconditionally, without checking the mstatus.FS field.

Stack frame

First, modify the stack frame by adding the 32 floating-point registers f0~f31.

struct rt_hw_stack_frame
{
    rt_ubase_t epc;        /*!< epc - epc    - program counter                     */
    rt_ubase_t ra;         /*!< x1  - ra     - return address for jumps            */
    rt_ubase_t t0;         /*!< x5  - t0     - temporary register 0                */
    rt_ubase_t t1;         /*!< x6  - t1     - temporary register 1                */
    rt_ubase_t t2;         /*!< x7  - t2     - temporary register 2                */
    rt_ubase_t s0_fp;      /*!< x8  - s0/fp  - saved register 0 or frame pointer   */
    rt_ubase_t s1;         /*!< x9  - s1     - saved register 1                    */
    rt_ubase_t a0;         /*!< x10 - a0     - return value or function argument 0 */
    rt_ubase_t a1;         /*!< x11 - a1     - return value or function argument 1 */
    rt_ubase_t a2;         /*!< x12 - a2     - function argument 2                 */
    rt_ubase_t a3;         /*!< x13 - a3     - function argument 3                 */
    rt_ubase_t a4;         /*!< x14 - a4     - function argument 4                 */
    rt_ubase_t a5;         /*!< x15 - a5     - function argument 5                 */
#ifndef __riscv_32e
    rt_ubase_t a6;         /*!< x16 - a6     - function argument 6                 */
    rt_ubase_t a7;         /*!< x17 - a7     - function argument 7                 */
    rt_ubase_t s2;         /*!< x18 - s2     - saved register 2                    */
    rt_ubase_t s3;         /*!< x19 - s3     - saved register 3                    */
    rt_ubase_t s4;         /*!< x20 - s4     - saved register 4                    */
    rt_ubase_t s5;         /*!< x21 - s5     - saved register 5                    */
    rt_ubase_t s6;         /*!< x22 - s6     - saved register 6                    */
    rt_ubase_t s7;         /*!< x23 - s7     - saved register 7                    */
    rt_ubase_t s8;         /*!< x24 - s8     - saved register 8                    */
    rt_ubase_t s9;         /*!< x25 - s9     - saved register 9                    */
    rt_ubase_t s10;        /*!< x26 - s10    - saved register 10                   */
    rt_ubase_t s11;        /*!< x27 - s11    - saved register 11                   */
    rt_ubase_t t3;         /*!< x28 - t3     - temporary register 3                */
    rt_ubase_t t4;         /*!< x29 - t4     - temporary register 4                */
    rt_ubase_t t5;         /*!< x30 - t5     - temporary register 5                */
    rt_ubase_t t6;         /*!< x31 - t6     - temporary register 6                */
#endif
    rt_ubase_t mstatus;    /*!<              - machine status register             */

/* float register */
#ifdef ARCH_RISCV_FPU
	rv_floatreg_t f0;      /* f0  */
	rv_floatreg_t f1;      /* f1  */
	rv_floatreg_t f2;      /* f2  */
	rv_floatreg_t f3;      /* f3  */
	rv_floatreg_t f4;      /* f4  */
	rv_floatreg_t f5;      /* f5  */
	rv_floatreg_t f6;      /* f6  */
	rv_floatreg_t f7;      /* f7  */
	rv_floatreg_t f8;      /* f8  */
	rv_floatreg_t f9;      /* f9  */
	rv_floatreg_t f10;     /* f10 */
	rv_floatreg_t f11;     /* f11 */
	rv_floatreg_t f12;     /* f12 */
	rv_floatreg_t f13;     /* f13 */
	rv_floatreg_t f14;     /* f14 */
	rv_floatreg_t f15;     /* f15 */
	rv_floatreg_t f16;     /* f16 */
	rv_floatreg_t f17;     /* f17 */
	rv_floatreg_t f18;     /* f18 */
	rv_floatreg_t f19;     /* f19 */
	rv_floatreg_t f20;     /* f20 */
	rv_floatreg_t f21;     /* f21 */
	rv_floatreg_t f22;     /* f22 */
	rv_floatreg_t f23;     /* f23 */
	rv_floatreg_t f24;     /* f24 */
	rv_floatreg_t f25;     /* f25 */
	rv_floatreg_t f26;     /* f26 */
	rv_floatreg_t f27;     /* f27 */
	rv_floatreg_t f28;     /* f28 */
	rv_floatreg_t f29;     /* f29 */
	rv_floatreg_t f30;     /* f30 */
	rv_floatreg_t f31;     /* f31 */
#endif
};

The other FPU control register, fcsr, holds only a few exception flags and the rounding mode, so it is not pushed onto the stack for now.

[Image: https://oss-club.rt-thread.org/uploads/20220620/dc4aa963367b910a986d3055687ddc0e.png.webp]

Stack Init

Set the initial values of floating-point registers to 0

rt_uint8_t *rt_hw_stack_init(void       *tentry,
                             void       *parameter,
                             rt_uint8_t *stack_addr,
                             void       *texit)
{
    struct rt_hw_stack_frame *frame;
    rt_uint8_t         *stk;
    int                i;

    stk  = stack_addr + sizeof(rt_ubase_t);
    stk  = (rt_uint8_t *)RT_ALIGN_DOWN((rt_ubase_t)stk, REGBYTES);
    stk -= sizeof(struct rt_hw_stack_frame);

    frame = (struct rt_hw_stack_frame *)stk;

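    /* Slots 0..29 are epc, x1, x5-x31 and mstatus; the slots after them are
       f0-f31. Sharing one index for both casts assumes that rv_floatreg_t
       and rt_ubase_t have the same size (RV32 with single-precision
       registers) and that __riscv_32e is not defined. */
    for (i = 0; i < (int)(sizeof(struct rt_hw_stack_frame) / sizeof(rt_ubase_t)); i++)
    {
        if(i < 30)
            ((rt_ubase_t *)frame)[i] = 0xdeadbeef;
        else
            ((rv_floatreg_t *)frame)[i] = 0x00;
    }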

    frame->ra      = (rt_ubase_t)texit;
    frame->a0      = (rt_ubase_t)parameter;
    frame->epc     = (rt_ubase_t)tentry;
    frame->mstatus = RT_INITIAL_MSTATUS;

    return stk;
}
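The hard-coded 30 in rt_hw_stack_init depends on the frame layout. As a rough check, the sketch below derives that number with offsetof and fills the frame by field instead of by index. It assumes RV32 with only the F extension (both register types 4 bytes wide) and no __riscv_32e; `toy_stack_frame`, `int_slot_count` and `toy_frame_fill` are hypothetical names, and the arrays merely condense the individually named fields of rt_hw_stack_frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed widths for RV32 with the F extension only. */
typedef uint32_t rt_ubase_t;
typedef uint32_t rv_floatreg_t;

/* Condensed copy of the frame: arrays replace the named fields. */
struct toy_stack_frame {
    rt_ubase_t    epc;
    rt_ubase_t    x[28];      /* ra, t0-t2, s0-s1, a0-a7, s2-s11, t3-t6 */
    rt_ubase_t    mstatus;
    rv_floatreg_t f[32];      /* f0..f31 */
};

/* Number of integer-sized slots that precede f0. */
static size_t int_slot_count(void)
{
    return offsetof(struct toy_stack_frame, f) / sizeof(rt_ubase_t);
}

/* Fill the frame by field, so no magic index is needed. */
static void toy_frame_fill(struct toy_stack_frame *frame)
{
    size_t i;

    assert(frame != NULL);
    frame->epc = 0xdeadbeefu;
    for (i = 0; i < 28; i++)
        frame->x[i] = 0xdeadbeefu;
    frame->mstatus = 0xdeadbeefu;
    for (i = 0; i < 32; i++)
        frame->f[i] = 0;
}
```

Under these assumptions `int_slot_count()` evaluates to 30, matching the constant in the loop. If the frame ever gains or loses a field, the offsetof form follows automatically.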

Save FPU context

Following the stack frame order, save the floating-point registers first on entry to the eclic_msip_handler interrupt handler.

eclic_msip_handler:
#ifdef ARCH_RISCV_FPU
	addi sp, sp, -32*FREGBYTES

    FSTORE  f0, 0 * FREGBYTES(sp)
    FSTORE  f1, 1 * FREGBYTES(sp)
    FSTORE  f2, 2 * FREGBYTES(sp)
    FSTORE  f3, 3 * FREGBYTES(sp)
    FSTORE  f4, 4 * FREGBYTES(sp)
    FSTORE  f5, 5 * FREGBYTES(sp)
    FSTORE  f6, 6 * FREGBYTES(sp)
    FSTORE  f7, 7 * FREGBYTES(sp)
    FSTORE  f8, 8 * FREGBYTES(sp)
    FSTORE  f9, 9 * FREGBYTES(sp)
    FSTORE  f10, 10 * FREGBYTES(sp)
    FSTORE  f11, 11 * FREGBYTES(sp)
    FSTORE  f12, 12 * FREGBYTES(sp)
    FSTORE  f13, 13 * FREGBYTES(sp)
    FSTORE  f14, 14 * FREGBYTES(sp)
    FSTORE  f15, 15 * FREGBYTES(sp)
    FSTORE  f16, 16 * FREGBYTES(sp)
    FSTORE  f17, 17 * FREGBYTES(sp)
    FSTORE  f18, 18 * FREGBYTES(sp)
    FSTORE  f19, 19 * FREGBYTES(sp)
    FSTORE  f20, 20 * FREGBYTES(sp)
    FSTORE  f21, 21 * FREGBYTES(sp)
    FSTORE  f22, 22 * FREGBYTES(sp)
    FSTORE  f23, 23 * FREGBYTES(sp)
    FSTORE  f24, 24 * FREGBYTES(sp)
    FSTORE  f25, 25 * FREGBYTES(sp)
    FSTORE  f26, 26 * FREGBYTES(sp)
    FSTORE  f27, 27 * FREGBYTES(sp)
    FSTORE  f28, 28 * FREGBYTES(sp)
    FSTORE  f29, 29 * FREGBYTES(sp)
    FSTORE  f30, 30 * FREGBYTES(sp)
    FSTORE  f31, 31 * FREGBYTES(sp)
#endif
    addi sp, sp, -RT_CONTEXT_SIZE
    STORE x1,  1  * REGBYTES(sp)    /* RA */
    STORE x5,  2  * REGBYTES(sp)
    STORE x6,  3  * REGBYTES(sp)
    STORE x7,  4  * REGBYTES(sp)
    .....

Restore FPU Context

Restore in rt_hw_context_switch_to

rt_hw_context_switch_to:
    /* Setup Interrupt Stack using
       The stack that was used by main()
       before the scheduler is started is
       no longer required after the scheduler is started.
       Interrupt stack pointer is stored in CSR_MSCRATCH */
    la t0, _sp
    csrw CSR_MSCRATCH, t0
    LOAD sp, 0x0(a0)                /* Read sp from first TCB member(a0) */

    /* Pop PC from stack and set MEPC */
    LOAD t0,  0  * REGBYTES(sp)
    csrw CSR_MEPC, t0
    /* Pop mstatus from stack and set it */
    LOAD t0,  (RT_SAVED_REGNUM - 1)  * REGBYTES(sp)
    csrw CSR_MSTATUS, t0
    /* Interrupt still disable here */
    /* Restore Registers from Stack */
    LOAD x1,  1  * REGBYTES(sp)    /* RA */
    LOAD x5,  2  * REGBYTES(sp)
    LOAD x6,  3  * REGBYTES(sp)
    LOAD x7,  4  * REGBYTES(sp)
    LOAD x8,  5  * REGBYTES(sp)
    LOAD x9,  6  * REGBYTES(sp)
    LOAD x10, 7  * REGBYTES(sp)
    LOAD x11, 8  * REGBYTES(sp)
    LOAD x12, 9  * REGBYTES(sp)
    LOAD x13, 10 * REGBYTES(sp)
    LOAD x14, 11 * REGBYTES(sp)
    LOAD x15, 12 * REGBYTES(sp)
#ifndef __riscv_32e
    LOAD x16, 13 * REGBYTES(sp)
    LOAD x17, 14 * REGBYTES(sp)
    LOAD x18, 15 * REGBYTES(sp)
    LOAD x19, 16 * REGBYTES(sp)
    LOAD x20, 17 * REGBYTES(sp)
    LOAD x21, 18 * REGBYTES(sp)
    LOAD x22, 19 * REGBYTES(sp)
    LOAD x23, 20 * REGBYTES(sp)
    LOAD x24, 21 * REGBYTES(sp)
    LOAD x25, 22 * REGBYTES(sp)
    LOAD x26, 23 * REGBYTES(sp)
    LOAD x27, 24 * REGBYTES(sp)
    LOAD x28, 25 * REGBYTES(sp)
    LOAD x29, 26 * REGBYTES(sp)
    LOAD x30, 27 * REGBYTES(sp)
    LOAD x31, 28 * REGBYTES(sp)
#endif

    addi sp, sp, RT_CONTEXT_SIZE
 /* load float reg */
#ifdef ARCH_RISCV_FPU

    FLOAD   f0, 0 * FREGBYTES(sp)
    FLOAD   f1, 1 * FREGBYTES(sp)
    FLOAD   f2, 2 * FREGBYTES(sp)
    FLOAD   f3, 3 * FREGBYTES(sp)
    FLOAD   f4, 4 * FREGBYTES(sp)
    FLOAD   f5, 5 * FREGBYTES(sp)
    FLOAD   f6, 6 * FREGBYTES(sp)
    FLOAD   f7, 7 * FREGBYTES(sp)
    FLOAD   f8, 8 * FREGBYTES(sp)
    FLOAD   f9, 9 * FREGBYTES(sp)
    FLOAD   f10, 10 * FREGBYTES(sp)
    FLOAD   f11, 11 * FREGBYTES(sp)
    FLOAD   f12, 12 * FREGBYTES(sp)
    FLOAD   f13, 13 * FREGBYTES(sp)
    FLOAD   f14, 14 * FREGBYTES(sp)
    FLOAD   f15, 15 * FREGBYTES(sp)
    FLOAD   f16, 16 * FREGBYTES(sp)
    FLOAD   f17, 17 * FREGBYTES(sp)
    FLOAD   f18, 18 * FREGBYTES(sp)
    FLOAD   f19, 19 * FREGBYTES(sp)
    FLOAD   f20, 20 * FREGBYTES(sp)
    FLOAD   f21, 21 * FREGBYTES(sp)
    FLOAD   f22, 22 * FREGBYTES(sp)
    FLOAD   f23, 23 * FREGBYTES(sp)
    FLOAD   f24, 24 * FREGBYTES(sp)
    FLOAD   f25, 25 * FREGBYTES(sp)
    FLOAD   f26, 26 * FREGBYTES(sp)
    FLOAD   f27, 27 * FREGBYTES(sp)
    FLOAD   f28, 28 * FREGBYTES(sp)
    FLOAD   f29, 29 * FREGBYTES(sp)
    FLOAD   f30, 30 * FREGBYTES(sp)
    FLOAD   f31, 31 * FREGBYTES(sp)
    addi    sp, sp, 32 * FREGBYTES
#endif
    mret

Restore in eclic_msip_handler

eclic_msip_handler:
    ....

    /* Pop additional registers */

    /* Pop mstatus from stack and set it */
    LOAD t0,  (RT_SAVED_REGNUM - 1)  * REGBYTES(sp)
    csrw CSR_MSTATUS, t0
    /* Interrupt still disable here */
    /* Restore Registers from Stack */
    LOAD x1,  1  * REGBYTES(sp)    /* RA */
    LOAD x5,  2  * REGBYTES(sp)
    LOAD x6,  3  * REGBYTES(sp)
    LOAD x7,  4  * REGBYTES(sp)
    LOAD x8,  5  * REGBYTES(sp)
    LOAD x9,  6  * REGBYTES(sp)
    LOAD x10, 7  * REGBYTES(sp)
    LOAD x11, 8  * REGBYTES(sp)
    LOAD x12, 9  * REGBYTES(sp)
    LOAD x13, 10 * REGBYTES(sp)
    LOAD x14, 11 * REGBYTES(sp)
    LOAD x15, 12 * REGBYTES(sp)
#ifndef __riscv_32e
    LOAD x16, 13 * REGBYTES(sp)
    LOAD x17, 14 * REGBYTES(sp)
    LOAD x18, 15 * REGBYTES(sp)
    LOAD x19, 16 * REGBYTES(sp)
    LOAD x20, 17 * REGBYTES(sp)
    LOAD x21, 18 * REGBYTES(sp)
    LOAD x22, 19 * REGBYTES(sp)
    LOAD x23, 20 * REGBYTES(sp)
    LOAD x24, 21 * REGBYTES(sp)
    LOAD x25, 22 * REGBYTES(sp)
    LOAD x26, 23 * REGBYTES(sp)
    LOAD x27, 24 * REGBYTES(sp)
    LOAD x28, 25 * REGBYTES(sp)
    LOAD x29, 26 * REGBYTES(sp)
    LOAD x30, 27 * REGBYTES(sp)
    LOAD x31, 28 * REGBYTES(sp)
#endif

    addi sp, sp, RT_CONTEXT_SIZE
     /* load float reg */
#ifdef ARCH_RISCV_FPU

    FLOAD   f0, 0 * FREGBYTES(sp)
    FLOAD   f1, 1 * FREGBYTES(sp)
    FLOAD   f2, 2 * FREGBYTES(sp)
    FLOAD   f3, 3 * FREGBYTES(sp)
    FLOAD   f4, 4 * FREGBYTES(sp)
    FLOAD   f5, 5 * FREGBYTES(sp)
    FLOAD   f6, 6 * FREGBYTES(sp)
    FLOAD   f7, 7 * FREGBYTES(sp)
    FLOAD   f8, 8 * FREGBYTES(sp)
    FLOAD   f9, 9 * FREGBYTES(sp)
    FLOAD   f10, 10 * FREGBYTES(sp)
    FLOAD   f11, 11 * FREGBYTES(sp)
    FLOAD   f12, 12 * FREGBYTES(sp)
    FLOAD   f13, 13 * FREGBYTES(sp)
    FLOAD   f14, 14 * FREGBYTES(sp)
    FLOAD   f15, 15 * FREGBYTES(sp)
    FLOAD   f16, 16 * FREGBYTES(sp)
    FLOAD   f17, 17 * FREGBYTES(sp)
    FLOAD   f18, 18 * FREGBYTES(sp)
    FLOAD   f19, 19 * FREGBYTES(sp)
    FLOAD   f20, 20 * FREGBYTES(sp)
    FLOAD   f21, 21 * FREGBYTES(sp)
    FLOAD   f22, 22 * FREGBYTES(sp)
    FLOAD   f23, 23 * FREGBYTES(sp)
    FLOAD   f24, 24 * FREGBYTES(sp)
    FLOAD   f25, 25 * FREGBYTES(sp)
    FLOAD   f26, 26 * FREGBYTES(sp)
    FLOAD   f27, 27 * FREGBYTES(sp)
    FLOAD   f28, 28 * FREGBYTES(sp)
    FLOAD   f29, 29 * FREGBYTES(sp)
    FLOAD   f30, 30 * FREGBYTES(sp)
    FLOAD   f31, 31 * FREGBYTES(sp)
    addi    sp, sp, 32 * FREGBYTES
#endif
    mret

Test

With the same threads, the problem no longer appears when running.

[Image: https://oss-club.rt-thread.org/uploads/20220620/95e56d2338626da2bd9ed22f9cb7e5f5.png]

Dynamic FPU save

Although the static FPU save scheme solves the FPU context problem, the cost is still too high.

Once the FPU is enabled, every context switch saves and restores all 32 floating-point registers, whether or not the thread uses the FPU.
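To put a rough number on that cost, the sketch below does the arithmetic. The helper names are hypothetical, and the register width is an assumption: 4 bytes when only the F extension is present, 8 bytes with the D extension.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FREGS 32u

/* Extra stack bytes one saved FPU context needs. freg_bytes is 4 with
 * only the F extension and 8 with the D extension. */
static size_t fpu_frame_bytes(size_t freg_bytes)
{
    assert(freg_bytes == 4 || freg_bytes == 8);
    return NUM_FREGS * freg_bytes;
}

/* Extra memory instructions per context switch: one store per register
 * on the way out, one load per register on the way in. */
static unsigned fpu_mem_ops_per_switch(void)
{
    return 2u * NUM_FREGS;
}
```

With 4-byte registers the FPU frame is 128 bytes, which is one eighth of the 1024-byte stacks given to thread1 and thread2 above, and each switch carries 64 additional loads and stores.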

So I would like to do what ARM does and decide from the current program state whether the floating-point registers need saving, as in the PendSV handler below:

[Image: https://oss-club.rt-thread.org/uploads/20220620/c4bfe7f3ad2efba7e4ffaf4ebac8c123.png.webp]

Page 26 of riscv-privileged-20211203.pdf defines the FS field of mstatus. We can rely entirely on whether FS is in the Dirty state to decide whether to save the register set. The extra 32 floating-point registers are then saved only when needed, which saves a lot of time and makes context switching more efficient.
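In the privileged spec, FS occupies bits 14:13 of mstatus and encodes Off (0), Initial (1), Clean (2) and Dirty (3). A minimal sketch of the decoding follows; the macro and function names are local to this example rather than taken from any BSP header.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_FS_SHIFT 13u
#define MSTATUS_FS_MASK  (3u << MSTATUS_FS_SHIFT)   /* mstatus bits [14:13] */

enum fs_state { FS_OFF = 0, FS_INITIAL = 1, FS_CLEAN = 2, FS_DIRTY = 3 };

/* Extract the FS field from an mstatus value. */
static enum fs_state mstatus_get_fs(uint32_t mstatus)
{
    return (enum fs_state)((mstatus & MSTATUS_FS_MASK) >> MSTATUS_FS_SHIFT);
}

/* Return mstatus with the FS field replaced; other bits are untouched. */
static uint32_t mstatus_set_fs(uint32_t mstatus, enum fs_state fs)
{
    assert((unsigned)fs <= 3u);
    return (mstatus & ~MSTATUS_FS_MASK) | ((uint32_t)fs << MSTATUS_FS_SHIFT);
}

/* The dynamic-save idea in one line: save only when FS is Dirty. */
static int fpu_needs_save(uint32_t mstatus)
{
    return mstatus_get_fs(mstatus) == FS_DIRTY;
}
```

In a real handler the same test is done in assembly on the value read from CSR_MSTATUS; the C form is only meant to make the bit layout explicit.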

[Image: https://oss-club.rt-thread.org/uploads/20220620/76228bf59859f535dd0a5463fa72e776.png.webp]

Zephyr OS as a reference

So far, only Zephyr OS (https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/riscv/core/switch.S) appears to have code that saves the FPU context dynamically:

save:

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/* Assess whether floating-point registers need to be saved. */
	lb t0, _thread_offset_to_user_options(a1)
	andi t0, t0, K_FP_REGS
	beqz t0, skip_store_fp_callee_saved

	frcsr t0
	sw t0, _thread_offset_to_fcsr(a1)
	DO_FP_CALLEE_SAVED(fsr, a1)
skip_store_fp_callee_saved:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */

load:

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/* Determine if we need to restore floating-point registers. */
	lb t0, _thread_offset_to_user_options(a0)
	li t1, MSTATUS_FS_INIT
	andi t0, t0, K_FP_REGS
	beqz t0, no_fp

	/* Enable floating point access */
	csrs mstatus, t1

	/* Restore FP regs */
	lw t1, _thread_offset_to_fcsr(a0)
	fscsr t1
	DO_FP_CALLEE_SAVED(flr, a0)
	j 1f

no_fp:
	/* Disable floating point access */
	csrc mstatus, t1
1:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */

Reviewing the code shows that the decision is actually made directly from k_thread->base.user_options:

/* can be used for creating 'dummy' threads, e.g. for pending on objects */
struct _thread_base {

	/* this thread's entry in a ready/wait queue */
	union {
		sys_dnode_t qnode_dlist;
		struct rbnode qnode_rb;
	};

	/* wait queue on which the thread is pended (needed only for
	 * trees, not dumb lists)
	 */
	_wait_q_t *pended_on;

	/* user facing 'thread options'; values defined in include/kernel.h */
	uint8_t user_options;

	/* thread state */
	uint8_t thread_state;
	...

Explanation in kernel.h

[Image: https://oss-club.rt-thread.org/uploads/20220620/acee11de6bfda5fbc44d84c90862cfb3.png.webp]

FPU Test code

[Image: https://oss-club.rt-thread.org/uploads/20220620/ba4ee1e714282c3e64fcf6f4b56b6071.png.webp]

Summary:

  1. Zephyr has a configuration option, CONFIG_FPU_SHARING, that acts as the global switch for FPU sharing.
  2. When creating a task, you can choose whether it uses the CPU's floating-point registers.
  3. Once K_FP_REGS is selected, extra steps are added during scheduling to save and restore the floating-point register context.
    [Image: https://oss-club.rt-thread.org/uploads/20220620/8710a88206b0f9093368951764409adc.png.webp]

With Zephyr's approach you must be very careful with the thread options when creating a task. If a task that was created without K_FP_REGS also performs floating-point operations, the floating-point context is likely to be corrupted.

The compiler setting is global: once hardware FPU support is turned on, the compiler may emit floating-point instructions wherever it finds floating-point arithmetic (sometimes it calls an optimized library routine instead), and it will not take responsibility for your mistake.

mstatus.FS

Although Zephyr's dynamic saving is not perfect, it gives us a good reference. The rest of this article implements dynamic FPU save and restore based on mstatus.FS.

Going back to riscv-privileged-20211203.pdf, Table 3.4 gives the recommended save and restore actions for the FPU context in privileged mode, depending on the mstatus.FS field.

[Image: https://oss-club.rt-thread.org/uploads/20220620/3c7dc0798d7a07eba71bf0f3e7edd425.png.webp]

Page 27 has a very important explanation:

[Image: https://oss-club.rt-thread.org/uploads/20220620/caea3e4f1936075665857ace085554b7.png]

Context Save:

  • Dirty state: save the FPU registers, then switch to the Clean state.

  • Off, Initial, Clean states: the FPU registers need not be saved, and the state is left unchanged.

Context Restore:

  • Off state: no action.

  • Initial state: the FPU registers can be loaded directly with the immediate value 0; no memory load access is needed.

  • Clean state: restore from the saved memory stack.

  • Dirty state: the state was already switched to Clean during Context Save, so this state does not occur.
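The save and restore rules listed above can be sketched as a small host-side model. `toy_fpu`, `toy_ctx` and the two functions are hypothetical names; the model sets FS directly as the table prescribes and does not yet account for how real hardware changes FS when the registers are written.

```c
#include <assert.h>
#include <string.h>

#define NUM_FREGS 32

enum fs_state { FS_OFF, FS_INITIAL, FS_CLEAN, FS_DIRTY };

/* Hypothetical model of one hart's FPU: register file plus mstatus.FS. */
struct toy_fpu {
    float regs[NUM_FREGS];
    enum fs_state fs;
};

/* Hypothetical per-thread saved context. */
struct toy_ctx {
    float saved[NUM_FREGS];
    enum fs_state fs;
};

/* Context save: only Dirty state is written out, then FS drops to Clean. */
static void toy_fpu_save(struct toy_fpu *fpu, struct toy_ctx *ctx)
{
    assert(fpu != NULL && ctx != NULL);
    if (fpu->fs == FS_DIRTY) {
        memcpy(ctx->saved, fpu->regs, sizeof(fpu->regs));
        fpu->fs = FS_CLEAN;
    }
    ctx->fs = fpu->fs;
}

/* Context restore, keyed on the FS value recorded for the incoming thread. */
static void toy_fpu_restore(struct toy_fpu *fpu, const struct toy_ctx *ctx)
{
    assert(fpu != NULL && ctx != NULL);
    switch (ctx->fs) {
    case FS_OFF:                      /* nothing to do */
        break;
    case FS_INITIAL:                  /* load zeros, no memory copy needed */
        memset(fpu->regs, 0, sizeof(fpu->regs));
        break;
    case FS_CLEAN:                    /* reload the saved copy */
        memcpy(fpu->regs, ctx->saved, sizeof(fpu->regs));
        break;
    case FS_DIRTY:                    /* save always leaves Clean behind */
        assert(!"a saved context is never Dirty");
        break;
    }
    fpu->fs = ctx->fs;
}
```

Saving a Dirty thread leaves its context marked Clean, so the next restore for that thread takes the FS_CLEAN branch and reloads the saved copy.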

On avoiding memory access and using immediates in the Initial state

[Image: https://oss-club.rt-thread.org/uploads/20220620/aa38d54984b7140b8c61ebb1fc878f59.png.webp]

  1. First, the rd, rs1 and rs2 operands of RV32F instructions are mostly floating-point registers.

  2. The only instruction that loads a value directly into a floating-point register is flw: flw rd, offset[11:0](rs1)

    The assembly below shows that floating-point threads consume a lot of thread stack: a register load has to read from memory, and an immediate cannot be used.

    [Image: https://oss-club.rt-thread.org/uploads/20220620/0de6fa488a9f04b5a0c39980bf04dd49.png]

  3. The move instruction fmv.w.x ftx, zero can do the job without accessing memory.

Confusion about mstatus.FS and the FPU context

Restore FPU context (Initial)

According to the riscv-privileged document, rt_hw_context_switch_to needs to initialize the FPU registers.

Before the load

[Image: https://oss-club.rt-thread.org/uploads/20220620/2a9ddc8cbb6e23151639315f78ce33fe.png.webp]

After the load

[Image: https://oss-club.rt-thread.org/uploads/20220620/074a6226034f0c125e24bf89c4a92227.png.webp]

You can see that as soon as one flw instruction executes, mstatus.FS changes from Initial to Dirty. This is clearly not right:

Merely initializing the FPU registers to 0 turns mstatus.FS Dirty. Even a thread that performs no floating-point operations at all enters the Dirty state on its first run, which is quite unfair.

Testing so far shows that any floating-point instruction that writes the FPU registers (fcsr included) turns mstatus.FS Dirty, unless the FPU starts in the Off state. So after the context restore described in riscv-privileged, the state will actually be Dirty. To keep the Initial or Clean state, it has to be set back manually.
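The measured behavior can be captured in a few lines. This is a host-side model with hypothetical names (`toy_fpu`, `toy_fp_write`, `toy_fpu_init_regs`), written under the assumption that every FP register write marks FS Dirty, as observed on this board.

```c
#include <assert.h>
#include <stddef.h>

enum fs_state { FS_OFF, FS_INITIAL, FS_CLEAN, FS_DIRTY };

/* Hypothetical model of the FPU register file plus mstatus.FS. */
struct toy_fpu {
    float regs[32];
    enum fs_state fs;
};

/* Observed rule: a write to any FP register marks FS Dirty. With FS Off,
 * real hardware raises an illegal-instruction exception; the model simply
 * ignores the write. */
static void toy_fp_write(struct toy_fpu *fpu, unsigned idx, float value)
{
    assert(fpu != NULL && idx < 32);
    if (fpu->fs == FS_OFF)
        return;
    fpu->regs[idx] = value;
    fpu->fs = FS_DIRTY;
}

/* Zero the register file for an Initial-state thread, then put FS back
 * by hand, because the writes themselves left it Dirty. */
static void toy_fpu_init_regs(struct toy_fpu *fpu)
{
    unsigned i;

    assert(fpu != NULL);
    for (i = 0; i < 32; i++)
        toy_fp_write(fpu, i, 0.0f);
    if (fpu->fs != FS_OFF)
        fpu->fs = FS_INITIAL;
}
```

The manual reset at the end of `toy_fpu_init_regs` corresponds to the extra CSR write a real restore path would need if it wanted to keep FS at Initial or Clean.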

[Image: https://oss-club.rt-thread.org/uploads/20220620/41baf55f219cd00abda38d4c20e410b4.png.webp]

The observed states thus differ from the privileged spec in several ways, so I intend to simplify the code based on the measured behavior.

Initial state:

Personally, I think this state needs neither save nor restore, and the FS state is left unchanged.

Dirty thread -> Initial thread (a thread that has not yet reached its floating-point code, or an ordinary thread with none):

The Initial thread performs no FPU restore, and mstatus.FS stays Initial. The FPU registers still hold the leftovers of the previous floating-point thread, but that has no effect:

  • If a floating-point write occurs during this run, the compiler follows the caller-saved convention: it spills what it must to the stack and loads new values into the FPU registers, and mstatus.FS becomes Dirty.

  • If no floating-point write occurs during this run, mstatus.FS stays Initial.

Initial thread -> Dirty thread:

The Dirty thread's FPU registers are restored during the switch.

Clean state:

The literal meaning of this state is easy to understand: the current thread has used the FPU and some of its FPU registers have been used, but the lifetime of those floating-point registers has ended. The FPU can be considered clean, much like Initial, except that the FPU registers no longer hold their initial constant values.

But in testing, code that fits this description could not trigger the state:

1) Floating-point instructions are executed before the thread's main loop, and the main loop itself involves none.

2) The thread calls a function that uses floating point, but the thread itself involves no floating-point instructions.

Testing shows that on the current RISC-V core, mstatus.FS is tied to writes to the FPU registers: any write to the FPU registers (fcsr included) turns mstatus.FS Dirty, while reads have no effect. This is the simplest hardware implementation.

Here is the question: how do we know that the lifetime of the floating-point registers has ended? That is the compiler's business. The compiler certainly knows, and it would be easy for it to implement.

1) Floating-point instructions before the thread's main loop: here the compiler is generally powerless, because it cannot predict the instructions that follow.

2) The thread calls a floating-point function: each function that involves floating-point instructions is like a critical section. mstatus.FS is inherited inside the critical section, and the previous mstatus.FS is restored only when the function returns.

In practice, this would just mean pushing mstatus onto the stack on function entry or on the first floating-point instruction. But that raises problems:

1) With a deep call chain, a lot of thread stack is wasted, and a check is needed before every ret, which is not very clean.

2) More importantly, a thread running in user mode cannot access the mstatus CSR.

Perhaps for these reasons, the compiler does not save and restore the mstatus.FS state for you, and changes to mstatus.FS are purely hardware behavior triggered by instructions. If this is ever implemented, it will probably also be hardware behavior triggered by the ret instruction.

So the Clean state remains confusing. I cannot rule out that there is a special Clean condition we simply have not caught.

Dirty state :

Based on the previous tests, once a thread writes an FPU register the state becomes dirty and stays that way.

According to Table 3.4 of riscv-privileged, suppose floating-point thread A is running and is now in the dirty state:

Time point 1: A is preempted and another thread is switched in. After saving the FPU context, first set mstatus.fs = clean manually.

Time point 2: Scheduling occurs and floating-point thread A has the highest priority. After the FPU context is restored, mstatus.fs has gone from clean to dirty, so set mstatus.fs = clean manually again.

Time point 3: Thread A has just been switched in and has executed only a few instructions, none of which write the FPU, so mstatus.fs is still clean. Another higher-priority thread becomes ready and scheduling occurs again.

If we follow riscv-privileged Table 3.4, mstatus.fs = clean means the FPU context does not need to be saved.

Problems that arise

1) mstatus.fs has to be set to clean manually several times. If mstatus.fs simply stayed dirty before and after the restore, that would also work and would also be logical.

2) The scheduling at time point 3 should save the FPU context. Compared with time point 1, only a few more instructions have executed, and there is no guarantee that FPU accesses will not follow. Skipping the save outright may lead to FPU operation errors.

By the same reasoning, if the Clean (none dirty, some clean) state exists, the context should also be saved in that state.

To sum up, trying to follow the RISC-V spec literally leads to the following:

Scheme 1 (switch the state manually)

current mstatus.fs    | off | init | clean                       | dirty
save context          | no  | no   | yes                         | yes
after save context    | off | init | clean                       | clean (switched from dirty)
restore context       | no  | no   | yes                         | /
after restore context | off | init | clean (switched from dirty) | /

Scheme 2 (keep the dirty state)

current mstatus.fs    | off | init | clean | dirty
save context          | no  | no   | yes   | yes
after save context    | off | init | clean | dirty
restore context       | no  | no   | yes   | yes
after restore context | off | init | dirty | dirty
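The two schemes differ only in what happens to FS after a save or a restore. The sketch below writes both as small C functions so the difference is easy to see. All names (demo_need_transfer, demo_scheme1_after_save, and so on) are hypothetical; the code only restates the two tables above and is not taken from any RTOS.

```c
#include <stdbool.h>

enum demo_fs { DEMO_FS_OFF, DEMO_FS_INIT, DEMO_FS_CLEAN, DEMO_FS_DIRTY };

/* Both schemes move the FPU context (save or restore) for clean and dirty. */
static bool demo_need_transfer(enum demo_fs fs)
{
	return fs == DEMO_FS_CLEAN || fs == DEMO_FS_DIRTY;
}

/* Scheme 1: after saving, software rewrites a dirty FS to clean. */
static enum demo_fs demo_scheme1_after_save(enum demo_fs fs)
{
	return fs == DEMO_FS_DIRTY ? DEMO_FS_CLEAN : fs;
}

/* Scheme 2: FS is left exactly as the hardware set it. */
static enum demo_fs demo_scheme2_after_save(enum demo_fs fs)
{
	return fs;
}

/*
 * Scheme 1: the reload dirties FS in hardware, and software sets it back
 * to clean. Dirty never reaches a restore because saves clear it.
 */
static enum demo_fs demo_scheme1_after_restore(enum demo_fs fs)
{
	return fs == DEMO_FS_DIRTY ? DEMO_FS_CLEAN : fs;
}

/* Scheme 2: the reload leaves FS dirty, and nothing resets it. */
static enum demo_fs demo_scheme2_after_restore(enum demo_fs fs)
{
	return demo_need_transfer(fs) ? DEMO_FS_DIRTY : fs;
}
```

In both schemes a thread that has ever written the FPU pays for a full save on every switch; scheme 1 only adds the manual FS writes.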

I have tested both schemes and both run normally; more on that later.

A sudden insight

What if clean is a software-defined state

Both schemes work and no problem has been found so far, but something still felt odd. An official document should not contain such an obvious mistake. Searching online, I found that other users have the same question:

[Image: image-20220619181958029.png, https://oss-club.rt-thread.org/uploads/20220620/e62c0a193c25412b1b3cd1cee2b5e34d.png.webp]

I was still puzzled until it suddenly dawned on me in the shower today:

What if clean is a software-defined state meaning "dirty, but already saved"? After all, the STM32F407 has no such state: any action that changes the FPU registers triggers FPU Used (equivalent to dirty), which then stays set.

Zephyr OS as a reference, again

As for the problem of not saving the FPU context in the clean state, the Zephyr OS approach shows that it need not be a problem. First, look at how Zephyr saves and restores the context.

Important macros

kernel/include/gen_offset.h

/*
 * The macro "GEN_OFFSET_SYM(structure, member)" is used to generate a single
 * absolute symbol.  The absolute symbol will appear in the object module
 * generated from the source file that utilizes the GEN_OFFSET_SYM() macro.
 * Absolute symbols representing a structure member offset have the following
 * form:
 *
 *    __<structure>_<member>_OFFSET
 *
 * The macro "GEN_NAMED_OFFSET_SYM(structure, member, name)" is also provided
 * to create the symbol with the following form:
 *
 *    __<structure>_<name>_OFFSET
 *
 * This header also defines the GEN_ABSOLUTE_SYM macro to simply define an
 * absolute symbol, irrespective of whether the value represents a structure
 * or offset.
 */

#ifndef ZEPHYR_KERNEL_INCLUDE_GEN_OFFSET_H_
#define ZEPHYR_KERNEL_INCLUDE_GEN_OFFSET_H_

#include <toolchain.h>
#include <stddef.h>

/* definition of the GEN_OFFSET_SYM() macros is toolchain independent  */

#define GEN_OFFSET_SYM(S, M) \
	GEN_ABSOLUTE_SYM(__##S##_##M##_##OFFSET, offsetof(S, M))

#define GEN_NAMED_OFFSET_SYM(S, M, N) \
	GEN_ABSOLUTE_SYM(__##S##_##N##_##OFFSET, offsetof(S, M))

#endif /* ZEPHYR_KERNEL_INCLUDE_GEN_OFFSET_H_ */

The GEN_OFFSET_SYM and GEN_NAMED_OFFSET_SYM macros use assembly to define two kinds of symbols, __##S##_##M##_##OFFSET and __##S##_##N##_##OFFSET, that is, symbols named __<S>_<M>_OFFSET and __<S>_<N>_OFFSET. Once declared, they can be used directly in source files to get the offset of member M relative to structure S. They appear many times below and are as important as rt_container_of.
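The idea behind these macros can be tried on a host compiler. The sketch below is a simplified analogue, not Zephyr's implementation: instead of emitting an absolute assembler symbol, the hypothetical DEMO_OFFSET_SYM macro defines an enum constant named <S>_<M>_OFFSET, and struct demo_frame is a made-up stand-in for a register frame.

```c
#include <stddef.h>

/* Hypothetical stand-in for a register frame; not Zephyr's real layout. */
struct demo_frame {
	unsigned long ra;
	unsigned long t0;
	unsigned long t1;
};

/*
 * Simplified analogue of GEN_OFFSET_SYM: token pasting builds the name
 * <S>_<M>_OFFSET and offsetof() supplies the value at compile time.
 */
#define DEMO_OFFSET_SYM(S, M) \
	enum { S##_##M##_OFFSET = offsetof(struct S, M) }

DEMO_OFFSET_SYM(demo_frame, ra);
DEMO_OFFSET_SYM(demo_frame, t0);
DEMO_OFFSET_SYM(demo_frame, t1);

/* Read a member through its generated offset, as assembly code would. */
static unsigned long demo_load(const struct demo_frame *f, size_t offset)
{
	return *(const unsigned long *)((const char *)f + offset);
}
```

With this in place, demo_load(&frame, demo_frame_t1_OFFSET) reads the t1 member, which is the C equivalent of an assembly load from demo_frame_t1_OFFSET(reg).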

The RISC-V exception stack frame for caller-saved registers: z_arch_esf_t

include/zephyr/arch/riscv/exp.h

struct __esf {
	ulong_t ra;		/* return address */

	ulong_t t0;		/* Caller-saved temporary register */
	ulong_t t1;		/* Caller-saved temporary register */
	ulong_t t2;		/* Caller-saved temporary register */
	ulong_t t3;		/* Caller-saved temporary register */
	ulong_t t4;		/* Caller-saved temporary register */
	ulong_t t5;		/* Caller-saved temporary register */
	ulong_t t6;		/* Caller-saved temporary register */
	
	ulong_t a0;		/* function argument/return value */
	ulong_t a1;		/* function argument */
	ulong_t a2;		/* function argument */
	ulong_t a3;		/* function argument */
	ulong_t a4;		/* function argument */
	ulong_t a5;		/* function argument */
	ulong_t a6;		/* function argument */
	ulong_t a7;		/* function argument */
	
	ulong_t mepc;		/* machine exception program counter */
	ulong_t mstatus;	/* machine status register */
	
	ulong_t s0;		/* callee-saved s0 */

#ifdef CONFIG_USERSPACE
	ulong_t sp;		/* preserved (user or kernel) stack pointer */
#endif

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	RV_FP_TYPE ft0;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft1;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft2;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft3;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft4;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft5;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft6;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft7;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft8;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft9;		/* Caller-saved temporary floating register */
	RV_FP_TYPE ft10;	/* Caller-saved temporary floating register */
	RV_FP_TYPE ft11;	/* Caller-saved temporary floating register */
	RV_FP_TYPE fa0;		/* function argument/return value */
	RV_FP_TYPE fa1;		/* function argument/return value */
	RV_FP_TYPE fa2;		/* function argument */
	RV_FP_TYPE fa3;		/* function argument */
	RV_FP_TYPE fa4;		/* function argument */
	RV_FP_TYPE fa5;		/* function argument */
	RV_FP_TYPE fa6;		/* function argument */
	RV_FP_TYPE fa7;		/* function argument */
#endif

#ifdef CONFIG_RISCV_SOC_CONTEXT_SAVE
	struct soc_esf soc_context;
#endif
} __aligned(16);
typedef struct __esf z_arch_esf_t;

Saving the caller-saved registers

__z_arch_esf_t_xxx_OFFSET is the offset of member xxx relative to the base address of z_arch_esf_t. DO_FP_CALLER_SAVED handles the FPU context separately.

arch/riscv/core/isr.S (the code that follows is also from this file)

#define DO_FP_CALLER_SAVED(op, reg) \
	op ft0, __z_arch_esf_t_ft0_OFFSET(reg)	 ;\
	op ft1, __z_arch_esf_t_ft1_OFFSET(reg)	 ;\
	op ft2, __z_arch_esf_t_ft2_OFFSET(reg)	 ;\
	op ft3, __z_arch_esf_t_ft3_OFFSET(reg)	 ;\
	op ft4, __z_arch_esf_t_ft4_OFFSET(reg)	 ;\
	op ft5, __z_arch_esf_t_ft5_OFFSET(reg)	 ;\
	op ft6, __z_arch_esf_t_ft6_OFFSET(reg)	 ;\
	op ft7, __z_arch_esf_t_ft7_OFFSET(reg)	 ;\
	op ft8, __z_arch_esf_t_ft8_OFFSET(reg)	 ;\
	op ft9, __z_arch_esf_t_ft9_OFFSET(reg)	 ;\
	op ft10, __z_arch_esf_t_ft10_OFFSET(reg) ;\
	op ft11, __z_arch_esf_t_ft11_OFFSET(reg) ;\
	op fa0, __z_arch_esf_t_fa0_OFFSET(reg)	 ;\
	op fa1, __z_arch_esf_t_fa1_OFFSET(reg)	 ;\
	op fa2, __z_arch_esf_t_fa2_OFFSET(reg)	 ;\
	op fa3, __z_arch_esf_t_fa3_OFFSET(reg)	 ;\
	op fa4, __z_arch_esf_t_fa4_OFFSET(reg)	 ;\
	op fa5, __z_arch_esf_t_fa5_OFFSET(reg)	 ;\
	op fa6, __z_arch_esf_t_fa6_OFFSET(reg)	 ;\
	op fa7, __z_arch_esf_t_fa7_OFFSET(reg)	 ;

#define DO_CALLER_SAVED_T0T1(op) \
	op t0, __z_arch_esf_t_t0_OFFSET(sp)		;\
	op t1, __z_arch_esf_t_t1_OFFSET(sp)

#define DO_CALLER_SAVED_REST(op) \
	op t2, __z_arch_esf_t_t2_OFFSET(sp)		;\
	op t3, __z_arch_esf_t_t3_OFFSET(sp)		;\
	op t4, __z_arch_esf_t_t4_OFFSET(sp)		;\
	op t5, __z_arch_esf_t_t5_OFFSET(sp)		;\
	op t6, __z_arch_esf_t_t6_OFFSET(sp)		;\
	op a0, __z_arch_esf_t_a0_OFFSET(sp)		;\
	op a1, __z_arch_esf_t_a1_OFFSET(sp)		;\
	op a2, __z_arch_esf_t_a2_OFFSET(sp)		;\
	op a3, __z_arch_esf_t_a3_OFFSET(sp)		;\
	op a4, __z_arch_esf_t_a4_OFFSET(sp)		;\
	op a5, __z_arch_esf_t_a5_OFFSET(sp)		;\
	op a6, __z_arch_esf_t_a6_OFFSET(sp)		;\
	op a7, __z_arch_esf_t_a7_OFFSET(sp)		;\
	op ra, __z_arch_esf_t_ra_OFFSET(sp)
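These macros take the instruction as a parameter, so one register list serves both save (store) and restore (load). The same pattern can be sketched in plain C. Everything below is hypothetical and reduced to three registers for brevity: struct demo_regs, DEMO_DO_FP_CALLER_SAVED, DEMO_STORE, and DEMO_LOAD are illustration names only.

```c
/* Hypothetical three-register "FPU"; the real macros cover ft0-ft11 and fa0-fa7. */
struct demo_regs {
	float ft0;
	float ft1;
	float fa0;
};

/* One register list, parameterized by the operation, like DO_FP_CALLER_SAVED(op, reg). */
#define DEMO_DO_FP_CALLER_SAVED(op, frame, cpu) \
	op(frame, cpu, ft0); \
	op(frame, cpu, ft1); \
	op(frame, cpu, fa0)

#define DEMO_STORE(frame, cpu, r) ((frame)->r = (cpu)->r) /* plays the role of fsr */
#define DEMO_LOAD(frame, cpu, r)  ((cpu)->r = (frame)->r) /* plays the role of flr */

/* Copy the live registers into the stack frame. */
static void demo_save(struct demo_regs *frame, const struct demo_regs *cpu)
{
	DEMO_DO_FP_CALLER_SAVED(DEMO_STORE, frame, cpu);
}

/* Copy the stack frame back into the live registers. */
static void demo_restore(const struct demo_regs *frame, struct demo_regs *cpu)
{
	DEMO_DO_FP_CALLER_SAVED(DEMO_LOAD, frame, cpu);
}
```

Because the list exists only once, adding or removing a register cannot make the save and restore paths disagree.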

The __irq_wrapper interrupt entry

Interrupt handling covers exceptions, interrupts, and faults and is fairly complicated. We only care about the system call path used for task switching.

/*
 * Handler called upon each exception/interrupt/fault
 * In this architecture, system call (ECALL) is used to perform context
 * switching or IRQ offloading (when enabled).
 */
SECTION_FUNC(exception.entry, __irq_wrapper)

1) z_arch_esf_t is saved regardless of the interrupt source, but the caller-saved FPU registers are saved only when the FS check on the saved mstatus passes (the FPU is not off)

	/* Save caller-saved registers on current thread stack. */
	addi sp, sp, -__z_arch_esf_t_SIZEOF
	DO_CALLER_SAVED_T0T1(sr)		;
3:	DO_CALLER_SAVED_REST(sr)		;

	/* Save s0 in the esf and load it with &_current_cpu. */
	sr s0, __z_arch_esf_t_s0_OFFSET(sp)
	GET_CURRENT_CPU(s0, t0)

#ifdef CONFIG_USERSPACE
	/*
	 * The scratch register now contains either the user mode stack
	 * pointer, or 0 if entered from kernel mode. Retrieve that value
	 * and zero the scratch register as we are in kernel mode now.
	 */
	csrrw t0, mscratch, zero
	bnez t0, 1f

	/* came from kernel mode: adjust stack value */
	add t0, sp, __z_arch_esf_t_SIZEOF
1:
	/* save stack value to be restored later */
	sr t0, __z_arch_esf_t_sp_OFFSET(sp)

#if !defined(CONFIG_SMP)
	/* Clear user mode variable */
	la t0, is_user_mode
	sw zero, 0(t0)
#endif
#endif

	/* Save MEPC register */
	csrr t0, mepc
	sr t0, __z_arch_esf_t_mepc_OFFSET(sp)

	/* Save MSTATUS register */
	csrr t4, mstatus
	sr t4, __z_arch_esf_t_mstatus_OFFSET(sp)

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/* Assess whether floating-point registers need to be saved. */
	li t1, MSTATUS_FS_INIT
	and t0, t4, t1
	beqz t0, skip_store_fp_caller_saved
	DO_FP_CALLER_SAVED(fsr, sp)
skip_store_fp_caller_saved:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */
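The FS check above is a mask test followed by beqz. The sketch below mirrors it in C next to a test on the whole two-bit field. It assumes that MSTATUS_FS_INIT is the single bit 1 << 13, which should be confirmed against the Zephyr headers; the DEMO_ names are hypothetical. Under that assumption the single-bit test passes for init (01) and dirty (11), the only enabled states the hardware and this code produce by themselves, while a software-set clean (10) would not pass it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: a single bit at mstatus bit 13 (the low bit of FS). */
#define DEMO_MSTATUS_FS_INIT (UINT32_C(1) << 13)
/* Full two-bit FS field, mstatus bits 14:13. */
#define DEMO_MSTATUS_FS_MASK (UINT32_C(3) << 13)

/* Mirrors "and t0, t4, t1; beqz t0, skip": true means the save runs. */
static bool demo_save_by_init_bit(uint32_t mstatus)
{
	return (mstatus & DEMO_MSTATUS_FS_INIT) != 0;
}

/* Alternative check on the whole field: true for every state except off. */
static bool demo_save_by_field(uint32_t mstatus)
{
	return (mstatus & DEMO_MSTATUS_FS_MASK) != 0;
}
```

This matters only if a port adds a software clean state on top of this code: the check then has to look at the whole field, or at a per-thread flag, rather than at the low bit alone.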

2) z_riscv_switch saves and restores the callee-saved registers

reschedule:

	/* Get pointer to current thread on this CPU */
	lr a1, ___cpu_t_current_OFFSET(s0)

	/*
	 * Get next thread to schedule with z_get_next_switch_handle().
	 * We pass it a NULL as we didn't save the whole thread context yet.
	 * If no scheduling is necessary then NULL will be returned.
	 */
	addi sp, sp, -16
	sr a1, 0(sp)
	mv a0, zero
	call z_get_next_switch_handle
	lr a1, 0(sp)
	addi sp, sp, 16
	beqz a0, no_reschedule

	/*
	 * Perform context switch:
	 * a0 = new thread
	 * a1 = old thread
	 */
	call z_riscv_switch

As mentioned earlier, z_riscv_switch checks whether K_FP_REGS is set in the user_options of _thread_t to decide whether to save the callee-saved FPU registers.

/* Convenience macros for loading/storing register states. */

#define DO_CALLEE_SAVED(op, reg) \
	op ra, _thread_offset_to_ra(reg)	;\
	op tp, _thread_offset_to_tp(reg)	;\
	op s0, _thread_offset_to_s0(reg)	;\
	op s1, _thread_offset_to_s1(reg)	;\
	op s2, _thread_offset_to_s2(reg)	;\
	op s3, _thread_offset_to_s3(reg)	;\
	op s4, _thread_offset_to_s4(reg)	;\
	op s5, _thread_offset_to_s5(reg)	;\
	op s6, _thread_offset_to_s6(reg)	;\
	op s7, _thread_offset_to_s7(reg)	;\
	op s8, _thread_offset_to_s8(reg)	;\
	op s9, _thread_offset_to_s9(reg)	;\
	op s10, _thread_offset_to_s10(reg)	;\
	op s11, _thread_offset_to_s11(reg)

#define DO_FP_CALLEE_SAVED(op, reg) \
	op fs0, _thread_offset_to_fs0(reg)	;\
	op fs1, _thread_offset_to_fs1(reg)	;\
	op fs2, _thread_offset_to_fs2(reg)	;\
	op fs3, _thread_offset_to_fs3(reg)	;\
	op fs4, _thread_offset_to_fs4(reg)	;\
	op fs5, _thread_offset_to_fs5(reg)	;\
	op fs6, _thread_offset_to_fs6(reg)	;\
	op fs7, _thread_offset_to_fs7(reg)	;\
	op fs8, _thread_offset_to_fs8(reg)	;\
	op fs9, _thread_offset_to_fs9(reg)	;\
	op fs10, _thread_offset_to_fs10(reg)	;\
	op fs11, _thread_offset_to_fs11(reg)

GTEXT(z_riscv_switch)
GTEXT(z_thread_mark_switched_in)
GTEXT(z_riscv_configure_stack_guard)

/* void z_riscv_switch(k_thread_t *switch_to, k_thread_t *switch_from) */
SECTION_FUNC(TEXT, z_riscv_switch)

	/* Save the old thread's callee-saved registers */
	DO_CALLEE_SAVED(sr, a1)

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/* Assess whether floating-point registers need to be saved. */
	lb t0, _thread_offset_to_user_options(a1)
	andi t0, t0, K_FP_REGS
	beqz t0, skip_store_fp_callee_saved

	frcsr t0
	sw t0, _thread_offset_to_fcsr(a1)
	DO_FP_CALLEE_SAVED(fsr, a1)
skip_store_fp_callee_saved:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */

	/* Save the old thread's stack pointer */
	sr sp, _thread_offset_to_sp(a1)

	/* Set thread->switch_handle = thread to mark completion */
	sr a1, ___thread_t_switch_handle_OFFSET(a1)

	/* Get the new thread's stack pointer */
	lr sp, _thread_offset_to_sp(a0)

#ifdef CONFIG_PMP_STACK_GUARD
	/* Preserve a0 across following call. s0 is not yet restored. */
	mv s0, a0
	call z_riscv_configure_stack_guard
	mv a0, s0
#endif

#ifdef CONFIG_USERSPACE
	lb t0, _thread_offset_to_user_options(a0)
	andi t0, t0, K_USER
	beqz t0, not_user_task
	mv s0, a0
	call z_riscv_configure_user_allowed_stack
	mv a0, s0
not_user_task:
#endif

#if CONFIG_INSTRUMENT_THREAD_SWITCHING
	mv s0, a0
	call z_thread_mark_switched_in
	mv a0, s0
#endif

	/* Restore the new thread's callee-saved registers */
	DO_CALLEE_SAVED(lr, a0)

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/* Determine if we need to restore floating-point registers. */
	lb t0, _thread_offset_to_user_options(a0)
	li t1, MSTATUS_FS_INIT
	andi t0, t0, K_FP_REGS
	beqz t0, no_fp

	/* Enable floating point access */
	csrs mstatus, t1

	/* Restore FP regs */
	lw t1, _thread_offset_to_fcsr(a0)
	fscsr t1
	DO_FP_CALLEE_SAVED(flr, a0)
	j 1f

no_fp:
	/* Disable floating point access */
	csrc mstatus, t1
1:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */

	ret
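The floating-point part of z_riscv_switch can be modeled in plain C to make the control flow easier to follow. This is a sketch under assumptions, not Zephyr code: struct demo_thread, struct demo_cpu, and the DEMO_FP_REGS bit are hypothetical, and fpu_enabled stands in for "mstatus.fs is not off".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FP_REGS 0x02u /* hypothetical option bit, standing in for K_FP_REGS */
#define DEMO_NUM_FS  12    /* fs0..fs11 */

struct demo_thread {
	uint8_t  user_options;
	uint32_t fcsr;
	float    fs[DEMO_NUM_FS]; /* statically allocated callee-saved FP area */
};

struct demo_cpu {
	bool     fpu_enabled; /* models mstatus.fs != off */
	uint32_t fcsr;
	float    fs[DEMO_NUM_FS];
};

/* FP part of the switch: save for the old thread, restore or disable for the new one. */
static void demo_switch_fp(struct demo_cpu *cpu, struct demo_thread *to,
			   struct demo_thread *from)
{
	if (from->user_options & DEMO_FP_REGS) {
		from->fcsr = cpu->fcsr;
		memcpy(from->fs, cpu->fs, sizeof(cpu->fs));
	}

	if (to->user_options & DEMO_FP_REGS) {
		cpu->fpu_enabled = true;
		cpu->fcsr = to->fcsr;
		memcpy(cpu->fs, to->fs, sizeof(cpu->fs));
	} else {
		cpu->fpu_enabled = false;
	}
}
```

Note that the decision depends only on a per-thread option, not on whether the registers changed since the last save; that is the part the clean state can optimize.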

_thread_t contains a struct _callee_saved member:

struct _callee_saved {
	ulong_t sp;	/* Stack pointer, (x2 register) */
	ulong_t ra;	/* return address */
	ulong_t tp;	/* thread pointer */

	ulong_t s0;	/* saved register/frame pointer */
	ulong_t s1;	/* saved register */
	ulong_t s2;	/* saved register */
	ulong_t s3;	/* saved register */
	ulong_t s4;	/* saved register */
	ulong_t s5;	/* saved register */
	ulong_t s6;	/* saved register */
	ulong_t s7;	/* saved register */
	ulong_t s8;	/* saved register */
	ulong_t s9;	/* saved register */
	ulong_t s10;	/* saved register */
	ulong_t s11;	/* saved register */

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	uint32_t fcsr;		/* Control and status register */
	RV_FP_TYPE fs0;		/* saved floating-point register */
	RV_FP_TYPE fs1;		/* saved floating-point register */
	RV_FP_TYPE fs2;		/* saved floating-point register */
	RV_FP_TYPE fs3;		/* saved floating-point register */
	RV_FP_TYPE fs4;		/* saved floating-point register */
	RV_FP_TYPE fs5;		/* saved floating-point register */
	RV_FP_TYPE fs6;		/* saved floating-point register */
	RV_FP_TYPE fs7;		/* saved floating-point register */
	RV_FP_TYPE fs8;		/* saved floating-point register */
	RV_FP_TYPE fs9;		/* saved floating-point register */
	RV_FP_TYPE fs10;	/* saved floating-point register */
	RV_FP_TYPE fs11;	/* saved floating-point register */
#endif
};

3) Restoring z_arch_esf_t

When the FPU is enabled, the caller-saved FPU context is restored:

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
	/*
	 * Determine if we need to restore FP regs based on the previous
	 * (before the csr above) mstatus value available in t5.
	 */
	li t1, MSTATUS_FS_INIT
	and t0, t5, t1
	beqz t0, no_fp

	/* make sure FP is enabled in the restored mstatus */
	csrs mstatus, t1
	DO_FP_CALLER_SAVED(flr, sp)
	j 1f

no_fp:	/* make sure this is reflected in the restored mstatus */
	csrc mstatus, t1
1:
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */

#ifdef CONFIG_USERSPACE
	/*
	 * Check if we are returning to user mode. If so then we must
	 * set is_user_mode to true and load the scratch register with
	 * the stack pointer to be used with the next exception to come.
	 */
	li t1, MSTATUS_MPP
	and t0, t4, t1
	bnez t0, 1f

#if !defined(CONFIG_SMP)
	/* Set user mode variable */
	li t0, 1
	la t1, is_user_mode
	sw t0, 0(t1)
#endif

	/* load scratch reg with stack pointer for next exception entry */
	add t0, sp, __z_arch_esf_t_SIZEOF
	csrw mscratch, t0
1:
#endif

	/* Restore s0 (it is no longer ours) */
	lr s0, __z_arch_esf_t_s0_OFFSET(sp)

	/* Restore caller-saved registers from thread stack */
	DO_CALLER_SAVED_T0T1(lr)
	DO_CALLER_SAVED_REST(lr)

#ifdef CONFIG_USERSPACE
	/* retrieve saved stack pointer */
	lr sp, __z_arch_esf_t_sp_OFFSET(sp)
#else
	/* remove esf from the stack */
	addi sp, sp, __z_arch_esf_t_SIZEOF
#endif

	/* Call SOC_ERET to exit ISR */
	SOC_ERET

Summary

Zephyr OS saves the context in structures (both the caller-saved and the callee-saved parts include an FPU portion), so the order in which registers are stored does not matter. The callee-saved part is even defined directly in _thread_t and statically allocated, which gives the clean state a way to skip the save. In short: allocate space for the FPU registers in advance, and decide whether to use it according to mstatus.fs.

Of course, Zephyr OS still decides whether to handle the FPU context based only on whether the FPU is enabled, without distinguishing the individual states. But its architecture is easy to optimize further, for example:

A reasonable scheduling flow for a floating-point thread:

  1. When the floating-point thread first runs, fs = init; after a floating-point register is modified, fs = dirty
  2. When a thread in the dirty state is scheduled out, save the context and then manually switch to the clean state
  3. The next time this thread is switched in and found in the clean state, load the registers directly from thread_t->callee_saved, then manually switch back to the clean state
  4. If the thread performs no floating-point register writes while running, it stays clean, and the FPU part of thread_t->callee_saved does not need to be saved again at the next switch
  5. If the thread does write floating-point registers while running, the state becomes dirty and the flow returns to step 2
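The five steps can be checked with a tiny host-side simulation. This is a sketch under simplifying assumptions: a single float stands in for the whole FPU register file, the saves counter exists only to show when a save is skipped, and all names (demo_fp_thread, demo_switch_out, and so on) are hypothetical.

```c
enum demo_fs { DEMO_FS_OFF, DEMO_FS_INIT, DEMO_FS_CLEAN, DEMO_FS_DIRTY };

struct demo_fp_thread {
	enum demo_fs fs;  /* FS state kept with the thread */
	float live;       /* stands in for the live FPU registers */
	float saved;      /* stands in for thread_t->callee_saved */
	unsigned saves;   /* how many times the context was written out */
};

/* Steps 1 and 5: any FPU register write makes the state dirty. */
static void demo_fp_write(struct demo_fp_thread *t, float v)
{
	t->live = v;
	t->fs = DEMO_FS_DIRTY;
}

/* Steps 2 and 4: save only when dirty, then mark the context clean. */
static void demo_switch_out(struct demo_fp_thread *t)
{
	if (t->fs == DEMO_FS_DIRTY) {
		t->saved = t->live;
		t->saves++;
		t->fs = DEMO_FS_CLEAN;
	}
}

/* Step 3: reload from the saved copy, then set the state back to clean. */
static void demo_switch_in(struct demo_fp_thread *t)
{
	if (t->fs == DEMO_FS_CLEAN) {
		t->live = t->saved;     /* on hardware, the loads would mark FS dirty */
		t->fs = DEMO_FS_CLEAN;  /* manual reset, as in step 3 */
	}
}
```

Switching the same thread out twice without an FPU write in between leaves saves at 1, which is exactly the saving promised by step 4.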

The benefit of defining the clean state shows mainly in step 4: no floating-point register has changed, so there is no need to save the FPU registers again, and the next load keeps using the copy already stored in thread_t->callee_saved. Floating-point context save and restore then match the requirements of riscv-privileged Table 3.4 exactly:

current mstatus.fs    | off | init                                | clean                                | dirty
save context          | no  | no                                  | no                                   | yes
after save context    | off | init                                | clean                                | clean (manually switched from dirty)
restore context       | no  | yes                                 | yes                                  | /
after restore context | off | init (manually switched from dirty) | clean (manually switched from dirty) | /

Any instruction that writes an FPU register sets mstatus.fs = dirty; reads do not.

The init state is restored with fmv.w.x ftxx, zero.
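The restore side of the table, including the init case, can be sketched as follows. The model is hypothetical: struct demo_fpu and demo_restore are illustration names, and memset to zero stands in for executing fmv.w.x fN, zero on each register (an all-zero bit pattern, which is +0.0 in IEEE 754).

```c
#include <string.h>

enum demo_fs { DEMO_FS_OFF, DEMO_FS_INIT, DEMO_FS_CLEAN, DEMO_FS_DIRTY };

#define DEMO_NUM_FREGS 32

struct demo_fpu {
	enum demo_fs fs;
	float f[DEMO_NUM_FREGS];
};

/* Restore side of the final table; 'saved' is the thread's stored FPU context. */
static void demo_restore(struct demo_fpu *fpu, enum demo_fs thread_fs,
			 const float *saved)
{
	switch (thread_fs) {
	case DEMO_FS_INIT:
		/* Like "fmv.w.x fN, zero" per register: no memory load needed. */
		memset(fpu->f, 0, sizeof(fpu->f));
		fpu->fs = DEMO_FS_INIT; /* the writes made FS dirty; set it back by hand */
		break;
	case DEMO_FS_CLEAN:
		memcpy(fpu->f, saved, sizeof(fpu->f));
		fpu->fs = DEMO_FS_CLEAN; /* same manual step after the loads */
		break;
	default:
		/* off: nothing to restore. Dirty does not occur here after a save. */
		fpu->fs = thread_fs;
		break;
	}
}
```

Restoring init by zeroing rather than loading means a thread that has never written the FPU needs no saved context at all.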

Not willing to stop there, I read riscv-privileged-20211203.pdf carefully once more and found two other descriptions:

[Image: image-20220620224559314.png, https://oss-club.rt-thread.org/uploads/20220620/b1f5af773d2aa98685c440516236a5fa.png.webp]

This emphasizes two points again:

  1. FS exists to reduce the cost of FPU save and restore
  2. mstatus.fs is a setting, so changing the fs state manually is legal

[Image: image-20220620225135650.png, https://oss-club.rt-thread.org/uploads/20220620/ce692fb58c2424a1f149e7424df627d9.png.webp]

This passage mentions the "last context save" several times. Does that look familiar? We can now be fairly sure that our conjecture is correct. After all this effort, it turns out I dug the hole myself by not reading the document carefully and not understanding it well.

Of course, deriving and testing everything myself really deepened my understanding, and I can now settle on the final scheme with confidence.

Original site

Copyright notice
This article was created by [RT thread IOT operating system]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/175/202206240742405586.html