Talk about SOC startup (11) kernel initialization
2022-07-07 11:26:00 【lgjjeff】
This article is based on the following software and hardware assumptions:
Architecture: AArch64
Kernel version: 5.14.0-rc5
The previous article covered background knowledge on exception levels and page table creation during kernel initialization. Beyond those, startup naturally involves the usual three-piece set: saving the boot parameters, initializing the CPU system registers, and preparing the C runtime environment. Because KASLR mainly serves to harden kernel security and is not essential to the boot flow, this article does not cover the code paths related to it. In summary, the overall flow of kernel initialization is shown in the following figure:
1 Kernel entry function
The entry function of the ARMv8 kernel is located in arch/arm64/kernel/head.S. It is the starting point of kernel boot and is defined as follows:
__HEAD
efi_signature_nop (1)
b primary_entry (2)
(1) EFI is a standard firmware interface originally proposed by Intel to support PC boot, and it is mainly used on servers. To compete in the server market, the ARMv8 architecture added EFI support. Typical embedded systems do not need this interface.
(2) This function is the main flow of kernel initialization. It is defined as follows:
SYM_CODE_START(primary_entry)
bl preserve_boot_args (1)
bl init_kernel_el (2)
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 (3)
bl set_cpu_boot_mode_flag (4)
bl __create_page_tables (5)
bl __cpu_setup (6)
b __primary_switch (7)
SYM_CODE_END(primary_entry)
The code runs through the following steps:
(1) Save the boot parameters passed in by U-Boot
(2) Initialize the exception level
(3) Take the physical address of __PHYS_OFFSET and keep its offset within MIN_KIMG_ALIGN in x23
(4) Record the boot mode
(5) Create the initial page tables
(6) Initialize the processor in preparation for turning on the MMU
(7) Enable the MMU, set up the exception vector table, the stack, the BSS segment and so on, then jump to the C entry function
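Step (3) can be illustrated with a small Python sketch. It assumes MIN_KIMG_ALIGN is 2 MiB, which is my understanding of its definition for arm64 and not something stated in this article; the load addresses are made-up examples.

```python
# Sketch of step (3): x23 = __PHYS_OFFSET & (MIN_KIMG_ALIGN - 1).
# Assumption: MIN_KIMG_ALIGN is 2 MiB; the addresses below are made up.
MIN_KIMG_ALIGN = 2 * 1024 * 1024


def kimg_align_offset(phys_offset: int) -> int:
    """Return the offset of the load address inside a MIN_KIMG_ALIGN block."""
    return phys_offset & (MIN_KIMG_ALIGN - 1)


if __name__ == "__main__":
    # A load address on a 2 MiB boundary leaves no offset.
    print(hex(kimg_align_offset(0x4020_0000)))
    # A load address 64 KiB past a 2 MiB boundary keeps that 64 KiB offset.
    print(hex(kimg_align_offset(0x4021_0000)))
```

When the image is loaded on an aligned boundary the result is zero, so the later `add x5, x5, x23` in __create_page_tables leaves KIMAGE_VADDR unchanged.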
1.1 Saving the boot parameters
On ARMv8 all configuration information is described in the dtb, so the bootloader only needs to pass the dtb address to the kernel. The kernel must save this parameter after it starts so that later modules can use it. The code is as follows:
SYM_CODE_START_LOCAL(preserve_boot_args)
mov x21, x0 (1)
adr_l x0, boot_args (2)
stp x21, x1, [x0]
stp x2, x3, [x0, #16] (3)
dmb sy
add x1, x0, #0x20
b dcache_inval_poc (4)
SYM_CODE_END(preserve_boot_args)
(1) Save the dtb address in the callee-saved register x21 to free up x0
(2) Get the address of the variable that stores the boot parameters
(3) Although ARMv8 currently uses only one register to pass boot parameters, the kernel still supports up to four. These instructions save the parameters to the global variable boot_args
(4) Invalidate the cache lines that cover boot_args. The cache is not enabled yet, so the stores above go directly to memory. If the bootloader left stale data for these addresses in the cache, the CPU would read that stale data once the cache is turned on, which would make the data inconsistent
1.2 Exception level initialization
The exception level initialization function is init_kernel_el. Its background and flow were covered in detail in the previous article, so they are not repeated here.
1.3 Set startup mode
This step records the exception level each CPU boots at. In an SMP system there are several secondary CPUs besides the primary CPU, and all of them should boot at the same exception level. To check whether their boot ELs are the same, the kernel uses the array below to record the boot mode of the CPUs in the system:
SYM_DATA_START(__boot_cpu_mode)
.long BOOT_CPU_MODE_EL2
.long BOOT_CPU_MODE_EL1
SYM_DATA_END(__boot_cpu_mode)
__boot_cpu_mode[0] is initialized to el2 and __boot_cpu_mode[1] to el1. When a CPU boots, it modifies the corresponding array element according to its boot EL. The rules are as follows:
(1) The boot EL is EL1
Set __boot_cpu_mode[0] to el1
(2) The boot EL is EL2
Set __boot_cpu_mode[1] to el2
This implements a simple state machine: __boot_cpu_mode[0] and __boot_cpu_mode[1] end up equal only when all CPUs boot at the same EL; otherwise their values differ. Let's briefly simulate the state machine:
(1) Suppose all four CPUs boot at EL1. The state transition table is then:
Array element | Initial state | cpu0 boot | cpu1 boot | cpu2 boot | cpu3 boot |
---|---|---|---|---|---|
__boot_cpu_mode[0] | el2 | el1 | el1 | el1 | el1 |
__boot_cpu_mode[1] | el1 | el1 | el1 | el1 | el1 |
(2) Suppose all four CPUs boot at EL2. The state transition table is then:
Array element | Initial state | cpu0 boot | cpu1 boot | cpu2 boot | cpu3 boot |
---|---|---|---|---|---|
__boot_cpu_mode[0] | el2 | el2 | el2 | el2 | el2 |
__boot_cpu_mode[1] | el1 | el2 | el2 | el2 | el2 |
(3) Suppose the four CPUs boot at EL1, EL2, EL2 and EL2 respectively. The state transition table is then:
Array element | Initial state | cpu0 boot | cpu1 boot | cpu2 boot | cpu3 boot |
---|---|---|---|---|---|
__boot_cpu_mode[0] | el2 | el1 | el1 | el1 | el1 |
__boot_cpu_mode[1] | el1 | el1 | el2 | el2 | el2 |
The function that sets this array element is as follows:
SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
adr_l x1, __boot_cpu_mode
cmp w0, #BOOT_CPU_MODE_EL2
b.ne 1f
add x1, x1, #4
1: str w0, [x1]
dmb sy
dc ivac, x1 // Invalidate potentially stale cache line
ret
SYM_FUNC_END(set_cpu_boot_mode_flag)
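The state machine can be replayed with a short Python simulation. This is only a sketch of the logic described above: EL1 and EL2 are represented as strings for readability, whereas the kernel uses the numeric constants BOOT_CPU_MODE_EL1 and BOOT_CPU_MODE_EL2, and the helper names `simulate` and `modes_match` are invented for illustration.

```python
# Simulation of the __boot_cpu_mode state machine described above.
# EL1/EL2 are plain strings here; the kernel uses numeric constants.
EL1, EL2 = "el1", "el2"


def set_cpu_boot_mode_flag(boot_cpu_mode: list, boot_el: str) -> None:
    """Mirror the assembly: an EL2 boot writes slot 1, any other boot slot 0."""
    index = 1 if boot_el == EL2 else 0
    boot_cpu_mode[index] = boot_el


def simulate(boot_els: list) -> list:
    """Return the array state initially and after each CPU has booted."""
    boot_cpu_mode = [EL2, EL1]  # initial values of __boot_cpu_mode
    history = [tuple(boot_cpu_mode)]
    for el in boot_els:
        set_cpu_boot_mode_flag(boot_cpu_mode, el)
        history.append(tuple(boot_cpu_mode))
    return history


def modes_match(boot_els: list) -> bool:
    """True when both slots end up equal, i.e. all CPUs booted at the same EL."""
    final = simulate(boot_els)[-1]
    return final[0] == final[1]


if __name__ == "__main__":
    for scenario in ([EL1] * 4, [EL2] * 4, [EL1, EL2, EL2, EL2]):
        print(scenario, simulate(scenario), modes_match(scenario))
```

Each printed history corresponds to one of the three transition tables; only the mixed EL1/EL2 scenario leaves the two slots different.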
2 Create page table
As analyzed in the previous article, before turning on the MMU the kernel must create an identity-map page table for the addresses of the idmap section, using a linear one-to-one mapping, and an init_pg_dir page table for the entire kernel image. Creating a page table is fairly simple: based on the value of the virtual address, find the corresponding entry in the table at each level and point it to the next-level table.
The kernel image's initial page tables are created as follows:
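The per-level lookup can be sketched in Python. The configuration here (4 KiB pages, 48-bit virtual addresses, four levels) is an assumption chosen for illustration, and the sample address is made up; other kernel configurations use different shifts.

```python
# Sketch: split a virtual address into per-level page table indexes.
# Assumptions: 4 KiB pages, 48-bit virtual addresses, four levels.
PAGE_SHIFT = 12
BITS_PER_LEVEL = PAGE_SHIFT - 3  # 512 eight-byte entries per 4 KiB table
LEVELS = 4


def table_indexes(vaddr: int) -> list:
    """Return the [pgd, pud, pmd, pte] indexes selected by vaddr."""
    indexes = []
    for level in range(LEVELS):
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
        indexes.append((vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1))
    return indexes


if __name__ == "__main__":
    # Made-up kernel virtual address.
    print(table_indexes(0xFFFF_8000_1000_0000))
```

Each index selects one entry in the table at that level, and that entry points to the table used for the next index.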
SYM_FUNC_START_LOCAL(__create_page_tables)
mov x28, lr
adrp x0, init_pg_dir
adrp x1, init_pg_end
bl dcache_inval_poc (1)
adrp x0, init_pg_dir
adrp x1, init_pg_end
sub x1, x1, x0
1: stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
subs x1, x1, #64
b.ne 1b (2)
mov x7, SWAPPER_MM_MMUFLAGS (3)
adrp x0, idmap_pg_dir
adrp x3, __idmap_text_start
#ifdef CONFIG_ARM64_VA_BITS_52
mrs_s x6, SYS_ID_AA64MMFR2_EL1
and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
mov x5, #52
cbnz x6, 1f (4)
#endif
mov x5, #VA_BITS_MIN
1:
adr_l x6, vabits_actual
str x5, [x6] (5)
dmb sy
dc ivac, x6
adrp x5, __idmap_text_end (6)
clz x5, x5
cmp x5, TCR_T0SZ(VA_BITS_MIN)
b.ge 1f (7)
adr_l x6, idmap_t0sz
str x5, [x6] (8)
dmb sy
dc ivac, x6
#if (VA_BITS < 48) (9)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
#if VA_BITS != EXTRA_SHIFT
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endif
mov x4, EXTRA_PTRS
create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
#else
mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
str_l x4, idmap_ptrs_per_pgd, x5
#endif
1:
ldr_l x4, idmap_ptrs_per_pgd
adr_l x6, __idmap_text_end (10)
map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14 (11)
adrp x0, init_pg_dir
mov_q x5, KIMAGE_VADDR
add x5, x5, x23
mov x4, PTRS_PER_PGD
adrp x6, _end
adrp x3, _text
sub x6, x6, x3
add x6, x6, x5 (12)
map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14 (13)
dmb sy
adrp x0, idmap_pg_dir
adrp x1, idmap_pg_end
bl dcache_inval_poc
adrp x0, init_pg_dir
adrp x1, init_pg_end
bl dcache_inval_poc (14)
ret x28
SYM_FUNC_END(__create_page_tables)
(1 - 2) Invalidate the cache lines covering the init page table and clear its contents
(12 - 13) Create the init page table. The process is simple: map the kernel image to the virtual address specified by its linker script.
(14) Invalidate the cache lines covering the idmap page table and the init page table
(3 - 11) Create the idmap page table
idmap is a section in the kernel image. It is defined in arch/arm64/kernel/vmlinux.lds.S:
#define IDMAP_TEXT \
	. = ALIGN(SZ_4K); \
	VMLINUX_SYMBOL(__idmap_text_start) = .; \
	*(.idmap.text) \
	VMLINUX_SYMBOL(__idmap_text_end) = .;
In other words, it defines a section that starts at __idmap_text_start and ends at __idmap_text_end, and this section is placed in the text segment of vmlinux. During kernel initialization it is mapped in the init page table as part of the kernel image, and it is also mapped a second time, separately, through the idmap page table. The following is the mapping diagram:
The figure shows that the virtual addresses mapped by idmap are equal to the physical addresses. Because ARMv8 physical addresses are at most 48 bits wide, these virtual addresses lie in the lower range between 0x0000 0000 0000 0000 and 0x0000 ffff ffff ffff, and the base address of the corresponding pgd is written to the ttbr0_el1 register. At the same time, as part of the kernel image, the section is also mapped between 0xffff 0000 0000 0000 and 0xffff ffff ffff ffff (the kernel address range on ARMv8). This mapping is implemented by the init page table, and the base address of its pgd is written to the ttbr1_el1 register.
The identity mapping is needed because modern processors have pipelines, branch prediction and similar features. While the instruction that turns on the MMU executes, the instructions after it may already have been fetched, and their addresses are still physical addresses. Once the MMU is enabled, the system is running on virtual addresses, so without countermeasures those already-fetched instructions would execute incorrectly. The kernel therefore uses idmap to make the virtual addresses of this code equal to its physical addresses, which avoids the problem.
Because idmap maps each virtual address to the identical physical address, a problem arises when virtual addresses are narrower than 48 bits while physical addresses are 48 bits wide. If the physical addresses of the idmap section lie in a high address range (their value may exceed 2^n, where n is the number of virtual address bits), such a mapping cannot be expressed and would fail.
Therefore, when the kernel's virtual addresses are narrower than 48 bits, some special handling is applied: an extra level of mapping is added on top of the original 3-level page table, which extends the virtual address range to 48 bits.
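The range check performed by the `clz` / `cmp` / `b.ge` sequence in __create_page_tables can be sketched as follows. It assumes TCR_T0SZ(x) evaluates to 64 - x, which is how the comparison is read here; the function names and sample addresses are hypothetical.

```python
# Sketch of the idmap range check: the default T0SZ is compared with the
# leading-zero count of the physical address of __idmap_text_end.
# Assumption: TCR_T0SZ(va_bits) == 64 - va_bits. Addresses are made up.


def clz64(value: int) -> int:
    """Count leading zeros of a 64-bit value, like the AArch64 clz instruction."""
    return 64 - value.bit_length()


def idmap_t0sz(idmap_text_end_pa: int, va_bits: int) -> int:
    """Return the T0SZ needed for the identity map to reach idmap_text_end_pa."""
    default_t0sz = 64 - va_bits
    needed = clz64(idmap_text_end_pa)
    # b.ge 1f: the default range is already large enough, so keep it.
    if needed >= default_t0sz:
        return default_t0sz
    # Otherwise a smaller T0SZ (a wider address range) is stored in idmap_t0sz.
    return needed


if __name__ == "__main__":
    print(idmap_t0sz(0x8000_0000, 39))       # low physical address
    print(idmap_t0sz(0x80_0000_0000, 39))    # address at 2^39
```

A smaller T0SZ means a wider translated range, which is why the extra table level is only created when the leading-zero count falls below the default.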
3 Initializing the CPU state
Although system control registers such as sctlr_el1 have already been set during exception level initialization, some other preparation is still needed before the MMU is turned on.
After the MMU is enabled, the kernel officially switches to virtual memory mode. To make page table accesses efficient, translations are cached in the TLB. To keep stale entries left by the bootloader from being used, the TLB contents must be invalidated before the MMU is enabled.
ARMv8 memory is divided into device memory and normal memory, and the two can have different properties. Device memory can be configured with different attributes such as nGnRnE, which determine how accesses to it behave. Normal memory can use different cache policies, such as write-back, write-through or non-cacheable. The MAIR_EL1 register holds the memory attribute table: its 64 bits are divided into eight attribute fields of eight bits each, as shown in the figure below:
Several sets of attributes can be preset in this register. Once it is set, a 3-bit index (AttrIndx[2:0]) in a page table entry selects one of them. This index is stored in the flag bits of the last-level page table entry (PTE) and can be loaded into the TLB.
SYM_FUNC_START(__cpu_setup)
tlbi vmalle1 (1)
dsb nsh
mov x1, #3 << 20
msr cpacr_el1, x1 (2)
mov x1, #1 << 12
msr mdscr_el1, x1 (3)
isb
enable_dbg (4)
reset_pmuserenr_el0 x1 (5)
reset_amuserenr_el0 x1 (6)
mair .req x17
tcr .req x16
mov_q mair, MAIR_EL1_SET
mov_q tcr, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS
…
tcr_clear_errata_bits tcr, x9, x5
#ifdef CONFIG_ARM64_VA_BITS_52
ldr_l x9, vabits_actual
sub x9, xzr, x9
add x9, x9, #64
tcr_set_t1sz tcr, x9
#else
ldr_l x9, idmap_t0sz
#endif
tcr_set_t0sz tcr, x9
tcr_compute_pa_size tcr, #TCR_IPS_SHIFT, x5, x6
…
msr mair_el1, mair (7)
msr tcr_el1, tcr (8)
mov_q x0, INIT_SCTLR_EL1_MMU_ON
ret
.unreq mair
.unreq tcr
SYM_FUNC_END(__cpu_setup)
(1) Invalidate the TLB
(2) Enable access to the SIMD and FP registers from EL0 and EL1 (cpacr_el1.FPEN is set to 0b11, so these accesses are not trapped)
(3) Trap EL0 accesses to the debug communication channel registers to EL1
(4) Clear the D bit of PSTATE to unmask debug exceptions
(5) Disable EL0 access to the performance monitor unit (PMU)
(6) Disable EL0 access to the activity monitor unit (AMU)
(7) Set the mair_el1 memory attribute table
For example, suppose six groups of attributes are set, with the following values in binary:
0b0000 0000
0b0000 0100
0b0000 1100
0b0100 0100
0b1111 1111
0b1011 1011
According to the attribute definitions below, 0b0000 0000 matches the format 0b0000 dd00 with dd equal to 0b00, so it denotes device memory with the nGnRnE attribute. 0b0000 0100 matches the same format with dd equal to 0b01, so it denotes device memory with the nGnRE attribute. The remaining attributes can be worked out in the same way.
(8) Set tcr_el1, the EL1 translation control register, which is defined as follows. It mainly configures the properties related to ttbr0 and ttbr1, the physical address range and so on. The code before this instruction performs a series of calculations to build the value of this register. Only the definitions of two fields are shown in the following figure; for the rest, refer to the ARMv8 technical manual.
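The attribute table from step (7) can be modelled in a few lines of Python. This is a sketch under assumptions: the six example bytes are placed at indexes 0 to 5 in the order listed above, which is not necessarily the index layout the kernel uses, and only the device format 0b0000dd00 is decoded by name. For normal memory the sketch just reports the two nibbles.

```python
# Sketch: pack attribute bytes into a MAIR_EL1-style value and decode them.
# The index assignment is illustrative, not necessarily the kernel's layout.
DEVICE_TYPES = {
    0b00: "Device-nGnRnE",
    0b01: "Device-nGnRE",
    0b10: "Device-nGRE",
    0b11: "Device-GRE",
}


def pack_mair(attrs: list) -> int:
    """Place attrs[i] in bits [8*i+7 : 8*i] of a 64-bit value."""
    value = 0
    for index, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * index)
    return value


def attr_at(mair: int, index: int) -> int:
    """Extract the attribute byte selected by a 3-bit index."""
    return (mair >> (8 * (index & 0b111))) & 0xFF


def describe(attr: int) -> str:
    """Decode device attributes (0b0000dd00); report nibbles for normal memory."""
    if attr >> 4 == 0:
        if attr & 0b11:
            return "encoding not covered by this sketch"
        return DEVICE_TYPES[(attr >> 2) & 0b11]
    return f"Normal memory, outer=0b{attr >> 4:04b}, inner=0b{attr & 0xF:04b}"


if __name__ == "__main__":
    example = [0b00000000, 0b00000100, 0b00001100,
               0b01000100, 0b11111111, 0b10111011]
    mair = pack_mair(example)
    for index in range(len(example)):
        print(index, describe(attr_at(mair, index)))
```

The 3-bit index stored in a page table entry plays the role of the `index` argument to `attr_at`.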
4 C Runtime environment initialization
__primary_switch mainly sets up the C runtime environment: it enables the MMU, sets the exception vector table, the stack, the BSS segment and so on, and finally jumps to the C function start_kernel.
The function is shown below. Skipping the KASLR and kernel relocation code, which is less relevant to the main flow, what remains is very simple.
SYM_FUNC_START_LOCAL(__primary_switch)
…
adrp x1, init_pg_dir
bl __enable_mmu (1)
…
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
br x8 (2)
SYM_FUNC_END(__primary_switch)
(1) Enable the MMU
(2) Jump to the __primary_switched function
SYM_FUNC_START_LOCAL(__primary_switched)
adr_l x4, init_task
init_cpu_task x4, x5, x6 (1)
adr_l x8, vectors
msr vbar_el1, x8 (2)
isb
stp x29, x30, [sp, #-16]!
mov x29, sp
str_l x21, __fdt_pointer, x5 (3)
ldr_l x4, kimage_vaddr
sub x4, x4, x0
str_l x4, kimage_voffset, x5 (4)
adr_l x0, __bss_start
mov x1, xzr
adr_l x2, __bss_stop
sub x2, x2, x0
bl __pi_memset (5)
…
mov x0, x21
bl early_fdt_map (6)
…
bl switch_to_vhe (7)
ldp x29, x30, [sp], #16
bl start_kernel (8)
ASM_BUG()
SYM_FUNC_END(__primary_switched)
(1) Set up the initial stack and stack frame of the swapper process
(2) Set the exception vector table
(3) Save the fdt address to the global variable __fdt_pointer
(4) Calculate the difference between the kernel's virtual address and its physical address, and save it to the global variable kimage_voffset
(5) Clear the contents of the BSS segment
(6) Initialize fixmap first, then create page table entries for the fdt through fixmap
(7) This goes back to the exception level initialization flow. In that flow, hcr_el2.e2h determines whether VHE mode is entered, and that bit is initialized from HCR_HOST_NVHE_FLAGS. If this flag does not set the e2h bit, the system does not actually enter VHE mode even when the hardware supports it. Because of the advantages of VHE mode, the kernel gives the system another chance to enter it here
(8) Finally we reach the familiar start_kernel and can leave the headache-inducing assembly behind
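The offset computed in step (4) can be illustrated with a small sketch. Both addresses are made-up examples, and the two conversion helpers are hypothetical names used only to show how the saved offset relates kernel image virtual addresses to physical ones.

```python
# Sketch of step (4): kimage_voffset = kimage_vaddr - physical load address.
# Both addresses below are made-up examples.
MASK64 = (1 << 64) - 1


def compute_voffset(kimage_vaddr: int, phys_offset: int) -> int:
    """Mirror 'sub x4, x4, x0' using 64-bit wraparound arithmetic."""
    return (kimage_vaddr - phys_offset) & MASK64


def image_virt_to_phys(vaddr: int, voffset: int) -> int:
    """Translate a kernel image virtual address to a physical address."""
    return (vaddr - voffset) & MASK64


def image_phys_to_virt(paddr: int, voffset: int) -> int:
    """Translate a physical address inside the image to its virtual address."""
    return (paddr + voffset) & MASK64


if __name__ == "__main__":
    voffset = compute_voffset(0xFFFF_8000_1000_0000, 0x4020_0000)
    print(hex(voffset))
    print(hex(image_virt_to_phys(0xFFFF_8000_1000_1000, voffset)))
```

Storing only the difference lets later code convert in either direction with a single addition or subtraction.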
5 Postscript
After entering start_kernel, the kernel goes on to initialize its subsystems and modules, and finally starts the init process to enter user space. The core modules of the kernel include the memory management subsystem, the interrupt subsystem, the time subsystem, the process subsystem and others. Because each subsystem is relatively independent, the following articles will discuss them one subsystem at a time.