当前位置:网站首页>What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source
What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source
2022-07-06 19:50:00 【Full stack programmer webmaster】
Hello everyone , I meet you again , I'm the king of the whole stack .
Hereby declare : In this paper , Quote another article and post , General understanding of combination malloc() Function implementation mechanism .
We are often in C Call in program malloc() Function dynamically allocates a contiguous memory space and uses them . that , What kind of response will these user space events cause in kernel space ?
malloc() It's a API, This function encapsulates the system call in the Library brk. So suppose you call malloc, Then first of all, it will cause brk System call running process .
brk() The corresponding system call service routine in the kernel is SYSCALL_DEFINE1(brk, unsigned long, brk). Parameters brk Used to specify heap Segment new end address . That is, another designation mm_struct The structure of the brk Field .
brk The system call service routine will first determine heap The starting address of the segment min_brk. Then check the resource constraints . next , Old and new heap The addresses are aligned according to the page size , The aligned addresses are stored in newbrk and okdbrk in .
brk() The system call itself can reduce the heap size . It can also expand the heap size . The function of reducing heap is through calling do_munmap() It's over . Suppose you want to expand the size of the pile . Then you must pass find_vma_intersection() Check whether the expanded heap coincides with an existing virtual memory , How to overlap, then exit directly . otherwise , call do_brk() Carry out all kinds of work to expand the heap next .
<span style="font-size:18px;">SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long rlim, retval;
unsigned long newbrk, oldbrk;
struct mm_struct *mm = current->mm;
unsigned long min_brk;
down_write(&mm->mmap_sem);
#ifdef CONFIG_COMPAT_BRK
min_brk = mm->end_code;
#else
min_brk = mm->start_brk;
#endif
if (brk < min_brk)
goto out;
rlim = rlimit(RLIMIT_DATA);
if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
(mm->end_data - mm->start_data) > rlim)
newbrk = PAGE_ALIGN(brk);
oldbrk = PAGE_ALIGN(mm->brk);
if (oldbrk == newbrk)
goto set_brk;
if (brk brk) {
if (!do_munmap(mm, newbrk, oldbrk-newbrk))
goto set_brk;
goto out;
}
if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
goto out;
if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
goto out;
set_brk:
mm->brk = brk;
out:
retval = mm->brk;
up_write(&mm->mmap_sem);
return retval;
}</span>
brk The system call service routine will finally return the new end address of the heap .
User process call malloc() Will make the kernel call brk System call service routine . because malloc Always allocate memory space dynamically , Therefore, the service routine will now enter the second running path , That is, expand a lot .do_brk() Mainly finish the following work :
1. adopt get_unmapped_area() Find a match in the address space of the current process len Linear interval of size . And the linear interval must be in addr After the address . Suppose we find this linear interval of leisure , Then the starting address of the interval is returned , Otherwise, the error code is returned -ENOMEM;
2. adopt find_vma_prepare() Traverse each one in turn in the red black tree composed of all linear regions of the current process vma. To determine the position of the linear area object before the new area found in the previous step . hypothesis addr Located in an existing vma in , Call do_munmap() Delete this linear area . Assuming the deletion is successful, continue to search , Otherwise, the error code is returned .
3. At present, we have found a suitable size of leisure linear area , Next through vma_merge() Try to merge the current linear area with the adjacent linear area . Suppose the merge succeeds . Then the function returns prev This linear region vm_area_struct Structure pointer . At the same time do_brk(). otherwise , Continue to allocate new linear regions .
4. Next through kmem_cache_zalloc() In particular slab Fast cache vm_area_cachep For this linear region vm_area_struct Descriptive narrator of structure .
5. initialization vma Each field in the structure .
6. to update mm_struct The structure of the vm_total Field , It is used for the current process of the same level vma Number .
7. Assuming the current vma Set up VM_LOCKED Field . Then through the mlock_vma_pages_range() Now assign a physical page box to this linear area .
otherwise ,do_brk() end .
Be able to see ,do_brk() It is mainly to allocate a new linear region for the current process . In no setting VM_LOCKED In the case of signs , It will not immediately assign physical page boxes to the linear area . But through vma Always delay the allocation of physical memory , Until page missing exception occurs .
After the above process ,malloc() Linear address returned , Suppose the user process accesses this linear address at this time , Then a page missing exception will occur (Page Fault). The whole process of handling page missing exceptions is very complicated , We are only concerned with malloc() The relevant running path .
When CPU When an exception is generated , It will jump to the whole process of exception handling . For page missing exceptions ,CPU Will jump to page_fault In the exception handler .
The exception handler will call do_page_fault() function , This function reads CR2 Register gets the linear address that causes the missing page . Infer through various conditions in order to determine a suitable scheme to deal with this exception .
do_page_fault() function :
This function detects the current abnormal situation through various conditions , But at least do_page_fault() It will distinguish the two situations that cause missing pages : An exception is thrown by a programming error , And caused by linear addresses in the process address space that have not allocated physical memory .
In the latter case , It is usually divided into page faults caused by user space and kernel space .
The exception thrown by the kernel is caused by vmalloc() Produced , It is only used for kernel space memory allocation .
obviously , What we need to pay attention to here is the exception caused by user space . This part of the work starts from do_page_fault() Medium good_area Start running at the label , Mainly through handle_mm_fault() complete .
<span style="font-size:18px;">dotraplinkage void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code){…… ……good_area: write = error_code & PF_WRITE; if (unlikely(access_error(error_code, write, vma))) { bad_area_access_error(regs, error_code, address); return; } fault = handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0);}</span>
handle_mm_fault() function :
The main function of this function is to allocate a physical page box to the process causing the page defect , It first determines whether the page folder items at all levels corresponding to the linear address that caused the missing page exist , How to allocate if it doesn't exist . How to allocate this page box in detail is through the call handle_pte_fault() It's over .
<span style="font-size:18px;">int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
…… ……
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
return VM_FAULT_OOM;
pmd = pmd_alloc(mm, pud, address);
if (!pmd)
return VM_FAULT_OOM;
pte = pte_alloc_map(mm, pmd, address);
if (!pte)
return VM_FAULT_OOM;
return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}</span>
handle_pte_fault() function :
This function is based on page table entries pte Whether the described physical page box is in physical memory , There are two main categories :
Request page change : The accessed page frame is no longer in main memory , Then a page box must be allocated .
When writing copy : The visited page exists , But this page is only read . The kernel needs to write to this page , At this time, the kernel copies the existing data in the read only page to a new page box .
User process access is provided by malloc() The allocated memory space belongs to the first case . For request paging .handle_pte_fault() It is still subdivided into three cases :
1. Assume that the page table entry is indeed empty (pte_none(entry)), Then the page box must be allocated .
Suppose the current process implements vma Operate on fault Hook function , Then this situation belongs to file based memory mapping . It calls do_linear_fault() Allocate physical page boxes .
otherwise . The kernel will call functions that allocate physical page boxes for anonymous mappings do_anonymous_page().
2. Suppose that the table entry of this page is detected as a nonlinear mapping (pte_file(entry)), Call do_nonlinear_fault() Assign physical pages .
3. Suppose that the page box is allocated in advance , But now it has been swapped out from main memory to external memory . Call do_swap_page() Finish page box allocation .
from malloc The allocated memory will call do_anonymous_page() Allocate physical page boxes .
<span style="font-size:18px;">static inline int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *pte, pmd_t *pmd, unsigned int flags)
{
…… ……
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
if (likely(vma->vm_ops->fault))
return do_linear_fault(mm, vma, address,
pte, pmd, flags, entry);
}
return do_anonymous_page(mm, vma, address,
pte, pmd, flags);
}
if (pte_file(entry))
return do_nonlinear_fault(mm, vma, address,
pte, pmd, flags, entry);
return do_swap_page(mm, vma, address,
pte, pmd, flags, entry);
}
…… ……
}</span>
do_anonymous_page() function :
here , The missing page exception handler will eventually allocate a physical page frame to the current process . It passes through alloc_zeroed_user_highpage_movable() To finish the process .
Let's peel off the function layer by layer , I found that it finally called alloc_pages().
<span style="font-size:18px;">static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *page_table, pmd_t *pmd,
unsigned int flags)
{
…… ……
if (unlikely(anon_vma_prepare(vma)))
goto oom;
page = alloc_zeroed_user_highpage_movable(vma, address);
if (!page)
goto oom;
…… ……
}</span>
After such a complex process , The linear address accessed by the user process finally corresponds to a piece of physical memory . Attached below is what I think is relatively intact malloc() and free() Function source code :
<span style="font-size:18px;">#include <unistd.h>
#include <stdlib.h>
// Block head
union header
{
struct{
union header *next;// Pointer to the next free time
unsigned int size;// The size of the free block
}s;
long x;// alignment
};
typedef union header Header;
#define NALLOC 1024;// Minimum number of units requested , The size of each page is 1KB
static Header* moreSys(unsigned int num);// Request a piece of memory from the system
void* userMalloc(unsigned int nbytes);// Request memory from the user management area
void userFree(void *ap);// Free memory , Put it into the user management area
static Header base;// Define the leisure chain header
static Header *free_list = NULL;// The starting query pointer of the leisure linked list
void* userMalloc(unsigned int nbytes)
{
Header *p;
Header *prev;
unsigned int unitNum;
// Number of bytes to be requested nbytes convert to unitNum First unit of block , Calculate one more as the first management block
unitNum = (nbytes + sizeof(Header) - 1)/sizeof(Header) + 1;
if ((prev = free_list) == NULL)// Suppose there is no free linked list , Define the leisure linked list
{
base.s.next = free_list = prev = &base;
base.s.size = 1;
}
for (p = prev->s.next; ; p = p->s.next, prev = p)
{
if (p->s.size >= unitNum)// The leisure block is big enough
{
if (p->s.size <= (unitNum + 1))
{
prev->s.next = p->s.next;
}
else// Larger , Cut out the necessary piece
{
p->s.size = unitNum;
p += p->s.size;
p->s.size = unitNum;
}
free_list = prev;
return (void *)(p+1);
}
if (p == free_list)
{
if ((p = moreSys(unitNum)) == NULL)// No suitable block . Apply to the system
{
return NULL;
}
}
}
}
static Header* moreSys(unsigned int num)
{
char *cp;
Header *up;
if(num < NALLOC)
num = NALLOC;// The minimum quantity applied to the system
cp = sbrk(num * sizeof(Header));
if (cp == (char *)-1)
{
return NULL;// No free pages . Return empty address
}
up = (Header *)cp;
up->s.size = num;
userFree(up + 1);
return free_list;
}</span>
<span style="font-size:18px;">// Reclaim memory to the leisure chain
void Free(void *ap)
{
Header *bp, *p;
bp = (Header *)ap - 1; // Point to the beginning of the block
for(p = free_list; !(bp>p && bp<p->s.next); p = p->s.next) // Locate the leisure block by address in the linked list
// Position in
if(p>=p->s.next && (bp>p || bp<p->s.next))
break; // The leisure block is at both ends
if(bp + bp->s.size == p->s.next) { // See whether the free block is adjacent to the existing block , Merge adjacent
bp->s.size += p->s.next->s.size;
bp->s.next = p->s.next->s.next;
}
else
bp->s.next = p->s.next;
if(p + p->s.size == bp) {
p->s.size += bp->s.size;
p->s.next = bp->s.next;
}
else
p->s.next = bp;
free_list = p;
}</span>
Copyright notice : This article is the original article of the blogger , Blog , Without consent , Shall not be reproduced .
Publisher : Full stack programmer stack length , Reprint please indicate the source :https://javaforall.cn/117132.html Link to the original text :https://javaforall.cn
边栏推荐
- How to customize animation avatars? These six free online cartoon avatar generators are exciting at a glance!
- Lick the dog until the last one has nothing (simple DP)
- Test Li hi
- 如何自定义动漫头像?这6个免费精品在线卡通头像生成器,看一眼就怦然心动!
- From spark csc. csr_ Matrix generate adjacency matrix
- Analysis of rainwater connection
- CF960G - Bandit Blues(第一类斯特林数+OGF)
- LeetCode_ Gray code_ Medium_ 89. Gray code
- 新一代垃圾回收器—ZGC
- LeetCode_格雷编码_中等_89.格雷编码
猜你喜欢
Analysis of rainwater connection
VMware virtual machine cannot open the kernel device "\.\global\vmx86"
Spark foundation -scala
Transformer model (pytorch code explanation)
Leetcode 30. Concatenate substrings of all words
利用 clip-path 绘制不规则的图形
语音识别(ASR)论文优选:全球最大的中英混合开源数据TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech
蓝桥杯 微生物增殖 C语言
社招面试心得,2022最新Android高频精选面试题分享
深度剖析原理,看完这一篇就够了
随机推荐
语音识别(ASR)论文优选:全球最大的中英混合开源数据TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech
Low CPU load and high loadavg processing method
Lick the dog until the last one has nothing (simple DP)
ZABBIX proxy server and ZABBIX SNMP monitoring
激进技术派 vs 项目保守派的微服务架构之争
范式的数据库具体解释
350. 两个数组的交集 II
精彩编码 【进制转换】
AsyncHandler
Example of applying fonts to flutter
121. The best time to buy and sell stocks
Recursive implementation of department tree
How to access localhost:8000 by mobile phone
数据的同步为每个站点创建触发器同步表
LeetCode_格雷编码_中等_89.格雷编码
学习探索-函数防抖
121. 买卖股票的最佳时机
Analysis of rainwater connection
Vscode debug run fluent message: there is no extension for debugging yaml. Should we find yaml extensions in the market?
【翻译】供应链安全项目in-toto移至CNCF孵化器