当前位置:网站首页>What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source
What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source
2022-07-06 19:50:00 【Full stack programmer webmaster】
Hello everyone , I meet you again , I'm the king of the whole stack .
Hereby declare : In this paper , Quote another article and post , General understanding of combination malloc() Function implementation mechanism .
We are often in C Call in program malloc() Function dynamically allocates a contiguous memory space and uses them . that , What kind of response will these user space events cause in kernel space ?
malloc() It's a API, This function encapsulates the system call in the Library brk. So suppose you call malloc, Then first of all, it will cause brk System call running process .
brk() The corresponding system call service routine in the kernel is SYSCALL_DEFINE1(brk, unsigned long, brk). Parameters brk Used to specify heap Segment new end address . That is, another designation mm_struct The structure of the brk Field .
brk The system call service routine will first determine heap The starting address of the segment min_brk. Then check the resource constraints . next , Old and new heap The addresses are aligned according to the page size , The aligned addresses are stored in newbrk and okdbrk in .
brk() The system call itself can reduce the heap size . It can also expand the heap size . The function of reducing heap is through calling do_munmap() It's over . Suppose you want to expand the size of the pile . Then you must pass find_vma_intersection() Check whether the expanded heap coincides with an existing virtual memory , How to overlap, then exit directly . otherwise , call do_brk() Carry out all kinds of work to expand the heap next .
<span style="font-size:18px;">SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long rlim, retval;
unsigned long newbrk, oldbrk;
struct mm_struct *mm = current->mm;
unsigned long min_brk;
down_write(&mm->mmap_sem);
#ifdef CONFIG_COMPAT_BRK
min_brk = mm->end_code;
#else
min_brk = mm->start_brk;
#endif
if (brk < min_brk)
goto out;
rlim = rlimit(RLIMIT_DATA);
if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
(mm->end_data - mm->start_data) > rlim)
newbrk = PAGE_ALIGN(brk);
oldbrk = PAGE_ALIGN(mm->brk);
if (oldbrk == newbrk)
goto set_brk;
if (brk brk) {
if (!do_munmap(mm, newbrk, oldbrk-newbrk))
goto set_brk;
goto out;
}
if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
goto out;
if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
goto out;
set_brk:
mm->brk = brk;
out:
retval = mm->brk;
up_write(&mm->mmap_sem);
return retval;
}</span>brk The system call service routine will finally return the new end address of the heap .
User process call malloc() Will make the kernel call brk System call service routine . because malloc Always allocate memory space dynamically , Therefore, the service routine will now enter the second running path , That is, expand a lot .do_brk() Mainly finish the following work :
1. adopt get_unmapped_area() Find a match in the address space of the current process len Linear interval of size . And the linear interval must be in addr After the address . Suppose we find this linear interval of leisure , Then the starting address of the interval is returned , Otherwise, the error code is returned -ENOMEM;
2. adopt find_vma_prepare() Traverse each one in turn in the red black tree composed of all linear regions of the current process vma. To determine the position of the linear area object before the new area found in the previous step . hypothesis addr Located in an existing vma in , Call do_munmap() Delete this linear area . Assuming the deletion is successful, continue to search , Otherwise, the error code is returned .
3. At present, we have found a suitable size of leisure linear area , Next through vma_merge() Try to merge the current linear area with the adjacent linear area . Suppose the merge succeeds . Then the function returns prev This linear region vm_area_struct Structure pointer . At the same time do_brk(). otherwise , Continue to allocate new linear regions .
4. Next through kmem_cache_zalloc() In particular slab Fast cache vm_area_cachep For this linear region vm_area_struct Descriptive narrator of structure .
5. initialization vma Each field in the structure .
6. to update mm_struct The structure of the vm_total Field , It is used for the current process of the same level vma Number .
7. Assuming the current vma Set up VM_LOCKED Field . Then through the mlock_vma_pages_range() Now assign a physical page box to this linear area .
otherwise ,do_brk() end .
Be able to see ,do_brk() It is mainly to allocate a new linear region for the current process . In no setting VM_LOCKED In the case of signs , It will not immediately assign physical page boxes to the linear area . But through vma Always delay the allocation of physical memory , Until page missing exception occurs .
After the above process ,malloc() Linear address returned , Suppose the user process accesses this linear address at this time , Then a page missing exception will occur (Page Fault). The whole process of handling page missing exceptions is very complicated , We are only concerned with malloc() The relevant running path .
When CPU When an exception is generated , It will jump to the whole process of exception handling . For page missing exceptions ,CPU Will jump to page_fault In the exception handler .
The exception handler will call do_page_fault() function , This function reads CR2 Register gets the linear address that causes the missing page . Infer through various conditions in order to determine a suitable scheme to deal with this exception .
do_page_fault() function :
This function detects the current abnormal situation through various conditions , But at least do_page_fault() It will distinguish the two situations that cause missing pages : An exception is thrown by a programming error , And caused by linear addresses in the process address space that have not allocated physical memory .
In the latter case , It is usually divided into page faults caused by user space and kernel space .
The exception thrown by the kernel is caused by vmalloc() Produced , It is only used for kernel space memory allocation .
obviously , What we need to pay attention to here is the exception caused by user space . This part of the work starts from do_page_fault() Medium good_area Start running at the label , Mainly through handle_mm_fault() complete .
<span style="font-size:18px;">dotraplinkage void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code){…… ……good_area: write = error_code & PF_WRITE; if (unlikely(access_error(error_code, write, vma))) { bad_area_access_error(regs, error_code, address); return; } fault = handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0);}</span>handle_mm_fault() function :
The main function of this function is to allocate a physical page box to the process causing the page defect , It first determines whether the page folder items at all levels corresponding to the linear address that caused the missing page exist , How to allocate if it doesn't exist . How to allocate this page box in detail is through the call handle_pte_fault() It's over .
<span style="font-size:18px;">int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
…… ……
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
return VM_FAULT_OOM;
pmd = pmd_alloc(mm, pud, address);
if (!pmd)
return VM_FAULT_OOM;
pte = pte_alloc_map(mm, pmd, address);
if (!pte)
return VM_FAULT_OOM;
return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}</span>handle_pte_fault() function :
This function is based on page table entries pte Whether the described physical page box is in physical memory , There are two main categories :
Request page change : The accessed page frame is no longer in main memory , Then a page box must be allocated .
When writing copy : The visited page exists , But this page is only read . The kernel needs to write to this page , At this time, the kernel copies the existing data in the read only page to a new page box .
User process access is provided by malloc() The allocated memory space belongs to the first case . For request paging .handle_pte_fault() It is still subdivided into three cases :
1. Assume that the page table entry is indeed empty (pte_none(entry)), Then the page box must be allocated .
Suppose the current process implements vma Operate on fault Hook function , Then this situation belongs to file based memory mapping . It calls do_linear_fault() Allocate physical page boxes .
otherwise . The kernel will call functions that allocate physical page boxes for anonymous mappings do_anonymous_page().
2. Suppose that the table entry of this page is detected as a nonlinear mapping (pte_file(entry)), Call do_nonlinear_fault() Assign physical pages .
3. Suppose that the page box is allocated in advance , But now it has been swapped out from main memory to external memory . Call do_swap_page() Finish page box allocation .
from malloc The allocated memory will call do_anonymous_page() Allocate physical page boxes .
<span style="font-size:18px;">static inline int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *pte, pmd_t *pmd, unsigned int flags)
{
…… ……
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
if (likely(vma->vm_ops->fault))
return do_linear_fault(mm, vma, address,
pte, pmd, flags, entry);
}
return do_anonymous_page(mm, vma, address,
pte, pmd, flags);
}
if (pte_file(entry))
return do_nonlinear_fault(mm, vma, address,
pte, pmd, flags, entry);
return do_swap_page(mm, vma, address,
pte, pmd, flags, entry);
}
…… ……
}</span>do_anonymous_page() function :
here , The missing page exception handler will eventually allocate a physical page frame to the current process . It passes through alloc_zeroed_user_highpage_movable() To finish the process .
Let's peel off the function layer by layer , I found that it finally called alloc_pages().
<span style="font-size:18px;">static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *page_table, pmd_t *pmd,
unsigned int flags)
{
…… ……
if (unlikely(anon_vma_prepare(vma)))
goto oom;
page = alloc_zeroed_user_highpage_movable(vma, address);
if (!page)
goto oom;
…… ……
}</span>After such a complex process , The linear address accessed by the user process finally corresponds to a piece of physical memory . Attached below is what I think is relatively intact malloc() and free() Function source code :
<span style="font-size:18px;">#include <unistd.h>
#include <stdlib.h>
// Block head
union header
{
struct{
union header *next;// Pointer to the next free time
unsigned int size;// The size of the free block
}s;
long x;// alignment
};
typedef union header Header;
#define NALLOC 1024;// Minimum number of units requested , The size of each page is 1KB
static Header* moreSys(unsigned int num);// Request a piece of memory from the system
void* userMalloc(unsigned int nbytes);// Request memory from the user management area
void userFree(void *ap);// Free memory , Put it into the user management area
static Header base;// Define the leisure chain header
static Header *free_list = NULL;// The starting query pointer of the leisure linked list
void* userMalloc(unsigned int nbytes)
{
Header *p;
Header *prev;
unsigned int unitNum;
// Number of bytes to be requested nbytes convert to unitNum First unit of block , Calculate one more as the first management block
unitNum = (nbytes + sizeof(Header) - 1)/sizeof(Header) + 1;
if ((prev = free_list) == NULL)// Suppose there is no free linked list , Define the leisure linked list
{
base.s.next = free_list = prev = &base;
base.s.size = 1;
}
for (p = prev->s.next; ; p = p->s.next, prev = p)
{
if (p->s.size >= unitNum)// The leisure block is big enough
{
if (p->s.size <= (unitNum + 1))
{
prev->s.next = p->s.next;
}
else// Larger , Cut out the necessary piece
{
p->s.size = unitNum;
p += p->s.size;
p->s.size = unitNum;
}
free_list = prev;
return (void *)(p+1);
}
if (p == free_list)
{
if ((p = moreSys(unitNum)) == NULL)// No suitable block . Apply to the system
{
return NULL;
}
}
}
}
static Header* moreSys(unsigned int num)
{
char *cp;
Header *up;
if(num < NALLOC)
num = NALLOC;// The minimum quantity applied to the system
cp = sbrk(num * sizeof(Header));
if (cp == (char *)-1)
{
return NULL;// No free pages . Return empty address
}
up = (Header *)cp;
up->s.size = num;
userFree(up + 1);
return free_list;
}</span><span style="font-size:18px;">// Reclaim memory to the leisure chain
void Free(void *ap)
{
Header *bp, *p;
bp = (Header *)ap - 1; // Point to the beginning of the block
for(p = free_list; !(bp>p && bp<p->s.next); p = p->s.next) // Locate the leisure block by address in the linked list
// Position in
if(p>=p->s.next && (bp>p || bp<p->s.next))
break; // The leisure block is at both ends
if(bp + bp->s.size == p->s.next) { // See whether the free block is adjacent to the existing block , Merge adjacent
bp->s.size += p->s.next->s.size;
bp->s.next = p->s.next->s.next;
}
else
bp->s.next = p->s.next;
if(p + p->s.size == bp) {
p->s.size += bp->s.size;
p->s.next = bp->s.next;
}
else
p->s.next = bp;
free_list = p;
}</span>Copyright notice : This article is the original article of the blogger , Blog , Without consent , Shall not be reproduced .
Publisher : Full stack programmer stack length , Reprint please indicate the source :https://javaforall.cn/117132.html Link to the original text :https://javaforall.cn
边栏推荐
- Alibaba数据源Druid可视化监控配置
- 腾讯架构师首发,2022Android面试笔试总结
- [infrastructure] deployment and configuration of Flink / Flink CDC (MySQL / es)
- 腾讯T4架构师,android面试基础
- Application of clock wheel in RPC
- Zero foundation entry polardb-x: build a highly available system and link the big data screen
- 信息系统项目管理师---第八章 项目质量管理
- OceanBase社区版之OBD方式部署方式单机安装
- 理解 YOLOV1 第二篇 预测阶段 非极大值抑制(NMS)
- 腾讯字节阿里小米京东大厂Offer拿到手软,老师讲的真棒
猜你喜欢

Transformer model (pytorch code explanation)

Mysql Information Schema 学习(二)--Innodb表

Swiftui game source code Encyclopedia of Snake game based on geometryreader and preference

(3) Web security | penetration testing | basic knowledge of network security construction, IIS website construction, EXE backdoor generation tool quasar, basic use of

Hudi vs Delta vs Iceberg

如何自定义动漫头像?这6个免费精品在线卡通头像生成器,看一眼就怦然心动!
In depth analysis, Android interview real problem analysis is popular all over the network

Systematic and detailed explanation of redis operation hash type data (with source code analysis and test results)

Example of shutter text component
腾讯T4架构师,android面试基础
随机推荐
小微企业难做账?智能代账小工具快用起来
潇洒郎: AttributeError: partially initialized module ‘cv2‘ has no attribute ‘gapi_wip_gst_GStreamerPipe
LeetCode_双指针_中等_61. 旋转链表
【翻译】数字内幕。KubeCon + CloudNativeCon在2022年欧洲的选择过程
How to access localhost:8000 by mobile phone
测试用里hi
激进技术派 vs 项目保守派的微服务架构之争
PHP与EXCEL PHPExcel
Transformer model (pytorch code explanation)
腾讯Android面试必问,10年Android开发经验
MySQL must know and learn
如何自定义动漫头像?这6个免费精品在线卡通头像生成器,看一眼就怦然心动!
腾讯云数据库公有云市场稳居TOP 2!
Web开发小妙招:巧用ThreadLocal规避层层传值
MySQL information schema learning (I) -- general table
数据的同步为每个站点创建触发器同步表
HDU 1026 Ignatius and the Princess I 迷宫范围内的搜索剪枝问题
POJ1149 PIGS 【最大流量】
It's enough to read this article to analyze the principle in depth
How to do smoke test