当前位置:网站首页>What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source

What happened to the kernel after malloc() was transferred? Attached malloc () and free () implementation source

2022-07-06 19:50:00 Full stack programmer webmaster

Hello everyone , I meet you again , I'm the king of the whole stack .

Hereby declare : In this paper , Quote another article and post , General understanding of combination malloc() Function implementation mechanism .

We are often in C Call in program malloc() Function dynamically allocates a contiguous memory space and uses them . that , What kind of response will these user space events cause in kernel space ?

malloc() It's a API, This function encapsulates the system call in the Library brk. So suppose you call malloc, Then first of all, it will cause brk System call running process .

brk() The corresponding system call service routine in the kernel is SYSCALL_DEFINE1(brk, unsigned long, brk). Parameters brk Used to specify heap Segment new end address . That is, another designation mm_struct The structure of the brk Field .

brk The system call service routine will first determine heap The starting address of the segment min_brk. Then check the resource constraints . next , Old and new heap The addresses are aligned according to the page size , The aligned addresses are stored in newbrk and okdbrk in .

brk() The system call itself can reduce the heap size . It can also expand the heap size . The function of reducing heap is through calling do_munmap() It's over . Suppose you want to expand the size of the pile . Then you must pass find_vma_intersection() Check whether the expanded heap coincides with an existing virtual memory , How to overlap, then exit directly . otherwise , call do_brk() Carry out all kinds of work to expand the heap next .

<span style="font-size:18px;">SYSCALL_DEFINE1(brk, unsigned long, brk)
{
        unsigned long rlim, retval;
        unsigned long newbrk, oldbrk;
        struct mm_struct *mm = current->mm;
        unsigned long min_brk;

        down_write(&mm->mmap_sem);
 
#ifdef CONFIG_COMPAT_BRK
        min_brk = mm->end_code;
#else
        min_brk = mm->start_brk;
#endif
        if (brk < min_brk)
                goto out;
 
        rlim = rlimit(RLIMIT_DATA);
        if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
                        (mm->end_data - mm->start_data) > rlim)
 
        newbrk = PAGE_ALIGN(brk);
        oldbrk = PAGE_ALIGN(mm->brk);
        if (oldbrk == newbrk)
                goto set_brk;
        if (brk brk) {
                if (!do_munmap(mm, newbrk, oldbrk-newbrk))
                        goto set_brk;
                goto out;
        }
 
        if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
                goto out;
 
        if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
                goto out;
set_brk:
        mm->brk = brk;
out:
        retval = mm->brk;
        up_write(&mm->mmap_sem);
        return retval;
}</span>

brk The system call service routine will finally return the new end address of the heap .

User process call malloc() Will make the kernel call brk System call service routine . because malloc Always allocate memory space dynamically , Therefore, the service routine will now enter the second running path , That is, expand a lot .do_brk() Mainly finish the following work :

1. adopt get_unmapped_area() Find a match in the address space of the current process len Linear interval of size . And the linear interval must be in addr After the address . Suppose we find this linear interval of leisure , Then the starting address of the interval is returned , Otherwise, the error code is returned -ENOMEM;

2. adopt find_vma_prepare() Traverse each one in turn in the red black tree composed of all linear regions of the current process vma. To determine the position of the linear area object before the new area found in the previous step . hypothesis addr Located in an existing vma in , Call do_munmap() Delete this linear area . Assuming the deletion is successful, continue to search , Otherwise, the error code is returned .

3. At present, we have found a suitable size of leisure linear area , Next through vma_merge() Try to merge the current linear area with the adjacent linear area . Suppose the merge succeeds . Then the function returns prev This linear region vm_area_struct Structure pointer . At the same time do_brk(). otherwise , Continue to allocate new linear regions .

4. Next through kmem_cache_zalloc() In particular slab Fast cache vm_area_cachep For this linear region vm_area_struct Descriptive narrator of structure .

5. initialization vma Each field in the structure .

6. to update mm_struct The structure of the vm_total Field , It is used for the current process of the same level vma Number .

7. Assuming the current vma Set up VM_LOCKED Field . Then through the mlock_vma_pages_range() Now assign a physical page box to this linear area .

otherwise ,do_brk() end .

Be able to see ,do_brk() It is mainly to allocate a new linear region for the current process . In no setting VM_LOCKED In the case of signs , It will not immediately assign physical page boxes to the linear area . But through vma Always delay the allocation of physical memory , Until page missing exception occurs .

After the above process ,malloc() Linear address returned , Suppose the user process accesses this linear address at this time , Then a page missing exception will occur (Page Fault). The whole process of handling page missing exceptions is very complicated , We are only concerned with malloc() The relevant running path .

When CPU When an exception is generated , It will jump to the whole process of exception handling . For page missing exceptions ,CPU Will jump to page_fault In the exception handler .

The exception handler will call do_page_fault() function , This function reads CR2 Register gets the linear address that causes the missing page . Infer through various conditions in order to determine a suitable scheme to deal with this exception .

do_page_fault() function :

This function detects the current abnormal situation through various conditions , But at least do_page_fault() It will distinguish the two situations that cause missing pages : An exception is thrown by a programming error , And caused by linear addresses in the process address space that have not allocated physical memory .

In the latter case , It is usually divided into page faults caused by user space and kernel space .

The exception thrown by the kernel is caused by vmalloc() Produced , It is only used for kernel space memory allocation .

obviously , What we need to pay attention to here is the exception caused by user space . This part of the work starts from do_page_fault() Medium good_area Start running at the label , Mainly through handle_mm_fault() complete .

<span style="font-size:18px;">dotraplinkage void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code){…… ……good_area:        write = error_code & PF_WRITE;         if (unlikely(access_error(error_code, write, vma))) {                bad_area_access_error(regs, error_code, address);                return;        }        fault = handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0);}</span>

handle_mm_fault() function :

The main function of this function is to allocate a physical page box to the process causing the page defect , It first determines whether the page folder items at all levels corresponding to the linear address that caused the missing page exist , How to allocate if it doesn't exist . How to allocate this page box in detail is through the call handle_pte_fault() It's over .

<span style="font-size:18px;">int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags)
{
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;
        …… ……
        pgd = pgd_offset(mm, address);
        pud = pud_alloc(mm, pgd, address);
        if (!pud)
                return VM_FAULT_OOM;
        pmd = pmd_alloc(mm, pud, address);
        if (!pmd)
                return VM_FAULT_OOM;
        pte = pte_alloc_map(mm, pmd, address);
        if (!pte)
                return VM_FAULT_OOM;
          return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}</span>

handle_pte_fault() function :

This function is based on page table entries pte Whether the described physical page box is in physical memory , There are two main categories :

Request page change : The accessed page frame is no longer in main memory , Then a page box must be allocated .

When writing copy : The visited page exists , But this page is only read . The kernel needs to write to this page , At this time, the kernel copies the existing data in the read only page to a new page box .

User process access is provided by malloc() The allocated memory space belongs to the first case . For request paging .handle_pte_fault() It is still subdivided into three cases :

1. Assume that the page table entry is indeed empty (pte_none(entry)), Then the page box must be allocated .

Suppose the current process implements vma Operate on fault Hook function , Then this situation belongs to file based memory mapping . It calls do_linear_fault() Allocate physical page boxes .

otherwise . The kernel will call functions that allocate physical page boxes for anonymous mappings do_anonymous_page().

2. Suppose that the table entry of this page is detected as a nonlinear mapping (pte_file(entry)), Call do_nonlinear_fault() Assign physical pages .

3. Suppose that the page box is allocated in advance , But now it has been swapped out from main memory to external memory . Call do_swap_page() Finish page box allocation .

from malloc The allocated memory will call do_anonymous_page() Allocate physical page boxes .

<span style="font-size:18px;">static inline int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *pte, pmd_t *pmd, unsigned int flags)
{
        …… ……
        if (!pte_present(entry)) {
                if (pte_none(entry)) {
                        if (vma->vm_ops) {
                                if (likely(vma->vm_ops->fault))
                                        return do_linear_fault(mm, vma, address,
                                                pte, pmd, flags, entry);
                        }
                        return do_anonymous_page(mm, vma, address,
                                                 pte, pmd, flags);
                }
                if (pte_file(entry))
                        return do_nonlinear_fault(mm, vma, address,
                                        pte, pmd, flags, entry);
                return do_swap_page(mm, vma, address,
                                        pte, pmd, flags, entry);
        }
…… ……
}</span>

do_anonymous_page() function :

here , The missing page exception handler will eventually allocate a physical page frame to the current process . It passes through alloc_zeroed_user_highpage_movable() To finish the process .

Let's peel off the function layer by layer , I found that it finally called alloc_pages().

<span style="font-size:18px;">static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
                unsigned long address, pte_t *page_table, pmd_t *pmd,
                unsigned int flags)
{
…… ……
        if (unlikely(anon_vma_prepare(vma)))
                goto oom;
        page = alloc_zeroed_user_highpage_movable(vma, address);
        if (!page)
                goto oom;
…… ……
}</span>

After such a complex process , The linear address accessed by the user process finally corresponds to a piece of physical memory . Attached below is what I think is relatively intact malloc() and free() Function source code :

<span style="font-size:18px;">#include <unistd.h>
#include <stdlib.h>
// Block head 
union header
{
	struct{
		union header *next;// Pointer to the next free time 
		unsigned int size;// The size of the free block 
	}s;
	long x;// alignment 
};
typedef union header Header;

#define NALLOC 1024;// Minimum number of units requested , The size of each page is 1KB
static Header* moreSys(unsigned int num);// Request a piece of memory from the system 
void* userMalloc(unsigned int nbytes);// Request memory from the user management area 
void userFree(void *ap);// Free memory , Put it into the user management area 

static Header base;// Define the leisure chain header 
static Header *free_list = NULL;// The starting query pointer of the leisure linked list 

void* userMalloc(unsigned int nbytes)
{
	Header *p;
	Header *prev;
	unsigned int unitNum;
	// Number of bytes to be requested nbytes convert to unitNum First unit of block , Calculate one more as the first management block 
	unitNum = (nbytes + sizeof(Header) - 1)/sizeof(Header) + 1;
	if ((prev = free_list) == NULL)// Suppose there is no free linked list , Define the leisure linked list 
	{
		base.s.next = free_list = prev = &base;
		base.s.size = 1;
	}
	for (p = prev->s.next; ; p = p->s.next, prev = p)
	{
		if (p->s.size >= unitNum)// The leisure block is big enough 
		{
			if (p->s.size <= (unitNum + 1))
			{
				prev->s.next = p->s.next;
			}
			else// Larger , Cut out the necessary piece 
			{
				p->s.size = unitNum;
				p += p->s.size;
				p->s.size = unitNum;
		    }
			free_list = prev;
			return (void *)(p+1);
		}
		if (p == free_list)
		{
			if ((p = moreSys(unitNum)) == NULL)// No suitable block . Apply to the system 
			{
				return NULL;
			}
		}
	}
}

static Header* moreSys(unsigned int num)
{
	char *cp;
	Header *up;

	if(num < NALLOC)
		num = NALLOC;// The minimum quantity applied to the system 
	cp = sbrk(num * sizeof(Header));
	if (cp == (char *)-1)
	{
		return NULL;// No free pages . Return empty address 
	}
	up = (Header *)cp;
	up->s.size = num;
	userFree(up + 1);
	return free_list;
}</span>
<span style="font-size:18px;">// Reclaim memory to the leisure chain 
void Free(void *ap)
{
Header *bp, *p;
bp = (Header *)ap - 1;	 // Point to the beginning of the block 

for(p = free_list; !(bp>p && bp<p->s.next); p = p->s.next)	// Locate the leisure block by address in the linked list 
// Position in 
if(p>=p->s.next && (bp>p || bp<p->s.next))
break;	 // The leisure block is at both ends 
if(bp + bp->s.size == p->s.next) {	 // See whether the free block is adjacent to the existing block , Merge adjacent 
bp->s.size += p->s.next->s.size;
bp->s.next = p->s.next->s.next;
}
else
bp->s.next = p->s.next;

if(p + p->s.size == bp) {
p->s.size += bp->s.size;
p->s.next = bp->s.next;
}
else 
p->s.next = bp;

free_list = p;
}</span>

Copyright notice : This article is the original article of the blogger , Blog , Without consent , Shall not be reproduced .

Publisher : Full stack programmer stack length , Reprint please indicate the source :https://javaforall.cn/117132.html Link to the original text :https://javaforall.cn

原网站

版权声明
本文为[Full stack programmer webmaster]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/187/202207061146428289.html