Detailed explanation of the openGauss multi-threaded architecture startup process
2022-06-11 15:50:00 【Gauss squirrel Club】
Contents
Why openGauss uses a multi-threaded architecture
What the main openGauss threads are
Thinking about how to add a worker thread
openGauss is a single-process, multi-threaded database. A client can use a driver such as JDBC, ODBC, Libpq, or Psycopg to send a connection request to the openGauss main thread (Postmaster).

Why openGauss uses a multi-threaded architecture
With the development of multi-core technology, every server-side application has to consider how to make full and effective use of multi-core parallel processing. The service processes or threads of a database server share and synchronize a large amount of data, and a multi-threaded design can make full use of multiple CPU cores to run strongly related tasks in parallel. For example, the execution engine can use concurrent threads to improve performance. Under a multi-threaded architecture, data sharing is more efficient, which improves server access efficiency and performance, while maintenance cost and complexity are lower. This matters a great deal for the parallel processing ability of a database system.
Three main advantages of multithreading:
Advantage 1: Starting a thread costs much less than starting a process. Compared with processes, threads are a very "frugal" way to multitask. On Linux, a new process must be given its own address space, and the kernel builds a large number of tables to maintain its code, stack, and data segments, which makes processes an "expensive" way to multitask. Threads running in the same process use the same address space and share most of their data, so starting a thread takes far less space than starting a process.
Advantage 2: Communication between threads is convenient. Different processes have independent data spaces, so data can only be passed between them through inter-process communication, which is both time-consuming and inconvenient. Threads in the same process share the data space, so data produced by one thread can be used directly by the others, which is both fast and convenient.
Advantage 3: Switching threads costs less than switching processes. On Linux, a process switch takes two steps: (1) switch the page directory to use the new address space; (2) switch the kernel stack and the hardware context. A switch between threads of the same process does not need step 1, while step 2 is required for both processes and threads, so a thread switch is cheaper.
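The second advantage can be illustrated with a minimal sketch. It is not openGauss code: the function name collect_from_threads is hypothetical, and the example only assumes standard C++ threads. It shows several threads writing into one container that lives in the shared address space, with a mutex as the only coordination needed.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: thread_count threads append to one vector that lives
// in the process's single address space. No pipe, socket, or shared-memory
// segment is needed; a mutex is enough to keep the concurrent writes safe.
std::vector<int> collect_from_threads(int thread_count)
{
    assert(thread_count >= 0);
    std::vector<int> shared;   // visible to every thread in the process
    std::mutex lock;           // serializes access to `shared`
    std::vector<std::thread> workers;

    for (int id = 0; id < thread_count; ++id) {
        workers.emplace_back([&shared, &lock, id] {
            std::lock_guard<std::mutex> guard(lock);
            shared.push_back(id);   // direct write into shared memory
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return shared;
}
```

With separate processes, the same exchange would need an explicit channel such as a pipe or a shared-memory segment, which is the extra cost the paragraph above describes.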
What the main openGauss threads are
Background thread | Description |
Postmaster main thread | Entry function PostmasterMain. Initializes memory, global information, signals, the thread pool, and so on; starts the worker threads and monitors their state; loops listening for and accepting new connections |
Walwriter log writer thread | Entry function WalWriterMain. Flushes write-ahead log (WAL) pages in memory to the WAL files so that committed changes are permanently recorded and not lost |
Startup database startup thread | Entry function StartupProcessMain. The first child thread that the Postmaster main thread starts when the database starts. It performs the log REDO operation to recover the database. When REDO finishes and the database is recovered, the Startup thread exits if the node is not a standby. On a standby, the Startup thread keeps running and replays (REDO) the new logs that the standby receives |
Bgwriter background writer thread | Entry function BackgroundWriterMain. Flushes dirty pages in the shared buffer to disk |
PageWriter | Entry function ckpt_pagewriter_main. Copies dirty pages to the double-write area and flushes them to disk |
Checkpointer checkpoint thread | Entry function CheckpointerMain. Performs periodic checkpoints: all data files are updated and dirty pages are flushed to disk, which ensures database consistency. After crash recovery, changes covered by a completed checkpoint do not need to be recovered from the WAL |
StatCollector statistics thread | Entry function PgstatCollectorMain. Collects statistics on objects, SQL, sessions, locks, and so on, and saves them to the pgstat.stat file for performance, fault, and state analysis |
WalSender log sender thread | Entry function WalSenderMain. The primary sends the WAL |
WalReceiver log receiver thread | Entry function WalReceiverMain. The standby receives the WAL |
Postgres business processing thread | Entry function PostgresMain. Handles client connection requests and executes the related SQL |
After the database starts, you can use the operating system command ps to view thread information (the process ID in this example is 17012).

openGauss startup process
This section describes the openGauss database startup process, including how the main thread, the auxiliary threads, and the business processing threads start.
Starting the database with gs_ctl
gs_ctl is the database service control tool provided by openGauss. It can start and stop the database service and query the database status, and it is mainly called by the database management module. To start the database, use the following command:
gs_ctl start -D /opt/software/data -Z single_node
The entry function of gs_ctl is in "src/bin/pg_ctl/pg_ctl.cpp". The gs_ctl process forks a child process that runs the gaussdb process, which is started through a shell command.

In the figure above, cmd is "/opt/software/openGauss/bin/gaussdb -D /opt/software/openGauss/data". The first function called when the database runs is main, in "src/gausskernel/process/main/main.cpp". main.cpp mainly initializes the instance Context and sets the locale, and then, depending on the arguments passed in, calls BootStrapProcessMain, GucInfoMain, PostgresMain, or PostmasterMain. BootStrapProcessMain and PostgresMain initialize the database in the initdb scenario. GucInfoMain displays GUC (grand unified configuration, that is, the database's operating parameters) information. A normal database start enters PostmasterMain, which is described in more detail below.

1. MemoryContextInit: initializes the memory context system, mainly the global variables ThreadTopMemoryContext, ErrorContext, AlignContext, and ProfileLogging.
2. pg_perm_setlocale: sets the global variables related to the program locale.
3. check_root: confirms that the user running the program does not have operating system root privileges, to prevent problems such as accidental file overwrites.
4. If the first argument after gaussdb is --boot, the database is initialized. If the first argument after gaussdb is --single, PostgresMain() is called to enter the (local) single-user server program, which then, like an ordinary server-side thread, loops waiting for the user to enter SQL statements until the user enters EOF (Ctrl+D) and the program exits. If no additional startup option is specified, the program enters PostmasterMain and begins the normal series of server-side initialization steps.
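The branching in step 4 can be sketched roughly as follows. This is a simplified illustration rather than the code in main.cpp: the EntryPoint enum and select_entry_point are hypothetical names, and the "--describe-config" option for GucInfoMain is an assumption borrowed from PostgreSQL's main.c, since the text above does not name that option.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical labels for the four entry points named in the text.
enum class EntryPoint { BootStrap, GucInfo, SingleUser, Postmaster };

// Pick an entry point from the first command-line argument.
// "--boot" and "--single" come from the article; "--describe-config" is an
// assumption taken from PostgreSQL and may differ in openGauss.
EntryPoint select_entry_point(int argc, const char* const argv[])
{
    assert(argc >= 1);
    if (argc > 1 && std::strcmp(argv[1], "--boot") == 0) {
        return EntryPoint::BootStrap;     // BootStrapProcessMain
    }
    if (argc > 1 && std::strcmp(argv[1], "--describe-config") == 0) {
        return EntryPoint::GucInfo;       // GucInfoMain
    }
    if (argc > 1 && std::strcmp(argv[1], "--single") == 0) {
        return EntryPoint::SingleUser;    // PostgresMain
    }
    return EntryPoint::Postmaster;        // PostmasterMain, the normal path
}
```

Only the first argument is inspected, so an ordinary start such as "gaussdb -D <datadir>" falls through to the Postmaster path.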
PostmasterMain function
The following describes PostmasterMain in detail.

1. Set the thread-ID-related global variables MyProcPid, PostmasterPid, and MyProgName, and the runtime-environment global variable IsPostmasterEnvironment.
2. Call postmaster_mem_cxt = AllocSetContextCreate(t_thrd.top_mem_cxt,...) to create the postmaster_mem_cxt global variable and its memory context under the current thread's top_mem_cxt.
3. Call MemoryContextSwitchTo(postmaster_mem_cxt) to switch to the postmaster_mem_cxt memory context.
4. Call getInstallationPaths() to set my_exec_path (normally the path of the gaussdb executable).
5. Call InitializeGUCOptions(). From the default value of each GUC parameter in the code, it generates the global GUC parameter arrays ConfigureNamesBool, ConfigureNamesInt, ConfigureNamesReal, ConfigureNamesString, ConfigureNamesEnum, and so on, plus the global variables guc_variables, num_guc_variables, and size_guc_variables that manage the GUC parameters in one place, and it sets the GUC parameters that depend on the specific operating system environment.
6. while (opt = ...) SetConfigOption: if non-default GUC parameters were specified when gaussdb was started, load them into the global variables created in the previous step.
7. Call checkDataDir() to confirm that the database was installed successfully and that the PGDATA directory is valid.
8. Call CreateDataDirLockFile() to create the lock file for the data directory.
9. Call process_shared_preload_libraries() to process the preloaded libraries.
10. Create a listener for each ListenSocket.
11. Call reset_shared to set up shared memory and semaphores. The shared memory mainly includes the page buffer pool, the various lock buffer pools, the WAL buffer pool, the transaction log buffer pool, the transaction (ID) overview buffer pool, the per-background-thread (lock usage) overview buffer pool, the per-background-thread wait and run state buffer pool, the two-phase state buffer pool, the checkpoint buffer pool, the WAL replication and receive buffer pool, the data page replication and receive buffer pool, and so on. This shared memory is used by the client backend threads and the worker threads created later; they do not allocate their own.
12. Save the GUC parameters that were set manually at startup to a file, for use when backend server threads are started later.
13. Set the handlers for the different signals.
14. Call pgstat_init() to initialize the statistics collection subsystem.
15. Call load_hba() to load the pg_hba.conf file, which records the address and port of the client hosts that are allowed to connect to (specific or all) databases. Call load_ident() to load the pg_ident.conf file, which records the mapping between operating system user names and database user names, so that client connections can be authenticated later.
16. Call StartupPID = initialize_util_thread(STARTUP) to verify data consistency. On the primary, check the pg_control file: if the last shutdown state is DB_SHUTDOWNED and the recovery.conf file does not request recovery, the data is considered consistent. Otherwise, starting from the redo location of the checkpoint in pg_control or the location specified in recovery.conf, read the WAL or archived logs and replay them until the data reaches the expected consistency. The main function is StartupXLOG.
17. Finally, enter the ServerLoop() function and loop responding to client connection requests.
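Steps 5 and 6 follow a common "defaults first, then overrides" pattern. The sketch below illustrates only that pattern; it is not how openGauss stores GUC parameters. GucTable, initialize_guc_defaults, apply_overrides, and the parameter names and values are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using GucTable = std::map<std::string, std::string>;

// Step 5 analogue: build the table from compiled-in defaults.
// The parameter names and values here are illustrative only.
GucTable initialize_guc_defaults()
{
    return GucTable{{"port", "5432"}, {"max_connections", "200"}};
}

// Step 6 analogue: apply the options given on the command line.
// Stops and returns false at the first option that names an unknown
// parameter; options before it have already been applied.
bool apply_overrides(GucTable& table,
                     const std::vector<std::pair<std::string, std::string>>& options)
{
    assert(!table.empty());   // defaults must be loaded before overrides
    for (const auto& opt : options) {
        auto it = table.find(opt.first);
        if (it == table.end()) {
            return false;            // unknown parameter: reject
        }
        it->second = opt.second;     // non-default value replaces the default
    }
    return true;
}
```

Loading defaults before parsing options means every parameter always has a value, and a command-line option only has to name the parameters it changes.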
ServerLoop function
The main flow of the ServerLoop function is as follows.

1. Call gs_signal_setmask(&UnBlockSig, NULL) and gs_signal_unblock_sigusr2() so that the thread can respond to the specified set of signals from users or other threads.
2. Every PM_POLL_TIMEOUT_MINUTE, update the access and modification times of the socket file and the socket lock file so that the operating system does not clean them up.
3. Check the postmaster state (pmState). If it is PM_WAIT_DEAD_END, sleep for 100 milliseconds and accept no connections. Otherwise, use the poll() or select() system call to block while reading incoming data on the listening ports, for at most PM_POLL_TIMEOUT_SECOND.
4. Call gs_signal_setmask(&BlockSig, NULL) and gs_signal_block_sigusr2() to stop receiving external signals.
5. Check the return value of poll() or select(). If it is less than zero, listening failed and the server process exits. If it is greater than zero, create a connection with ConnCreate() and enter the backend service thread start procedure BackendStartup(). The parent thread, that is, the postmaster thread, calls ConnFree() after BackendStartup() returns to clear the connection information. If the return value is zero, nothing arrived and nothing is done.
6. Call ADIO_RUN() and ADIO_END(); if the AioCompleters have not been started, start them.
7. Check whether the thread ID of each worker thread is zero; if it is, call initialize_util_thread to start it.
Taking the non-thread-pool mode as an example, the thread start logic is as follows. BackendStartup calls initialize_worker_thread(WORKER, port) to create a backend thread that handles client requests. The start function for background threads, initialize_util_thread, and the start function for worker threads, initialize_worker_thread, both end up calling initialize_thread to start the thread.

1. initialize_thread calls gs_thread_create to create a thread, with InternalThreadFunc as the thread function.
ThreadId initialize_thread(ThreadArg* thr_argv)
{
    gs_thread_t thread;
    /* Create the thread; InternalThreadFunc is its start routine. */
    int error_code = gs_thread_create(&thread, InternalThreadFunc, 1, (void*)thr_argv);
    if (error_code != 0) {
        ereport(LOG,
            (errmsg("can not fork thread[%s], errcode:%d, %m",
                GetThreadName(thr_argv->m_thd_arg.role), error_code)));
        /* Creation failed: release the argument slot and report an invalid ID. */
        gs_thread_release_args_slot(thr_argv);
        return InvalidTid;
    }
    return gs_thread_id(thread);
}
2. InternalThreadFunc calls GetThreadEntry according to the role. GetThreadEntry uses the role directly as the subscript and returns the corresponding element of the GaussdbThreadEntryGate array. The elements of the array are pointers to the callback functions that handle specific tasks, and the function they point to is GaussDbThreadMain.
static void* InternalThreadFunc(void* args)
{
    knl_thread_arg* thr_argv = (knl_thread_arg*)args;
    /* Look up the entry function for this role, run it, then exit the thread. */
    gs_thread_exit((GetThreadEntry(thr_argv->role))(thr_argv));
    return (void*)NULL;
}
GaussdbThreadEntry GetThreadEntry(knl_thread_role role)
{
    Assert(role > MASTER && role < THREAD_ENTRY_BOUND);
    return GaussdbThreadEntryGate[role];
}
static GaussdbThreadEntry GaussdbThreadEntryGate[] = {GaussDbThreadMain<MASTER>,
    GaussDbThreadMain<WORKER>,
    GaussDbThreadMain<THREADPOOL_WORKER>,
    GaussDbThreadMain<THREADPOOL_LISTENER>,
    ......};
3. GaussDbThreadMain first initializes the thread's basic information, its Context, and its signal handlers. It then calls the handler that matches thread_role and enters that thread's main function. A thread whose role is WORKER enters PostgresMain.
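The role-indexed table of template instantiations can be reproduced in miniature. The sketch below uses deliberately different names (ThreadRole, ThreadMain, kEntryGate, GetEntry) and only three illustrative roles, so it should be read as an illustration of the pattern and not as the openGauss definitions.

```cpp
#include <cassert>

// Illustrative roles; the real knl_thread_role enum has many more entries.
enum ThreadRole { MASTER = 0, WORKER, THREADPOOL_WORKER, THREAD_ENTRY_BOUND };

using ThreadEntry = int (*)(void* arg);

// One instantiation per role, so the role is a compile-time constant inside
// the function body. Here it only returns the role to make the dispatch visible.
template <ThreadRole role>
int ThreadMain(void* /*arg*/)
{
    return static_cast<int>(role);
}

// Table indexed directly by role, in the same order as the enum.
static const ThreadEntry kEntryGate[] = {
    ThreadMain<MASTER>,
    ThreadMain<WORKER>,
    ThreadMain<THREADPOOL_WORKER>,
};
static_assert(sizeof(kEntryGate) / sizeof(kEntryGate[0]) == THREAD_ENTRY_BOUND,
              "one table entry per role");

ThreadEntry GetEntry(ThreadRole role)
{
    assert(role > MASTER && role < THREAD_ENTRY_BOUND);
    return kEntryGate[role];
}
```

Because the table is indexed by the enum value, the order of the entries must match the order of the enum; the static_assert catches a missing entry at compile time, although it cannot catch entries listed in the wrong order.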
PostgresMain function
The following describes the PostgresMain function in detail.

1. Call process_postgres_switches() to load the startup options and GUC parameters passed in.
2. Set the handlers for the different signals.
3. Call sigdelset(&BlockSig, SIGQUIT) to allow the thread to respond to the SIGQUIT signal.
4. Call BaseInit() to initialize the storage management system and the page buffer pool counters.
5. Call on_shmem_exit() to register the memory cleanup actions required before the thread exits. These cleanup actions form a linked list (the on_shmem_exit_list global variable). Each call to this function appends one node to the end of the list; the length of the list is recorded in on_shmem_exit_index and cannot exceed the MAX_ON_EXITS macro. When the thread exits, the action (a function pointer) in each node is called from the back of the list to the front to complete the cleanup.
6. Call gs_signal_setmask(&UnBlockSig) to set the signal mask.
7. Call InitBackendWorker to initialize the statistics system and the syscache.
8. Call BeginReportingGUCOptions to report the GUC parameters if necessary.
9. Call on_proc_exit() to register the thread cleanup actions required before the thread exits. The registration and invocation mechanism is similar to that of on_shmem_exit().
10. Call process_local_preload_libraries() to process the libraries to preload after the GUC parameters are set.
11. Call AllocSetContextCreate to create the MessageContext, RowDescriptionContext, and MaskPasswordCtx contexts.
12. Call sigsetjmp() to set the longjmp return point. If a later query fails, execution can in some cases return here and start again.
13. Call gs_signal_unblock_sigusr2() to allow the thread to respond to the specified set of signals.
14. Enter the for loop and execute queries. Each iteration does the following:

1. Call pgstat_report_activity() and pgstat_report_waitstatus() to tell the statistics system that the backend thread is idle.
2. Set the global variable DoingCommandRead = true.
3. Call ReadCommand() to read the client's SQL statement.
4. Set the global variable DoingCommandRead = false.
5. If a SIGHUP signal was received during the steps above, the thread needs to reload the modified postgresql.conf configuration file.
6. Enter switch (firstchar) and branch according to the message received.
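The cleanup mechanism described in step 5 of the initialization list (register at the end, run from back to front, bounded length) can be sketched as below. The article describes the structure as a linked list; this sketch uses a fixed array with an index, which fits the description of on_shmem_exit_index and MAX_ON_EXITS. All names, the limit of 20, and the callback signature are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for on_shmem_exit_list, on_shmem_exit_index and
// MAX_ON_EXITS; the real limit and callback signature may differ.
constexpr std::size_t kMaxOnExits = 20;
using ExitCallback = void (*)(int code, void* arg);

struct ExitEntry {
    ExitCallback fn;
    void* arg;
};

static ExitEntry g_exit_list[kMaxOnExits];
static std::size_t g_exit_index = 0;

// Append one cleanup action to the end of the list.
// Returns false when the list is already full.
bool register_exit_callback(ExitCallback fn, void* arg)
{
    assert(fn != nullptr);
    if (g_exit_index >= kMaxOnExits) {
        return false;
    }
    g_exit_list[g_exit_index++] = ExitEntry{fn, arg};
    return true;
}

// Run the registered actions from the back to the front. Each entry is
// removed from the list before it is called, so it can never run twice.
void run_exit_callbacks(int code)
{
    while (g_exit_index > 0) {
        ExitEntry entry = g_exit_list[--g_exit_index];
        entry.fn(code, entry.arg);
    }
}
```

Running the actions in reverse registration order means that a resource set up late, which may depend on resources set up earlier, is torn down first.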
Thinking about how to add a worker thread
If the analysis above interests you, think about how you would add a worker thread. You can use the existing threads as a reference.
Files to modify | Postmaster.cpp |
Functions to modify | GaussdbThreadGate: definition; ServerLoop: start the thread; Reaper: reclaim the thread; GaussDbThreadMain: entry function |
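The interaction between the ServerLoop and Reaper touchpoints can be pictured with a toy model. It assumes, as described for ServerLoop above, that a thread ID of zero means "not running"; the Supervisor struct, the UtilRole enum, and MY_NEW_WORKER are hypothetical and do not correspond to openGauss definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical roles and bookkeeping; none of these names are openGauss APIs.
enum UtilRole { WAL_WRITER = 0, CHECKPOINTER, MY_NEW_WORKER, UTIL_ROLE_COUNT };

struct Supervisor {
    std::array<unsigned long, UTIL_ROLE_COUNT> tid{};  // 0 means "not running"
    unsigned long next_tid = 1;

    // ServerLoop analogue: start every role whose thread ID is zero.
    // Returns how many threads were started in this pass.
    int start_missing()
    {
        int started = 0;
        for (std::size_t role = 0; role < tid.size(); ++role) {
            if (tid[role] == 0) {
                tid[role] = next_tid++;   // stands in for initialize_util_thread(role)
                ++started;
            }
        }
        return started;
    }

    // Reaper analogue: the thread for this role exited, so clear its slot.
    // The next start_missing() pass will bring it back.
    void reap(UtilRole role)
    {
        assert(role < UTIL_ROLE_COUNT);
        tid[role] = 0;
    }
};
```

In this model, adding a worker amounts to adding a role, giving it an entry function, and making sure both the start pass and the reap path know about its slot, which mirrors the four touchpoints listed in the table.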