Greenplum Database Fault Analysis - Can a Soft Connection Be Made to the Database Base Folder?
2022-08-05 01:52:00 【Fat Uncle】
Case Background
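In a field project, the Standby Master node of a Greenplum database went down and could not be rebuilt. The project's operations staff contacted the DBA team's point of contact, and a DBA colleague handed the fault to me for analysis. As a developer who had been on the team for two years, and believing that analyzing faults is the fastest way into learning a database, I took the job. The logs of the HA component (which promotes the Standby node when the Master node fails, and initializes and starts a new Standby node when the Standby node fails) showed that the HA component reported the Standby Master as successfully repaired, yet the gpstate tool showed that the Standby Master node was in fact still down. The log of the HA component's gpinitstandby run is shown below: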
gpinitstandby:xxx:gpadmin-[INFO]:-Warm master standby removal parameters
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Removing standby master from catalog...
gpinitstandby:xxx:gpadmin-[INFO]:-Database catalog updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Removing data directory on standby master...
gpinitstandby:xxx:gpadmin-[INFO]:-Successfully removed standby master.
gpinitstandby:xxx:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
gpinitstandby:xxx:gpadmin-[INFO]:-Checking for data directory /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:Greenplum standby master initialization parameters
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum update system catalog = On
gpinitstandby:xxx:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
gpinitstandby:xxx:gpadmin-[INFO]:-The packages on xxx are consistent
gpinitstandby:xxx:gpadmin-[INFO]:-Adding standby master to catalog...
gpinitstandby:xxx:gpadmin-[INFO]:-Database catalog updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Updating pg_hba.conf file...
gpinitstandby:xxx:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Starting standby master
gpinitstandby:xxx:gpadmin-[INFO]:-Checking if standby master is running on host: xxx in directory: /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[WARNING]-Unable to cleanup previously started standby: 'Authorized only. All activity will be monitored and reported
gpinitstandby:xxx:gpadmin-[WARNING]-Could not start standby master
gpinitstandby:xxx:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
The log shows that gpinitstandby finished initializing the standby node but then could not start the standby master.
Analysis
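Because the HA component only reported overall success, the WARNING lines buried in a long gpinitstandby log are the first thing worth pulling out. A minimal sketch, assuming the line format shown above (`gpinitstandby:<host>:<user>-[LEVEL]...`, where the separator after the tag is sometimes `:-` and sometimes `-`); the function name `problems` is hypothetical and not part of Greenplum's tooling:

```python
import re

# Matches the level tag in gpinitstandby log lines, e.g.
#   gpinitstandby:host:gpadmin-[INFO]:-Starting standby master
#   gpinitstandby:host:gpadmin-[WARNING]-Could not start standby master
# The separator after the tag varies (":-", ":" or "-"), so all are accepted.
LINE_RE = re.compile(r"-\[(?P<level>[A-Z]+)\]:?-?(?P<msg>.*)$")


def problems(log_text, levels=("WARNING", "ERROR", "CRITICAL", "FATAL")):
    """Return (level, message) pairs for every line logged at one of `levels`."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and m.group("level") in levels:
            found.append((m.group("level"), m.group("msg").strip()))
    return found


if __name__ == "__main__":
    sample = (
        "gpinitstandby:xxx:gpadmin-[INFO]:-Starting standby master\n"
        "gpinitstandby:xxx:gpadmin-[WARNING]-Could not start standby master\n"
        "gpinitstandby:xxx:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...\n"
    )
    for level, msg in problems(sample):
        print(level, msg)  # WARNING Could not start standby master
```

Applied to the full log above, this would surface both warnings, including the truncated SSH banner message on the cleanup step.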
Since the standby master would not start, I first looked at startup.log under pg_log. Its timestamp did not match the current cluster time, which means the postmaster daemon had not written any error message during this startup attempt. That left one line of inquiry: whether missing files were causing "Could not start standby master". I first compared the disk usage of the gpseg-1 directory on the master and standby master nodes with du -sh and found a difference of about 10 GB. (pg_basebackup excludes some files when taking a base backup, so a 10 GB discrepancy is not necessarily a problem.) That gave no clue, so I compared the gpseg-1 directories on the master and standby master directly, looking for missing subdirectories. The comparison showed that the standby master's gpseg-1 had no base data directory at all. OMG, this is a serious problem: had the standby been "repaired" in this state, then once the customer's business load ramped up and the master node could not hold up, there would be no usable standby to take over. That is a DBA's nightmare.
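The directory comparison that exposed the missing base can be scripted. Since the master and standby master are different hosts, this sketch assumes you first save a listing on each host (for example `ls -A /home/gpadmin/data/master/default/gpseg-1 > master.txt`) and copy both files to one place; the helper names and file names are hypothetical:

```python
import sys


def diff_entries(master_names, standby_names):
    """Return (missing_on_standby, extra_on_standby) as sorted name lists.

    Entries are compared by name only; sizes and contents are not checked.
    """
    master, standby = set(master_names), set(standby_names)
    return sorted(master - standby), sorted(standby - master)


def read_listing(path):
    """Read one entry name per line, as produced by `ls -A <dir> > listing.txt`."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__" and len(sys.argv) == 3:
    missing, extra = diff_entries(read_listing(sys.argv[1]), read_listing(sys.argv[2]))
    print("missing on standby:", missing)
    print("extra on standby:", extra)
```

Usage: `python diff_entries.py master.txt standby.txt`. Comparing names rather than `du` totals avoids being misled by files pg_basebackup legitimately excludes.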
The base directory is pulled from the master node by pg_basebackup, so the problem most likely lies in the pg_basebackup step. The pg_basebackup log under /home/gpAdminLogs on the standby master node contained the warning below: pg_basebackup reports base as a "special file". WTF, what does that mean? Time to check what base under gpseg-1 on the master node actually is.
pg_basebackup: initiating base backup, waiting for checkpoint to complete
WARNING: skipping special file "./base"
pg_basebackup: checkpoint completed
transaction log start point: 0/30000028 on timeline 1
...
transaction log end point: 0/300000D0
pg_basebackup: sync the target data directory
pg_basebackup: base backup completed
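A "skipping special file" warning means that path is simply absent from the copy, so it is worth checking pg_basebackup output for it automatically after every standby rebuild. A minimal sketch, assuming the warning text appears exactly as in the log above; `skipped_special_files` is a hypothetical helper:

```python
import re

# The warning text as it appears in the pg_basebackup log above.
SKIP_RE = re.compile(r'WARNING:\s+skipping special file "(?P<path>[^"]+)"')


def skipped_special_files(log_text):
    """Return every path the base backup reported as a skipped special file."""
    return [m.group("path") for m in SKIP_RE.finditer(log_text)]


if __name__ == "__main__":
    log = (
        "pg_basebackup: initiating base backup, waiting for checkpoint to complete\n"
        'WARNING: skipping special file "./base"\n'
        "pg_basebackup: checkpoint completed\n"
    )
    print(skipped_special_files(log))  # ['./base']
```

A non-empty result is a reason to treat the rebuild as failed, even if the tool itself reports success.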
On the master node, base under gpseg-1 is not a directory but a symbolic link, so pg_basebackup clearly does not handle symlinks here. (From a developer's point of view this is reasonable: pg_basebackup knows nothing about the disks on the standby master node and cannot tell whether recreating the link there would succeed, so simply skipping it is the safer choice.) After talking with the business side, it turned out that because they had never run VACUUM on the system catalogs, the metadata directory had grown enormously, reaching the TB scale. Crucially, the master node's data directory sat on the system disk. When the system disk ran critically short of space, they said, our database team had provided a symlink workaround: copy the base data to an external disk and replace base with a symbolic link to it. Their test department had tested it and found no problems...
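In PostgreSQL's base backup code, an entry that is neither a regular file nor a directory, and is not a tablespace link directly under pg_tblspc, is reported with the "skipping special file" warning seen above. The following is a simplified Python model of that rule, not the actual C implementation; it ignores special cases the real code has (for example around the WAL directory), and the function name is hypothetical. Run against the master's data directory before rebuilding a standby, it would flag a symlinked base in advance:

```python
import os
import stat
import sys


def would_be_skipped(data_dir):
    """List entries a base backup would skip as "special files" (simplified model).

    Regular files and real directories are copied, symlinks directly under
    pg_tblspc are treated as tablespaces, and anything else (including a
    symlink such as base -> /external/disk) is reported as skipped.
    """
    skipped = []
    # followlinks defaults to False: symlinked dirs are listed but not entered.
    for root, dirs, files in os.walk(data_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            mode = os.lstat(path).st_mode  # lstat: inspect the link itself
            rel = os.path.relpath(path, data_dir)
            if stat.S_ISREG(mode) or stat.S_ISDIR(mode):
                continue
            if stat.S_ISLNK(mode) and os.path.dirname(rel) == "pg_tblspc":
                continue  # tablespace link, handled separately by the backup
            skipped.append(rel)
    return sorted(skipped)


if __name__ == "__main__" and len(sys.argv) == 2:
    for rel in would_be_skipped(sys.argv[1]):
        print("would be skipped:", rel)
```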
On the standby master I found the path that base linked to. The data there exceeded a TB and the file dates were fairly old, so presumably the business team had shut down the database, set up the symlinked directory on both the master and the standby master by hand, and had not touched it since. gpinitstandby has a step "Removing data directory on standby master...", so why did it not delete the directory that base linked to on the standby master? The deletion code is shown below. It uses rsync to do the deletion, and testing showed that rsync removes the symbolic link itself but not the directory it points to. That is where the problem came to light.
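import uuid

target_dir = "/home/gpadmin/data/master/default/gpseg-1"  # standby data directory from the log above
unique_dir = "/tmp/emptyForRemove%s" % uuid.uuid4()
# If the target exists: rsync an empty directory over it with --delete (which
# removes every entry in the target), then remove the target and the temp dir.
cmd_str = ("if [ -d {target_dir} ]; then "
           "mkdir -p {unique_dir} && "
           "rsync -a --delete {unique_dir}/ {target_dir} && "
           "rmdir {target_dir} {unique_dir}; "
           "fi").format(unique_dir=unique_dir, target_dir=target_dir)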
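The same behavior can be reproduced without rsync. `rsync -a` implies copying symlinks as symlinks, so `--delete` removes an extraneous link on the receiving side as a link. The sketch below is a pure-Python stand-in for that effect (the function name is hypothetical; this is not gpinitstandby's code), using temporary directories in place of the external disk and gpseg-1:

```python
import os
import tempfile


def empty_dir_like_rsync_delete(target_dir):
    """Remove every entry in target_dir without following symbolic links.

    A symlink such as target_dir/base is unlinked as a link, so the data it
    points to (e.g. on an external disk) is left untouched.
    """
    # Bottom-up walk: children are handled before their parent directories.
    for root, dirs, files in os.walk(target_dir, topdown=False):
        for name in files:
            os.unlink(os.path.join(root, name))
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.unlink(path)  # drop the link itself, never its target
            else:
                os.rmdir(path)   # already emptied earlier in the bottom-up walk


if __name__ == "__main__":
    external = tempfile.mkdtemp()  # stands in for the external disk
    with open(os.path.join(external, "12345"), "w") as f:
        f.write("catalog data")
    data_dir = tempfile.mkdtemp()  # stands in for gpseg-1 on the standby
    os.symlink(external, os.path.join(data_dir, "base"))
    empty_dir_like_rsync_delete(data_dir)
    print(os.listdir(data_dir))    # []: the link is gone
    print(os.listdir(external))    # ['12345']: the linked data survives
```

This is why the TB of old data was still sitting at the link target on the standby: gpinitstandby removed only the link.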
Root Cause
- rsync deletion removes a symbolic link but not the directory it points to
- pg_basebackup does not copy a symlinked base directory; it skips it as a special file
Solution
The process above also shows that making the level above base, /home/gpadmin/data/master/default/gpseg-1, a symbolic link does not help: when the standby master is rebuilt, gpinitstandby will not recreate on the standby master the symlink configured on the master, but simply creates a plain directory, so the data still ends up on the system disk instead of the newly mounted disk. Only by making /home/gpadmin/data/master/default a symbolic link, a level that the database tools never rebuild or create for us, can we ensure that no matter how often the primary and standby switch over, the data always lives in the linked directory.
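Before running gpinitstandby again, the recommended layout can be checked on each node. A minimal sketch under these assumptions: the path `/home/gpadmin/data/master/default` comes from this case, while the function name `check_master_layout` and its `seg_name` parameter are hypothetical:

```python
import os


def check_master_layout(default_dir, seg_name="gpseg-1"):
    """Check the recommended layout; return a list of problems (empty = OK).

    Expected: default_dir is a symlink to the external disk, while <seg_name>
    and <seg_name>/base are real directories that the Greenplum tools create
    and copy themselves.
    """
    # normpath strips a trailing slash, which would otherwise make islink()
    # resolve the link and report False.
    default_dir = os.path.normpath(default_dir)
    problems = []
    if not os.path.islink(default_dir):
        problems.append(f"{default_dir} is not a symlink; data stays on its current disk")
    seg_dir = os.path.join(default_dir, seg_name)
    for path in (seg_dir, os.path.join(seg_dir, "base")):
        if os.path.islink(path):
            problems.append(f"{path} is a symlink; it will not be recreated as one on the standby")
        elif not os.path.isdir(path):
            problems.append(f"{path} is missing")
    return problems


if __name__ == "__main__":
    for p in check_master_layout("/home/gpadmin/data/master/default"):
        print("WARNING:", p)
```

Running it on both the master and the standby master after a switchover is a cheap way to confirm the data is still on the mounted disk.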