Greenplum Database Fault Analysis - Can the Database base Directory Be a Symbolic Link?
2022-08-05 01:52:00 【Fat Uncle】
Case Background
In a field project, the Standby Master node of a Greenplum database went down and could not be rebuilt. The project's operations staff contacted the DBA team's point of contact, and a DBA colleague passed the fault on to me for analysis. I have been a developer on the team for two years, and I believe fault analysis is the fastest way into learning a database, so I took the job. The HA component promotes the Standby when the Master node goes down, and it re-initializes and starts a Standby when the Standby node goes down. Its logs reported that the Standby Master had been repaired successfully. However, the gpstate tool showed that the Standby Master node was in fact still down. The log of the HA component's gpinitstandby run is shown below:
gpinitstandby:xxx:gpadmin-[INFO]:-Warm master standby removal parameters
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Removing standby master from catalog...
gpinitstandby:xxx:gpadmin-[INFO]:-Database catalog updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Removing data directory on standby master...
gpinitstandby:xxx:gpadmin-[INFO]:-Successfully removed standby master.
gpinitstandby:xxx:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
gpinitstandby:xxx:gpadmin-[INFO]:-Checking for data directory /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:Greenplum standby master initialization parameters
gpinitstandby:xxx:gpadmin-[INFO]:-------------------------------------------
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master hostname = xxx
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master port = 5432
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum standby master data directory = /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[INFO]:-Greenplum update system catalog = On
gpinitstandby:xxx:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
gpinitstandby:xxx:gpadmin-[INFO]:-The packages on xxx are consistent
gpinitstandby:xxx:gpadmin-[INFO]:-Adding standby master to catalog...
gpinitstandby:xxx:gpadmin-[INFO]:-Database catalog updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Updating pg_hba.conf file...
gpinitstandby:xxx:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
gpinitstandby:xxx:gpadmin-[INFO]:-Starting standby master
gpinitstandby:xxx:gpadmin-[INFO]:-Checking if standby master is running on host: xxx in directory: /home/gpadmin/data/master/default/gpseg-1
gpinitstandby:xxx:gpadmin-[WARNING]-Unable to cleanup previously started standby: 'Authorized only. All activity will be monitored and reported
gpinitstandby:xxx:gpadmin-[WARNING]-Could not start standby master
gpinitstandby:xxx:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
The log shows that gpinitstandby finished initializing the standby node but then could not start the standby master.
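The HA component reported success even though gpinitstandby itself logged "Could not start standby master". A wrapper can avoid this by scanning the tool's output for warning and error lines instead of trusting a single status.

The following is a minimal Python sketch. It assumes the log format shown above, where the severity appears as a bracketed tag such as [WARNING]. The function name find_problems is hypothetical and is not part of Greenplum's tooling.

```python
import re
from typing import List

# Matches the bracketed severity tag in gpinitstandby log lines, e.g.
# "gpinitstandby:xxx:gpadmin-[WARNING]-Could not start standby master"
# Assumption: WARNING and ERROR are the tags worth alerting on.
SEVERITY = re.compile(r"\[(WARNING|ERROR)\]")


def find_problems(log_text: str) -> List[str]:
    """Return every log line tagged WARNING or ERROR, in original order."""
    return [line for line in log_text.splitlines() if SEVERITY.search(line)]


if __name__ == "__main__":
    sample = "\n".join([
        "gpinitstandby:xxx:gpadmin-[INFO]:-Starting standby master",
        "gpinitstandby:xxx:gpadmin-[WARNING]-Could not start standby master",
    ])
    problems = find_problems(sample)
    if problems:
        print("standby rebuild is NOT healthy:")
        for line in problems:
            print("  " + line)
```

A non-empty result should be treated as a failed rebuild. Even an empty result is worth confirming with gpstate (for example gpstate -f, which shows standby master details), since that is the tool that revealed the real state here.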
Analysis Process
Since the standby master would not start, I first checked startup.log under pg_log. Its timestamp did not match the current cluster time, which means the postmaster daemon printed no error message there during this start attempt. That left one possibility: some files were missing, causing "Could not start standby master".

I first compared the disk usage of gpseg-1 on the master and the standby master with du -sh. The difference was roughly 10 GB. pg_basebackup is known to exclude some files when taking a base backup, so a 10 GB gap on its own is plausible and gave no clue. I then compared the gpseg-1 directories on the master and standby master directly, looking for missing directories. This comparison showed that gpseg-1 on the standby master had no base data directory at all.

OMG, this is a big problem. Suppose it had been declared fixed in this state. Once the customer's business load grew and the master node could not hold up, there would be no usable standby to fall back on. That is every DBA's nightmare.
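The directory comparison above can be scripted. The following is a minimal Python sketch. It assumes both gpseg-1 directories are reachable from the host running it, and the paths in the usage line are placeholders.

The sketch lists top-level entries present on one side but missing on the other. It also flags entries that are symlinks, because a plain os.path.isdir() follows links and would hide that distinction. The function names are hypothetical.

```python
import os
from typing import Dict, List


def top_level_entries(data_dir: str) -> Dict[str, str]:
    """Map each top-level entry name to 'symlink', 'dir', 'file' or 'other'."""
    kinds: Dict[str, str] = {}
    with os.scandir(data_dir) as it:
        for entry in it:
            # Check is_symlink() first: is_dir() would follow the link by default.
            if entry.is_symlink():
                kinds[entry.name] = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kinds[entry.name] = "dir"
            elif entry.is_file(follow_symlinks=False):
                kinds[entry.name] = "file"
            else:
                kinds[entry.name] = "other"
    return kinds


def compare_data_dirs(master_dir: str, standby_dir: str) -> Dict[str, List[str]]:
    """Report entries missing on either side and symlinks on the master."""
    master = top_level_entries(master_dir)
    standby = top_level_entries(standby_dir)
    return {
        "missing_on_standby": sorted(set(master) - set(standby)),
        "extra_on_standby": sorted(set(standby) - set(master)),
        "symlinks_on_master": sorted(n for n, k in master.items() if k == "symlink"),
    }
```

Usage: compare_data_dirs("/path/to/master/gpseg-1", "/path/to/standby/gpseg-1"). On a cluster in the state described here, one would expect base to show up both in missing_on_standby and in symlinks_on_master.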
The base directory is pulled from the master node by pg_basebackup, so the problem most likely lies in the pg_basebackup step. The pg_basebackup log under /home/gpAdminLogs on the standby master contained the warning below, in which pg_basebackup reports base as a special file. WTF, what does that mean? Time to look at what base under gpseg-1 on the master node really is.
pg_basebackup: initiating base backup, waiting for checkpoint to complete
WARNING: skipping special file "./base"
pg_basebackup: checkpoint completed
transaction log start point: 0/30000028 on timeline 1
...
transaction log end point: 0/300000D0
pg_basebackup: sync the target data directory
pg_basebackup: base backup completed
On the master node, base under gpseg-1 is not a directory but a symbolic link. This shows that pg_basebackup does not actually copy symlinks in the data directory. From a development point of view this is reasonable: the disk layout of the standby master node is unknown, recreating the link there might not work, and simply skipping it is the safer choice.

After talking with the business side, we learned the background. VACUUM had never been run on the system tables, so the metadata directory had grown to an exaggerated size, reaching the TB level. Most importantly, the master node's data directory sat on the system disk. When the system disk was running out of space, they said, our database team had provided a symlink workaround: copy the base data to an external disk and then replace base with a symlink. Their test department had tested it and found no problems...
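Why does a symlink count as a "special file"? As far as I understand PostgreSQL's base backup code, which pg_basebackup relies on and Greenplum inherits, the server inspects each data-directory entry with lstat(). A symlink is therefore seen as a link, not as the directory it points to. Apart from tablespace links under pg_tblspc, an entry that is neither a regular file nor a directory is skipped with exactly this warning.

The Python sketch below is a simplified model of that decision, not the server's actual code. The function name classify_like_basebackup is hypothetical.

```python
import os
import stat


def classify_like_basebackup(data_dir: str, name: str) -> str:
    """Simplified model of how a base backup treats one data-directory entry.

    os.lstat() does not follow symlinks, so a symlinked 'base' is reported as
    a link rather than as the directory it points to. The real code handles
    tablespace links under pg_tblspc separately; that case is not modeled here.
    """
    st = os.lstat(os.path.join(data_dir, name))
    if stat.S_ISDIR(st.st_mode):
        return "copy directory"
    if stat.S_ISREG(st.st_mode):
        return "copy file"
    # Symlinks, sockets, FIFOs and other special files all end up here.
    return 'WARNING: skipping special file "./%s"' % name
```

Running it on a copy of the master's gpseg-1 would classify base the same way the log above did, while ordinary directories such as global are copied.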
Following the base link on the standby master, I found that the data behind it was over a TB and the file dates were rather old. Presumably the business side had performed the symlink operation on both master and standby master with the database shut down, and had not touched it since. The gpinitstandby script has a step, "Removing data directory on standby master...", so why did it not delete the directory that base pointed to on the standby master? The deletion code is shown below. It deletes with rsync, and my tests showed that deleting files with rsync does not delete the directory a symlink points to. That is where the problem came to light.
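unique_dir = "/tmp/emptyForRemove%s" % uuid.uuid4()
# Shell command template; {target_dir} and {unique_dir} are filled in with str.format()
# Syncing an empty directory over target_dir with --delete empties it, then rmdir removes both.
# The trailing slash on {unique_dir}/ makes rsync sync the directory's contents, not the directory itself.
if [ -d {target_dir} ]; then
    mkdir -p {unique_dir} && rsync -a --delete {unique_dir}/ {target_dir} && rmdir {target_dir} {unique_dir};
fi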
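The effect can be reproduced without rsync. With -a, rsync copies symlinks as symlinks (-a implies -l) instead of following them. As a result, --delete removes the link entry inside the target directory and never descends into the directory the link points to.

The Python sketch below models this "sync an empty directory over the target" deletion using symlink-aware removal. It illustrates the behavior and is not gpinitstandby's code. The function name remove_like_rsync_delete is hypothetical.

```python
import os
import shutil


def remove_like_rsync_delete(target_dir: str) -> None:
    """Empty target_dir the way syncing an empty dir over it with --delete would,
    then remove target_dir itself (the rmdir step).

    Symlinks are unlinked, never followed, so whatever they point to survives.
    """
    if not os.path.isdir(target_dir):
        return
    with os.scandir(target_dir) as it:
        for entry in it:
            if entry.is_symlink() or not entry.is_dir(follow_symlinks=False):
                os.unlink(entry.path)        # drop the link (or file) itself
            else:
                shutil.rmtree(entry.path)    # real subdirectory: remove recursively
    os.rmdir(target_dir)
```

This matches what happened on the standby. The data directory was reported as removed, but the TB-scale directory behind the old base link stayed on disk untouched.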
Root Causes
- Deleting files with rsync does not delete the directory that a symlink points to
- pg_basebackup does not copy a symlinked base directory; it skips it as a "special file"
Solution
The process above shows that placing the symlink one level above base, at /home/gpadmin/data/master/default/gpseg-1, does not work either. When rebuilding the standby master, gpinitstandby will not recreate the symlink that was configured on the master. It creates an ordinary directory instead, so the data still ends up on the system disk rather than on the newly mounted disk.

The symlink must be placed at the /home/gpadmin/data/master/default level, which the database tooling never needs to rebuild or create. Only then can we be sure that, no matter how master and standby switch over, the data always lives in the linked directory.
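The following is a minimal Python sketch of that layout. It assumes the cluster is fully stopped and that the script runs on both the master and the standby master hosts. The mount point /data1/gp_master is a hypothetical example, and moving TB-scale data should be planned and verified separately.

The sketch moves the default directory onto the new disk and leaves a symlink at the old path. As a result, gpseg-1 and its base directory remain ordinary directories that gpinitstandby and pg_basebackup can recreate. The function name relocate_with_symlink is hypothetical.

```python
import os
import shutil


def relocate_with_symlink(link_path: str, new_location: str) -> None:
    """Move link_path to new_location and leave a symlink at link_path.

    Example (hypothetical paths):
        relocate_with_symlink("/home/gpadmin/data/master/default",
                              "/data1/gp_master/default")
    Run only while the database is stopped.
    """
    if os.path.islink(link_path):
        raise RuntimeError("%s is already a symlink" % link_path)
    if os.path.exists(new_location):
        raise RuntimeError("%s already exists" % new_location)
    if os.path.isdir(link_path):
        # Existing data: move it onto the new disk (copy + delete across filesystems).
        os.makedirs(os.path.dirname(new_location), exist_ok=True)
        shutil.move(link_path, new_location)
    else:
        # Nothing there yet (e.g. a freshly prepared standby host): create an empty target.
        os.makedirs(new_location)
    # The symlink sits above gpseg-1, so gpseg-1 and base stay real directories.
    os.symlink(new_location, link_path)
```

The design choice follows the article's reasoning. Only a path level that gpinitstandby never deletes or recreates can safely be a symlink, so the link is set up once per host by hand rather than left to the database tooling.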