2022-02-11 Failure: the maximum amount of data supported by MongoDB
2022-07-06 21:06:00 【IT migrant worker Brother Goldfish】
In daily work we deal with all kinds of requirements and faults, and fault response in particular is critical. If a fault is not handled in time, the losses add up; as I said before: every second is money.
It suddenly occurred to Brother Goldfish to start a new column recording the faults encountered in daily work. I hope my sharing and summaries can be of some use to you.
Maybe it was fate: yesterday was my first day of onboarding at my new unit (outsourced to a large company). I was not yet familiar with the various environments, but I was pulled in to follow up on a fault.
Fortunately, although I have been away from the technical front line for 11 months, my technical skills have held up. Once again it shows that fundamentals, thinking, and experience really matter: they let me locate the problem and have the relevant people work out the follow-up solution.
Fault description:
At midday, I went with the delivery manager to the information center in building 5 of the XX compound. The people handling the issue were the developers and the company's support engineer. From everyone I learned the following:
The application uses MongoDB 3.4. It was originally a cluster of 3 machines plus 1 arbiter node. On Thursday the developers took one node out for a single-node test, so that node's data was deleted. A snapshot had been taken on Thursday night (using XX cloud), so a snapshot backup exists.
The situation that day:
1. One primary, one secondary, one arbiter node
2. The primary and secondary nodes cannot both be running at the same time
3. When the primary node is started first, its status is SECONDARY; when the secondary node is then restarted, it cannot start: the process shuts down as soon as the exception occurs.
Test results:
Starting the primary node standalone, it could neither read nor write; on write, the same exception occurred as on the secondary node above. After deleting some of the primary node's data, the primary node became readable and writable.
Below are some of the related logs. (Since I had only just joined, I could not get more information; at that point we could only check the relevant logs on site.)
The primary node stays in the SECONDARY state, so it can only be read, not written, and the whole cluster has problems.
As a result, the business cannot perform upload operations (no writes), although all kinds of queries still work.
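The "primary stuck in SECONDARY" symptom can be confirmed from the replica set status document. A minimal sketch in plain Python, operating on a dict shaped like the output of MongoDB's `replSetGetStatus` command; the sample member names here are hypothetical, not the actual cluster's:

```python
def writable_members(status):
    """Return the names of members in PRIMARY state.

    `status` is a dict shaped like `replSetGetStatus` output:
    {"members": [{"name": ..., "stateStr": ...}, ...]}.
    A replica set accepts writes only while one member is PRIMARY;
    if no member reports PRIMARY, the set is read-only, which
    matches the symptom described above.
    """
    return [m["name"] for m in status.get("members", [])
            if m.get("stateStr") == "PRIMARY"]

# Hypothetical status mirroring the fault: no member ever becomes PRIMARY.
sample_status = {
    "members": [
        {"name": "10.0.0.1:27017", "stateStr": "SECONDARY"},
        {"name": "10.0.0.3:27017", "stateStr": "ARBITER"},
    ]
}
print(writable_members(sample_status))  # [] -> the set cannot accept writes
```

An empty result here is exactly the read-only cluster state seen on site: queries work, uploads fail.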
Complaints:
o((⊙﹏⊙))o Arriving on site, I found the databases are all running on Windows Server. How painful it is to read logs there...
The on-site staff were not familiar with their own application architecture. I asked whether the application's concurrency was high. Nobody knew. (If concurrency is not high, what is the cluster even for?) ... So on site you have to dig out all sorts of basic facts yourself, which costs time...
The two points above hindered the efficiency of troubleshooting.
And then: why were they still wrestling with the cluster's various problems on site, instead of making "restore the business first" the guiding principle?
It also shows there was no corresponding emergency plan. Plenty to think about improving in future work.
Testing and attempts:
Watching them struggle with the cluster, I, the newcomer, had intended just to look around. But then I thought: since I'm here with them, if they can't handle it, it will definitely mean overtime... and home is far away; this XX academy is in Science City, and I live on Guangyuan Road in Baiyun District... So I couldn't help speaking up:
"First orient toward restoring the business; don't worry about the cluster's problems yet. Restart the service with a single node and resume business. According to your feedback, concurrency is not high, so a single node is fine. Recover first."
I suggested the developers find a single node that still had data and start the database on a different port.
After it started, the same phenomenon as Thursday night appeared; after deleting some of the node's data, the node became readable and writable.
I had the developers test the phenomenon: after deleting some files, uploads did work, but when uploading large files the upload could not continue and the database would crash. After a restart the database could only be read; writing again would crash it.
The logs contained no useful error information at all. (Reading logs on Windows is painful... why not Linux? Does "low" look like this?)
Seeing this phenomenon, standing by, I suggested testing it once more.
This time I recorded the approximate capacities: I noted the size of the files he deleted, then compared it against the volume uploaded.
During the test, uploads succeeded only up to roughly the deleted capacity; beyond that, the database simply crashed. And the disk had plenty of free space.
Seeing this, a sudden inspiration: is there some limit?
At that point I looked up my notes and searched online articles immediately.
Fault analysis:
Checking my notes, there is a passage:
Limits
MongoDB should generally run on a 64-bit operating system. A 32-bit system can only address 4GB of memory, which means a data set, including metadata, can store at most about 4GB; beyond that MongoDB cannot store additional data. Running MongoDB on a 32-bit system is strongly discouraged (you can test this yourself); production environments use 64-bit operating systems everywhere.
The maximum document size (16MB per BSON document) helps ensure that a single document does not use too much RAM or consume too much bandwidth in transmission. To store documents larger than the maximum size, MongoDB provides the GridFS API. MongoDB supports BSON document nesting up to 100 levels.
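The two document limits quoted above (16MB per document, nesting depth of at most 100) can be pre-checked on the client side before an insert. A minimal sketch in plain Python; the 16MB figure is MongoDB's documented BSON limit, and the size check here uses the JSON encoding as a rough stand-in for the real BSON size:

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's BSON document size limit
MAX_NESTING = 100                 # MongoDB's BSON nesting-depth limit

def nesting_depth(value):
    """Depth of nested dicts/lists; a scalar counts as depth 0."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0

def fits_document_limits(doc):
    """Rough client-side pre-check against both documented limits.

    JSON size only approximates BSON size, so this is advisory,
    not a substitute for the server's own enforcement.
    """
    approx_bytes = len(json.dumps(doc).encode("utf-8"))
    return approx_bytes <= MAX_DOC_BYTES and nesting_depth(doc) <= MAX_NESTING

print(fits_document_limits({"a": {"b": [1, 2, 3]}}))  # True
```

For payloads over 16MB (such as the large file uploads in this incident), GridFS is the documented escape hatch.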
Replica sets
Since MongoDB 3.0, a replica set supports up to 50 members, i.e. 50 nodes. Each node supports up to 32TB of data, but it is recommended that each replica set instance hold no more than 4TB of data, since backup and recovery of large data volumes take a long time.
There it is, the key point: "it is recommended that each replica set instance hold no more than 4TB of data." I immediately had the developers check the data volume: 3915GB. At last we knew the problem. No wonder some data could be written again after deletion: the instance had effectively reached that ceiling, so nothing more could be written.
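The arithmetic behind that conclusion is simple. A small sketch (the 3915GB figure is from the incident; the 4TB threshold is the recommendation quoted from my notes, not a hard server-enforced limit):

```python
RECOMMENDED_LIMIT_GB = 4 * 1024  # 4TB expressed in GB
observed_gb = 3915               # data volume the developers reported

headroom_gb = RECOMMENDED_LIMIT_GB - observed_gb
print(headroom_gb)                         # 181
print(observed_gb / RECOMMENDED_LIMIT_GB)  # ~0.956: about 96% of the cap
```

Only about 181GB of headroom remained, which matches the on-site test: uploads succeeded only up to roughly the amount of data just deleted, then crashed.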
Check the official documentation (https://docs.mongodb.com/v3.4/reference/limits/) for the specific description:
Storage limits
Sharding uses a default chunk size of 64MB. If the shard key values are 512 bytes, at most 32768 chunks are supported and the maximum collection size is 1TB.
The size of a shard key value cannot exceed 512 bytes.
Without special configuration, with the default 64MB chunk size and 128-byte shard key values, the maximum is only 4TB of capacity.
No need to think hard: this deployment certainly hadn't been tuned for any of that, and shard key values as small as 64 bytes weren't usable either.
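The numbers above follow from the formula on that limits page for sharding an existing collection: maxSplits = 16MB / average shard-key value size, and maximum collection size = maxSplits × (chunk size / 2). A quick sketch of the arithmetic:

```python
MB = 1024 * 1024
TB = 1024 ** 4

def max_shardable_collection_bytes(key_size_bytes, chunk_size_bytes=64 * MB):
    """Max size of an existing collection that can still be sharded,
    per the MongoDB 3.4 limits page:
        maxSplits = 16MB / key size
        max collection size = maxSplits * (chunk size / 2)
    """
    max_splits = (16 * MB) // key_size_bytes
    return max_splits * (chunk_size_bytes // 2)

print(max_shardable_collection_bytes(512) // TB)  # 1 -> 1TB, as quoted above
print(max_shardable_collection_bytes(128) // TB)  # 4 -> 4TB
print(max_shardable_collection_bytes(64) // TB)   # 8 -> 8TB
```

So with the default chunk size, even a 128-byte shard key tops out at 4TB, right where this instance's 3915GB was heading.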
Therefore I inferred that the problem was caused by the single instance approaching its capacity limit.
Alibaba Cloud's MongoDB instance specifications show the same thing.
Fault handling:
Getting this far was all I could do on my first day.
I then had the developers communicate with the group's operations manager about solving the remaining problems. (I'm new here; the environment is completely unfamiliar to me.)
After discussion among the developers' lead, the operations manager, and the users, the plan was: first back up the data (2 nodes still had data, and there were snapshots, but the government-affairs cloud's snapshots had burned us before, so in my heart a physical backup felt safer; however, with so many small files a backup would take far too long, so that was set aside for the time being); then identify the data from before 2020 (after counting, only 200-odd GB) and delete it to relieve the immediate pressure.
Ultimately this problem requires the developers to handle the corresponding logic.
When choosing a database, you must also evaluate its characteristics. The data volume was fine in the past, but nobody expected it to explode over the last two years or so.
I also heard the developers will carry out a major overhaul later.
With that, the matter came to a temporary close...
Summary
That is the fault 【Brother Goldfish】 ran into on his first day at the new job (before I had even begun to get my bearings...). I hope it is helpful to the friends who read this article.
If this 【article】 helped you, I hope you can give 【Brother Goldfish】 a like; creating content is not easy. Compared with official statements, I prefer to explain every knowledge point in 【plain, easy-to-understand】 writing. If you are interested in 【operations and maintenance technology】, you are welcome to follow ️️️【Brother Goldfish】️️️; I will bring you great 【harvests and surprises】!