2022-02-11 Failure: the maximum amount of data supported by MongoDB
2022-07-06 21:06:00 【IT migrant worker Brother Goldfish】
In daily work we deal with all kinds of requirements and faults, and fault response in particular is critical. If a fault is not handled in time, the losses add up; as I said before: every second is money.
It suddenly occurred to Brother Goldfish to start a new column recording the faults encountered in daily work. I hope my sharing and summaries can be of some use to you.
Maybe it was fate: yesterday was my first day of onboarding at my new unit (outsourced to a large company). I was not yet familiar with the various environments, but I was pulled in to follow up on a fault.
Fortunately, although I have been away from the technical front line for 11 months, my technical skills have held up. Once again it shows that fundamentals, thinking, and experience really matter: they let me locate the problem and have the relevant people work out the follow-up solution.
Fault description:
At midday, I went with the delivery manager to the information center in building 5 of the XX compound. The people handling the issue were the developers and the company's support engineer. From everyone I learned the following:
The application uses MongoDB 3.4. It was originally a cluster of 3 machines plus 1 arbiter node. On Thursday the developers took one node out for a single-node test, so that node's data was deleted. A snapshot had been taken on Thursday night (using XX cloud), so a snapshot backup exists.
The situation that day:
1. One primary, one secondary, one arbiter node
2. The primary and secondary nodes cannot both be running at the same time
3. When the primary node is started first, its status is SECONDARY; when the secondary node is then restarted, it cannot start: the process shuts down as soon as the exception occurs.
Test results:
Starting the primary node standalone, it could neither read nor write; on write, the same exception occurred as on the secondary node above. After deleting some of the primary node's data, the primary node became readable and writable.
Below are some of the related logs. (Since I had only just joined, I could not get more information; at that point we could only check the relevant logs on site.)
The primary node stays in the SECONDARY state, so it can only be read, not written, and the whole cluster has problems.
As a result, the business cannot perform upload operations (no writes), although all kinds of queries still work.
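The "primary stuck in SECONDARY" symptom can be confirmed from the replica set status document. A minimal sketch in plain Python, operating on a dict shaped like the output of MongoDB's `replSetGetStatus` command; the sample member names here are hypothetical, not the actual cluster's:

```python
def writable_members(status):
    """Return the names of members in PRIMARY state.

    `status` is a dict shaped like `replSetGetStatus` output:
    {"members": [{"name": ..., "stateStr": ...}, ...]}.
    A replica set accepts writes only while one member is PRIMARY;
    if no member reports PRIMARY, the set is read-only, which
    matches the symptom described above.
    """
    return [m["name"] for m in status.get("members", [])
            if m.get("stateStr") == "PRIMARY"]

# Hypothetical status mirroring the fault: no member ever becomes PRIMARY.
sample_status = {
    "members": [
        {"name": "10.0.0.1:27017", "stateStr": "SECONDARY"},
        {"name": "10.0.0.3:27017", "stateStr": "ARBITER"},
    ]
}
print(writable_members(sample_status))  # [] -> the set cannot accept writes
```

An empty result here is exactly the read-only cluster state seen on site: queries work, uploads fail.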
Complaints:
o((⊙﹏⊙))o Arriving on site, I found the databases are all running on Windows Server. How painful it is to read logs there...
The on-site staff were not familiar with their own application architecture. I asked whether the application's concurrency was high. Nobody knew. (If concurrency is not high, what is the cluster even for?) ... So on site you have to dig out all sorts of basic facts yourself, which costs time...
The two points above hindered the efficiency of troubleshooting.
And then: why were they still wrestling with the cluster's various problems on site, instead of making "restore the business first" the guiding principle?
It also shows there was no corresponding emergency plan. Plenty to think about improving in future work.
Testing and attempts:
Watching them struggle with the cluster, I, the newcomer, had intended just to look around. But then I thought: since I'm here with them, if they can't handle it, it will definitely mean overtime... and home is far away; this XX academy is in Science City, and I live on Guangyuan Road in Baiyun District... So I couldn't help speaking up:
"First orient toward restoring the business; don't worry about the cluster's problems yet. Restart the service with a single node and resume business. According to your feedback, concurrency is not high, so a single node is fine. Recover first."
I suggested the developers find a single node that still had data and start the database on a different port.
After it started, the same phenomenon as Thursday night appeared; after deleting some of the node's data, the node became readable and writable.
I had the developers test the phenomenon: after deleting some files, uploads did work, but when uploading large files the upload could not continue and the database would crash. After a restart the database could only be read; writing again would crash it.
The logs contained no useful error information at all. (Reading logs on Windows is painful... why not Linux? Does "low" look like this?)
Seeing this phenomenon, standing by, I suggested testing it once more.
This time I recorded the approximate capacities: I noted the size of the files he deleted, then compared it against the volume uploaded.
During the test, uploads succeeded only up to roughly the deleted capacity; beyond that, the database simply crashed. And the disk had plenty of free space.
Seeing this, a sudden inspiration: is there some limit?
At that point I looked up my notes and searched online articles immediately.
Fault analysis:
Checking my notes, there is a passage:
Limits
MongoDB should generally run on a 64-bit operating system. A 32-bit system can only address 4GB of memory, which means a data set, including metadata, can store at most about 4GB; beyond that MongoDB cannot store additional data. Running MongoDB on a 32-bit system is strongly discouraged (you can test this yourself); production environments use 64-bit operating systems everywhere.
The maximum document size (16MB per BSON document) helps ensure that a single document does not use too much RAM or consume too much bandwidth in transmission. To store documents larger than the maximum size, MongoDB provides the GridFS API. MongoDB supports BSON document nesting up to 100 levels.
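The two document limits quoted above (16MB per document, nesting depth of at most 100) can be pre-checked on the client side before an insert. A minimal sketch in plain Python; the 16MB figure is MongoDB's documented BSON limit, and the size check here uses the JSON encoding as a rough stand-in for the real BSON size:

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's BSON document size limit
MAX_NESTING = 100                 # MongoDB's BSON nesting-depth limit

def nesting_depth(value):
    """Depth of nested dicts/lists; a scalar counts as depth 0."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0

def fits_document_limits(doc):
    """Rough client-side pre-check against both documented limits.

    JSON size only approximates BSON size, so this is advisory,
    not a substitute for the server's own enforcement.
    """
    approx_bytes = len(json.dumps(doc).encode("utf-8"))
    return approx_bytes <= MAX_DOC_BYTES and nesting_depth(doc) <= MAX_NESTING

print(fits_document_limits({"a": {"b": [1, 2, 3]}}))  # True
```

For payloads over 16MB (such as the large file uploads in this incident), GridFS is the documented escape hatch.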
Replica sets
Since MongoDB 3.0, a replica set supports up to 50 members, i.e. 50 nodes. Each node supports up to 32TB of data, but it is recommended that each replica set instance hold no more than 4TB of data, since backup and recovery of large data volumes take a long time.
There it is, the key point: "it is recommended that each replica set instance hold no more than 4TB of data." I immediately had the developers check the data volume: 3915GB. At last we knew the problem. No wonder some data could be written again after deletion: the instance had effectively reached that ceiling, so nothing more could be written.
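The arithmetic behind that conclusion is simple. A small sketch (the 3915GB figure is from the incident; the 4TB threshold is the recommendation quoted from my notes, not a hard server-enforced limit):

```python
RECOMMENDED_LIMIT_GB = 4 * 1024  # 4TB expressed in GB
observed_gb = 3915               # data volume the developers reported

headroom_gb = RECOMMENDED_LIMIT_GB - observed_gb
print(headroom_gb)                         # 181
print(observed_gb / RECOMMENDED_LIMIT_GB)  # ~0.956: about 96% of the cap
```

Only about 181GB of headroom remained, which matches the on-site test: uploads succeeded only up to roughly the amount of data just deleted, then crashed.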
Check the official documentation (https://docs.mongodb.com/v3.4/reference/limits/) for the specific description:
Storage limits
Sharding uses a default chunk size of 64MB. If the shard key values are 512 bytes, at most 32768 chunks are supported and the maximum collection size is 1TB.
The size of a shard key value cannot exceed 512 bytes.
Without special configuration, with the default 64MB chunk size and 128-byte shard key values, the maximum is only 4TB of capacity.
No need to think hard: this deployment certainly hadn't been tuned for any of that, and shard key values as small as 64 bytes weren't usable either.
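The numbers above follow from the formula on that limits page for sharding an existing collection: maxSplits = 16MB / average shard-key value size, and maximum collection size = maxSplits × (chunk size / 2). A quick sketch of the arithmetic:

```python
MB = 1024 * 1024
TB = 1024 ** 4

def max_shardable_collection_bytes(key_size_bytes, chunk_size_bytes=64 * MB):
    """Max size of an existing collection that can still be sharded,
    per the MongoDB 3.4 limits page:
        maxSplits = 16MB / key size
        max collection size = maxSplits * (chunk size / 2)
    """
    max_splits = (16 * MB) // key_size_bytes
    return max_splits * (chunk_size_bytes // 2)

print(max_shardable_collection_bytes(512) // TB)  # 1 -> 1TB, as quoted above
print(max_shardable_collection_bytes(128) // TB)  # 4 -> 4TB
print(max_shardable_collection_bytes(64) // TB)   # 8 -> 8TB
```

So with the default chunk size, even a 128-byte shard key tops out at 4TB, right where this instance's 3915GB was heading.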
Therefore I inferred that the problem was caused by the single instance approaching its capacity limit.
Alibaba Cloud's MongoDB instance specifications show the same thing.
Fault handling:
Getting this far was all I could do on my first day.
I then had the developers communicate with the group's operations manager about solving the remaining problems. (I'm new here; the environment is completely unfamiliar to me.)
After discussion among the developers' lead, the operations manager, and the users, the plan was: first back up the data (2 nodes still had data, and there were snapshots, but the government-affairs cloud's snapshots had burned us before, so in my heart a physical backup felt safer; however, with so many small files a backup would take far too long, so that was set aside for the time being); then identify the data from before 2020 (after counting, only 200-odd GB) and delete it to relieve the immediate pressure.
Ultimately this problem requires the developers to handle the corresponding logic.
When choosing a database, you must also evaluate its characteristics. The data volume was fine in the past, but nobody expected it to explode over the last two years or so.
I also heard the developers will carry out a major overhaul later.
With that, the matter came to a temporary close...
Summary
That is the fault 【Brother Goldfish】 ran into on his first day at the new job (before I had even begun to get my bearings...). I hope it is helpful to the friends who read this article.
If this 【article】 helped you, I hope you can give 【Brother Goldfish】 a like; creating content is not easy. Compared with official statements, I prefer to explain every knowledge point in 【plain, easy-to-understand】 writing. If you are interested in 【operations and maintenance technology】, you are welcome to follow ️️️【Brother Goldfish】️️️; I will bring you great 【harvests and surprises】!