I want to say more about this communication failure
2022-07-06 17:58:00 【Fresh jujube class】
Over the past few days, everyone has been following the large-scale communication failure at the Japanese telecom operator KDDI.
The failure had a huge impact: it covered all of Japan and affected 39.15 million users in total. It also lasted a long time, and service was only basically restored after almost two days.
Many official accounts have already written about the specific cause of the failure, so I will not repeat that analysis.
In today's article I want to widen the topic and talk it through in depth with you: it is already 2022, so why do our communication networks still fail so often, and is there an ultimate solution?
█ Communication failure: a contest a hundred years long
Faults are a natural attribute of communication networks. Just as people get sick, communication networks have been accompanied by failures ever since they were born. Put another way, we built the communication network in the course of troubleshooting it.
Only after solving countless faults did old Mr. Bell invent the telephone
For more than a hundred years, countless telecom engineers have fought an unrelenting contest against faults. They have worked hard to develop all kinds of technologies and used every means available to fight communication failures.
At the macro level, the results of this fight are remarkable. As experience accumulates and technology advances, the probability of a communication network failure keeps falling.
Young readers may not know that more than 20 years ago, a landline that could not get through (at a time when few families even had a telephone) was as common as a water or power cut. More than 10 years ago, a mobile phone that could not place calls or get online was also a common sight.
Over the past ten years these problems have become increasingly rare. When one does occur, people find it strange. When the network goes down, many people's first reaction is that their phone is broken or their account is in arrears, so they hurry to restart the phone or top up. Isn't that so?
We now live in an information society, and the communication network, like water and electricity, is critical infrastructure. Our work and daily life, and the operation of every industry, depend on it.
Against this background, operators, as state-owned enterprises and as the builders and maintainers of the network, always put network security and stability first.
The Ministry of Industry and Information Technology sets strict assessment indicators for operators on network stability. If a network failure occurs in a province or city, the top leaders must take responsibility, and their careers are on the line.
The pressure on operator leadership is passed on to employees, and then on to equipment vendors and outsourcing contractors.
Market competition is now so fierce that once something goes wrong, a vendor or contractor faces either huge compensation or the loss of its market share in that province. For them this is an unbearable loss.
So the whole communication industry certainly pays enough attention to network security and stability. The real issue is still capability and execution.
█ Where exactly are the weak points of a communication network?
First, I want to talk about how the security level of a communication network is defined.
Depending on the scenario, the security of communication networks is divided into levels. From low to high they are home grade, enterprise grade, and carrier grade.
Security levels of communication systems
The routers we use at home are home grade. Equipment like this has low safety and reliability: when it breaks, it simply breaks, and network interruptions come easily.
Enterprise grade refers to the network equipment used in workplaces. Scaled to the size of the network and the number of users, enterprise-grade equipment is fairly safe and reliable, and service interruptions are uncommon.
Carrier-grade requirements are higher still. The networks of China Mobile, China Telecom, and China Unicom serve hundreds of millions of users and absolutely cannot be allowed to go down easily. Generally speaking, carrier-grade reliability has to reach five nines (99.999%) or better.
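To put "five nines" in concrete terms, here is a minimal sketch that converts an availability target into the downtime it allows per year. The function name allowed_downtime_minutes and the 365-day year are assumptions made for illustration, not anything an operator specifies.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, availability in [("three nines", 0.999), ("five nines", 0.99999)]:
        minutes = allowed_downtime_minutes(availability)
        print(f"{label}: {minutes:.2f} minutes per year")
```

Under these assumptions, five nines leaves a budget of a little over five minutes of downtime per year, which shows how far outside the target an outage of nearly two days falls.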
The communication network Xiaozao Jun is discussing today is the operators' public network, which serves the general public. It includes both the cellular mobile network and the fixed broadband network, and both are carrier grade.
The architectures of the cellular mobile network and the fixed broadband network are similar. The main difference lies in the access network.
The cellular mobile network uses wireless access, and its access devices are base stations. The fixed broadband network uses wired access, and its access devices are PON equipment (passive optical network equipment, including the optical modem).
Let us take the cellular mobile network as the example for analysis.
A public communication network serves hundreds of millions of users, so it usually adopts a pyramid-shaped hierarchy: the core network at the center, the transmission network (bearer network) as the trunk, and the access network as the limbs.
You can see at a glance that the biggest weak points of this architecture are the core network and the transmission network (especially the backbone).
The core network is the management center, the heart and brain of the network. Once it goes down, the whole network goes down. That is why core network engineer (my own job back then) is the post with the greatest risk and pressure.
Core network equipment room
The transmission network (bearer network) is the blood vessels and nerves of the communication network. A break at the extremities is manageable and affects only a small area at most. But what if the vessels of the heart and brain fail? That also means complete paralysis.
Optical transmission equipment
The KDDI failure this time, the DoCoMo failure in October 2021, the outage across the four major UK operators in 2020, and the CenturyLink failure in the United States in 2020 were all related to core routers. Put bluntly, when the vessels of the heart and brain go wrong, the whole person (the network) collapses.
By comparison, the probability of a major problem in the access network is very low. When an individual base station drops out of service, it affects at most a few hundred to a few thousand people in a tiny area, and complaints stay manageable.
Base station equipment
If a large-scale failure does occur in the access network, the most likely cause is a problem with the equipment vendor's software version or with a hardware batch. The probability of this is extremely low.
█ What have telecom engineers done to prevent failures?
So what methods have telecom engineers adopted to keep the communication network running safely and smoothly and to prevent failures?
The first is sound top-level architecture design.
The network architecture is the foundation of network security. A good architecture has to consider performance and capacity, and cost, and safety and redundancy as well.
Please remember one thing here: communication equipment is a complex product, and no matter how it is designed or how much hardware is piled on, failure is always possible. It is only a question of probability and time.
Rather than trying to guard strictly against every possible fault, it is better to focus on what to do once a fault has occurred.
Introducing a backup mechanism is therefore the most effective way to deal with faults.
Backup mechanism
Everyone has studied probability and statistics. If one device fails with probability 1%, then the probability that two devices fail at the same time is 1% × 1% = 0.01% (assuming the two fail independently). Right?
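The arithmetic above generalizes to any number of devices. The sketch below is a small illustration under one loud assumption: the devices fail independently of each other. The function name simultaneous_failure_probability is invented for this example.

```python
def simultaneous_failure_probability(p_single: float, devices: int) -> float:
    """Probability that every device is down at once.

    Assumes the failures are independent; shared causes such as a common
    power feed or the same software defect would make the real figure higher.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if devices < 1:
        raise ValueError("devices must be at least 1")
    return p_single ** devices


if __name__ == "__main__":
    for count in (1, 2, 3):
        probability = simultaneous_failure_probability(0.01, count)
        print(f"{count} device(s): {probability:.6%}")
```

With a 1% failure probability per device, each extra independent device cuts the chance of a total outage by a factor of one hundred, which is why backup is so attractive on paper.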
To make the network as safe as possible, the architecture uses the POOL networking mode, as shown in the figure:
Several devices work together as a pool (POOL), each carrying its own share of the traffic. If one breaks, the others take over immediately so that service is not affected.
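The pool idea can be sketched in a few lines. This is a toy model, not how any vendor's equipment works: the class names Pool and PoolMember, the least-loaded selection rule, and the session lists are all assumptions chosen to show load sharing and takeover.

```python
from dataclasses import dataclass, field


@dataclass
class PoolMember:
    name: str
    healthy: bool = True
    sessions: list[str] = field(default_factory=list)


class Pool:
    """A toy load-sharing pool: every member carries traffic, none sits idle."""

    def __init__(self, names: list[str]) -> None:
        self.members = [PoolMember(name) for name in names]

    def _least_loaded(self) -> PoolMember:
        healthy = [m for m in self.members if m.healthy]
        if not healthy:
            raise RuntimeError("no healthy member left in the pool")
        return min(healthy, key=lambda m: len(m.sessions))

    def attach(self, user: str) -> str:
        """Place a user on the least-loaded healthy member and return its name."""
        member = self._least_loaded()
        member.sessions.append(user)
        return member.name

    def fail(self, name: str) -> None:
        """Mark one member as failed and move its sessions to the survivors."""
        for member in self.members:
            if member.name == name:
                member.healthy = False
                orphans, member.sessions = member.sessions, []
                for user in orphans:
                    self.attach(user)
                return
        raise KeyError(f"unknown pool member: {name}")


if __name__ == "__main__":
    pool = Pool(["A", "B", "C"])
    for i in range(9):
        pool.attach(f"user{i}")
    pool.fail("B")
    for member in pool.members:
        print(member.name, member.healthy, len(member.sessions))
```

The point of the sketch is that the surviving members must have spare capacity to absorb the failed member's load, so a pool only protects the service if it is dimensioned with that headroom.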
Core equipment usually comes in two or more sets, placed in different districts of the provincial capital and physically far apart.
In addition, when the network architecture is designed, important network elements are usually placed in core equipment rooms with a higher security level.
Core machine room
For example, the HSS (the former HLR) is the most important element in the mobile network. It stores and manages subscriber data such as each user's phone number, authentication data, and service information, and it is housed in the core equipment room in the provincial capital. Maintenance staff also make physically isolated off-site backups of its data on a regular basis.
In recent years, because of geological disasters and factors such as war or terrorist attack, operators have even begun to keep backups in a different province.
For example, during last year's Zhengzhou flood the core equipment room was flooded and the HLR went out of service. A backup HLR at another site was urgently brought into use to restore service temporarily.
Different disaster recovery levels
Second, the low-level active/standby mechanism.
What we have just discussed is redundancy in the top-level design. Down at the level of equipment rooms, subracks, boards, and cables there are active and standby designs too, which can be called the low-level active/standby mechanism.
If you have ever been in an equipment room, you will have noticed that the subracks in the cabinets are filled with all kinds of boards, and that these boards almost always come in pairs.
Front view of one vendor's 3G equipment
In other words, there are usually two boards of any given type.
The same goes for network cables and optical fibers. You will hardly ever see a single cable; they all come in pairs.
Front view of one vendor's 4G equipment
The reason is mutual backup. If one board breaks, the other can carry on working so that service is not affected. At the same time the system raises an alarm to remind staff to replace the faulty board as soon as possible.
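The switchover-plus-alarm behavior described above can be modeled as follows. This is only a sketch: the BoardPair class, the slot names, and the alarm text are hypothetical and do not correspond to any real equipment interface.

```python
from typing import Optional


class BoardPair:
    """Two boards of the same type: one active, one standby (hypothetical model)."""

    def __init__(self, active: str, standby: str) -> None:
        self.active: Optional[str] = active
        self.standby: Optional[str] = standby
        self.alarms: list[str] = []

    def report_fault(self, board: str) -> None:
        """Handle a board fault: switch over if needed and always raise an alarm."""
        if board == self.active:
            # Promote the standby board; service continues but is now unprotected.
            self.active, self.standby = self.standby, None
        elif board == self.standby:
            self.standby = None
        else:
            raise ValueError(f"unknown board: {board}")
        self.alarms.append(f"board {board} failed, replace it as soon as possible")

    @property
    def in_service(self) -> bool:
        return self.active is not None


if __name__ == "__main__":
    pair = BoardPair("slot1", "slot2")
    pair.report_fault("slot1")
    print(pair.active, pair.in_service, pair.alarms)
```

The alarm matters as much as the switchover: after the first fault the pair keeps working but has no protection left, so a second fault before the replacement arrives takes the service down.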
The same applies to power. Every cabinet in a telecom equipment room must have at least two power inputs.
Multiple power inputs (one red and one blue cable make up one feed)
In addition to mains power, important equipment rooms are also fitted with emergency power equipment such as batteries, UPS units, and generators.
Battery pack in the machine room
Third, sound management systems and regulations.
Technology is never the only factor that affects network security and stability. The biggest threat to a communication network is actually people, not technology.
Xiaozao Jun believes every telecom engineer feels the same way about this.
In management processes and systems, and in engineering specifications, we have learned countless painful lessons.
Why must an upgrade plan be reviewed again and again? Why must engineering specifications be so strict? Why build a spare parts warehouse? Why must every cutover step be double-checked, or even triple-checked? Why must staff stay on duty after a major operation? Why is the network frozen to changes on important holidays?
These are all lessons drawn from the experience of our predecessors.
Always treat network failures with awe
In addition to internal management systems and process standards, the state has also established increasingly strict laws and regulations to punish the deliberate sabotage of communication networks that now happens so often.
Severing optical fibers through illegal construction, deliberately damaging base stations, and cutting fiber on purpose are all punishable by law.
A base station feeder cable that was maliciously cut
█ The deeper reasons behind communication failures
With a sound network architecture, a complete active/standby mechanism, and well-developed systems and specifications, why do so many faults still occur?
Next, let me talk about some of the deeper reasons.
The first, and probably the one most people agree on, is the internal environment of the communication industry.
Over the years, cutthroat competition and lowest-price bidding have been rampant. Equipment vendors and subcontractors have to win orders and still keep a profit, so they can only push costs down as hard as they can: product design costs, material costs, construction material costs and, more importantly, staff salaries.
Continual cost cutting is bound to affect product reliability and engineering quality. Low pay drives away large numbers of experienced people. To get the work done, subcontractors can only recruit fresh graduates and send them to site after brief training, or none at all.
These people lack the necessary training and practice, and their overall competence and technical ability fall short, which makes them a major risk point.
Some of them are of very poor caliber. Push them too hard, and deleting the database and running off is not out of the question.
Some years ago, to make sure front-line staff did not have their pay skimmed, some vendors even wrote a minimum pay level for outsourced staff into their contracts with subcontractors.
Besides low-price competition, another important factor affecting the security of network operation is increasing technical complexity.
The more advanced the technology, the more complex it is and the lower its reliability. As technology evolves, operators' networks grow larger and their topologies more complex, and the probability of problems rises sharply.
The tidal effect in communication networks is very pronounced. Traffic at busy hours can be ten or even a hundred times that at idle hours. If an emergency such as a disaster occurs and traffic surges, the gap may well reach a thousand times.
Operators cannot possibly design in a thousandfold redundancy. So without sensible bypass design or threshold design, the probability of network congestion is extremely high. (Signaling congestion was a factor in several of the major failures of recent years.)
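One way to read "threshold design" is as admission control: a node accepts requests up to a fixed limit per time window and sheds the rest, so that a surge degrades service instead of collapsing the node. The sketch below illustrates that reading only. The OverloadGuard class, the window model, and the numbers are assumptions for illustration, not a description of how any operator actually configures its network.

```python
class OverloadGuard:
    """Admit at most `threshold` requests per time window and shed the rest."""

    def __init__(self, threshold: int) -> None:
        if threshold < 0:
            raise ValueError("threshold must not be negative")
        self.threshold = threshold
        self.admitted = 0
        self.rejected = 0

    def start_window(self) -> None:
        """Reset the per-window counter; call this at each window boundary."""
        self.admitted = 0

    def offer(self) -> bool:
        """Return True if the request is admitted, False if it is shed."""
        if self.admitted < self.threshold:
            self.admitted += 1
            return True
        self.rejected += 1
        return False


if __name__ == "__main__":
    guard = OverloadGuard(threshold=1_000)  # hypothetical capacity per window
    guard.start_window()
    accepted = sum(guard.offer() for _ in range(100_000))  # simulated surge
    print(f"accepted={accepted}, shed={guard.rejected}")
```

Shedding load this way means some users are refused during a surge, but the node keeps serving the requests it has admitted, which is usually preferable to a congestion collapse that takes everyone offline.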
Few people fully understand the operators' complex networks as they stand today. As time passes and staff move on, the networks become even less familiar to those who run them.
Communication networks are something of a dark art to begin with, full of strange problems. Who would dare claim to have accounted for every possibility?
The third potential risk to network security, and the one Xiaozao Jun worries about most, is external cyber attack: hackers, viruses, and system vulnerabilities, for example.
Communication equipment has now largely moved to IP and to the cloud. Networks are more and more open, and some are deployed directly on the public cloud. Physical isolation from the outside world keeps weakening, so networks are more vulnerable than before.
Today's attackers are also far more skilled than before and use more varied methods, so they pose a serious threat to the network.
Of course, operators and equipment vendors invest heavily in defending against network attacks.
Nowadays every vendor talks about the concept of "security hardening". As the name suggests, security hardening means closing system vulnerabilities to make the system more robust. Operators use third-party tools, or hire third-party vendors, to run security scans on live network equipment, look for vulnerabilities, and then require the equipment vendor to fix them.
All for safety
This contest, in which every advance in defense is met by a greater advance in attack, will go on for a long time.
However, Xiaozao Jun thinks the defending side currently has serious problems in both staff security awareness and technical ability. We will run into more and more security incidents from here on.
I hope the relevant organizations and departments will not just pay lip service to security, but will actually put effort into raising the competence of their staff and strengthening training. Otherwise, when something really happens, it will be too late to fix.
█ Final words
The KDDI failure in Japan is not the first of its kind and certainly will not be the last. Communication network failures are like a game of pass the parcel: no one knows whether they will be next.
Vendors have now proposed bringing in AI and letting it take over the network in order to reduce the failure rate. Some vendors, building on cloud-based networks, carry out grayscale upgrades (that is, partial upgrades), which can also significantly reduce network risk. These are all good trends.
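A grayscale upgrade can be sketched as a staged rollout: upgrade a small batch, check its health, and only then widen the scope. The function below is a minimal illustration under assumptions of my own. The name grayscale_upgrade, the stage fractions, and the is_healthy callback are hypothetical, and the actual upgrade and rollback actions are represented by simple list operations.

```python
from typing import Callable, Sequence


def grayscale_upgrade(
    nodes: Sequence[str],
    stage_fractions: Sequence[float],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Upgrade nodes stage by stage; roll back if a stage looks unhealthy.

    Returns (nodes left on the new version, whether the rollout completed).
    """
    upgraded: list[str] = []
    for fraction in stage_fractions:
        target = max(1, int(len(nodes) * fraction))
        batch = list(nodes[len(upgraded):target])
        upgraded.extend(batch)  # stand-in for the real upgrade action
        if not all(is_healthy(node) for node in batch):
            return [], False  # stand-in for rolling back every upgraded node
    return upgraded, len(upgraded) == len(nodes)


if __name__ == "__main__":
    fleet = [f"node{i}" for i in range(10)]
    # Hypothetical health check: pretend node3 misbehaves on the new version.
    result = grayscale_upgrade(fleet, (0.1, 0.5, 1.0), lambda node: node != "node3")
    print(result)
```

In this example the fault shows up in the second stage, after only half of the nodes have been touched, so the other half never runs the bad version. That containment is the risk reduction the partial upgrade is meant to buy.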
I think we still have a long way to go in the fight against communication network failures. The road ahead is long, and telecom engineers will keep searching high and low.
Okay, that is all for today's article. Thank you for reading so patiently, and see you next time!