I want to say more about this communication failure
2022-07-06 17:58:00 【Fresh jujube class】
Over the past few days, everyone has been following the large-scale communication outage at the Japanese telecom operator KDDI.
The outage had a huge impact: it covered the whole of Japan and affected 39.15 million users. It also lasted a long time, taking almost two days before service was basically restored.
Many public accounts have already written about the specific cause of the failure, so I will not repeat that analysis.
In today's article I want to widen the topic and talk it through with you in depth: it is already 2022, so why do our communication networks still fail so often, and is there an ultimate solution?

█ Communication failures: a contest that has lasted a hundred years
Failure is a natural attribute of communication networks. Just as people get sick, communication networks have been accompanied by failures ever since they were born. Put another way, we built the communication network in the very process of troubleshooting it.

Only after working through countless faults did Bell invent the telephone
For more than 100 years, countless telecom engineers have fought an unrelenting contest against faults. They have worked hard to develop all kinds of technologies and used all kinds of means to fight communication failures.
At the macro level, the results are remarkable. As experience accumulates and technology advances, the probability of communication network failure keeps falling.
Younger readers may not know that more than 20 years ago, a landline that could not get through (and not many families had a telephone then) was as common as a water or power cut. More than 10 years ago, a mobile phone that could not make calls or get online was also common.

Over the past ten years these phenomena have become increasingly rare. When they do happen once in a while, people find it strange. When the network goes down, many people's first reaction is that their phone is broken or their account is in arrears, so they hurry to restart or top up. Isn't that so?
We now live in an information society. The communication network, like water and electricity, is critical infrastructure. Our work and life, and the operation of every industry, cannot do without it.
Given this, operators, as state-owned enterprises and as the builders and maintainers of the network, always put network security and stability first.

For network stability, the Ministry of Industry and Information Technology has set strict assessment indicators for operators. If a province or city suffers a network failure, the top leaders must take responsibility, and their careers are at risk.
The pressure on operator leadership is passed on to employees, and also to equipment vendors and outsourcing contractors.
Market competition is now so fierce that once something goes wrong, the result is either huge compensation or the loss of market share in that province, which is an unbearable loss for equipment vendors and contractors.
So the whole communications industry certainly pays enough attention to network security and stability. The key is still a matter of capability and execution.
█ Where exactly are the weak points of a communication network?
First, I want to talk about how the security level of a communication network is defined.
Depending on the scenario, communication network security is divided into levels. From low to high, they are home grade, enterprise grade and carrier grade.

Security levels of communication systems
The routers we use at home are home grade. The safety and reliability of such equipment is very low; when it breaks, it simply breaks, and it easily causes a network interruption.
Enterprise grade refers to the network equipment used inside organizations. Scaled to the network size and the number of users, enterprise-grade equipment is more secure and reliable, and does not interrupt service easily.
Carrier-grade requirements are higher still. Operators such as China Mobile, China Telecom and China Unicom serve hundreds of millions of users, and their networks absolutely must not go down easily. Generally speaking, carrier-grade reliability has to reach five nines (99.999%) or better.
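To make "five nines" concrete, the following minimal sketch converts an availability target into a yearly downtime budget. The function name and the choice of a 365-day year are illustrative assumptions, not anything defined by a standard or by the operators mentioned above.

```python
# Sketch: how much downtime per year an "N nines" availability target allows.
# Assumption: a 365-day year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for N nines of availability."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Under these assumptions, five nines leaves a budget of only a little over five minutes of downtime per year, which is why an outage lasting almost two days is so far outside the target.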

The communication network Xiaozao Jun is talking about today is the operators' public communication network, which includes both the cellular mobile network and the fixed broadband network. Both are carrier grade.
The architectures of the cellular mobile network and the fixed broadband network are similar; the main difference lies in the access network.

The cellular mobile network uses a wireless access network, where the access device is a base station. The fixed broadband network uses a wired access network, where the access device is PON (passive optical network) equipment, including the optical modem.
Let's take the cellular mobile network as the example for analysis.
A public communication network serves hundreds of millions of users, so it usually adopts a pyramid-style hierarchical architecture, with the core network as the center, the transmission network (bearer network) as the trunk, and the access network as the limbs.

You can see at a glance that the biggest weak points of this architecture lie in the core network and the transmission network (especially the backbone).
The core network is the management center, the heart and brain of the network. Once it goes down, the whole network goes down. That is why core network engineers (as I once was) hold the post with the greatest risk and pressure.

Core network equipment room
The transmission network (bearer network) is the blood vessels and nerves of the communication network. At the extremities it is not a big deal, since a break affects only a small area. But what if the cardiovascular and cerebrovascular system fails? That also means total paralysis.

Optical transmission equipment
The KDDI outage this time, the DoCoMo outage in October 2021, the 2020 outage across the UK's four major operators, and the 2020 CenturyLink outage in the US were all related to core routers. To put it bluntly, when something goes wrong with the heart and brain vessels, the whole body (the network) collapses.
By comparison, the probability of a major problem in the access network is very low. When an individual base station goes out of service, it affects at most a few hundred to a thousand or so people in a tiny area, and the complaints are manageable.

Base station equipment
If a large-scale failure does occur in the access network, it is most likely caused by a vendor software version or a hardware batch problem. The probability of this is extremely low.
█ What have telecom engineers done to prevent failures?
So, to keep the communication network running safely and smoothly and to prevent failures, what methods have telecom engineers adopted?
First, a sound top-level architecture design.
The network architecture is the foundation of network security. A good architecture has to consider performance and capacity, cost, and also security and redundancy.
Please keep one thing firmly in mind here: communication equipment is a complex product, and no matter how it is designed or how much is stacked onto it, it can fail. It is only a matter of probability and time.
Rather than guarding rigidly against every possible fault, it is better to focus on what to do after a failure happens.
Therefore, introducing a backup mechanism is the most effective way to deal with faults.

Backup mechanism
Everyone has studied probability and statistics. If the failure probability of one device is 1%, then the probability that two devices fail at the same time (assuming the two failures are independent) is 1% × 1% = 0.01%. Right?
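The arithmetic above can be checked with a few lines of code. This is a minimal sketch that assumes device failures are statistically independent, which real equipment sharing power, software or a site may not satisfy; the function name is illustrative.

```python
# Sketch: probability that every device in a redundant group fails at once.
# Assumption: failures are independent, so the probabilities simply multiply.


def all_fail_probability(p_single: float, n_devices: int) -> float:
    """Return the probability that all n_devices fail simultaneously."""
    return p_single ** n_devices


if __name__ == "__main__":
    for n in (1, 2, 3):
        p = all_fail_probability(0.01, n)
        print(f"{n} device(s): {p * 100:.4f}% chance that all are down")
```

Each additional independent backup multiplies the simultaneous-failure probability by another factor of 1%, which is the reasoning behind pairing and pooling equipment.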
To leave nothing to chance, network architecture design uses the POOL (pool) networking mode.

Several devices work together to form a pool (POOL). Each handles its own share of the traffic; if one fails, the others take over immediately so that service is not affected.
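The pool behavior described above can be sketched in a few lines. This is a simplified illustration, not how any vendor's core network pool is actually implemented: the class name, the node names and the modulo-based selection rule are all assumptions made for the example.

```python
# Sketch: a pool of equivalent nodes that share load and cover for each other.
# All names and the selection rule are hypothetical, for illustration only.


class DevicePool:
    def __init__(self, nodes):
        self.nodes = list(nodes)        # fixed membership of the pool
        self.healthy = set(self.nodes)  # nodes currently able to serve

    def mark_down(self, node):
        """Record that a node has failed."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Record that a node has recovered."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self, user_id: int) -> str:
        """Choose a healthy node for this user; raise if the whole pool is down."""
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy node left in the pool")
        return candidates[user_id % len(candidates)]


if __name__ == "__main__":
    pool = DevicePool(["node-a", "node-b", "node-c"])
    print(pool.pick(4))      # served by one of the three nodes
    pool.mark_down("node-b")
    print(pool.pick(4))      # still served, by a surviving node
```

Service is only lost when every node in the pool is down at the same time, which is why pool members are usually kept physically apart.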
Core equipment usually comes in two or more sets, placed in different districts of the provincial capital and physically far apart.
In addition, when designing the network architecture, important network elements are usually placed in core equipment rooms with a higher security level.

Core equipment room
For example, the HSS, the most important element in the mobile network, which stores and manages user data (it is the former HLR and holds each user's mobile number, authentication data, service information and so on), is housed in the core equipment room in the provincial capital. In addition, maintenance staff regularly make physically remote, isolated backups of the data.
In recent years, because of geological disasters plus factors such as war or terrorist attack, operators have even begun to do cross-province backup.
For example, during last year's Zhengzhou flood, the core equipment room was flooded and the HLR went out of service. A backup HLR was urgently brought into use to restore service temporarily.

Different disaster recovery levels
Second, the underlying active/standby mechanism.
What we just discussed is redundancy in the top-level design. Down at the level of equipment rooms, subracks, boards and cables there are also active/standby designs, which can be called the underlying active/standby mechanism.
If you have been in an equipment room, you will notice that the subracks in the cabinets are filled with all kinds of boards, and that these boards basically come in pairs.

Front view of one vendor's 3G equipment
In other words, there are usually two boards of any given type.
The same goes for network cables and optical fibers. You will hardly ever see a single cable; they all come in pairs.

Front view of one vendor's 4G equipment
The reason is mutual backup. If one board fails, the other keeps working so that service is not affected. At the same time, the system raises an alarm to remind staff to replace the failed board as soon as possible.
Power is the same: every cabinet in a telecom equipment room must have at least two power inputs.

Multiple power inputs (one red cable plus one blue cable make up one feed)
Besides mains power, important equipment rooms are also equipped with batteries, UPS units, generators and other emergency power equipment.

Battery bank in the equipment room
Third, sound management systems and regulations.
Technology is never the only factor affecting network security and stability. The biggest threat to a communication network is actually people, not technology.
Xiaozao Jun believes every telecom engineer feels the same on this point.
In management processes and systems, and in engineering specifications, we have learned countless painful lessons.
Why must upgrade plans be reviewed again and again? Why are engineering specifications so strict? Why build a spare parts warehouse? Why must cutover steps be double-checked, or even triple-checked? Why must staff be on duty after major operations? Why is the network frozen to changes on important holidays? ...
All of these are lessons handed down by our predecessors.

Always stay in awe of network failures
Beyond internal management systems and process standards, the state has also established increasingly strict laws and penalties for the deliberate damage to communication networks that now happens frequently.
Acts such as illegal construction that severs optical cables, deliberately damaging base stations, or cutting optical fibers are all punishable by law.

A base station feeder cable that was maliciously cut
█ The deeper reasons behind communication failures
With a sensible network architecture, a complete active/standby mechanism, and sound systems and standards, why do so many failures still occur?
Next, let me talk about some of the deeper reasons.
The first, and probably the one most people agree on, is the internal environment of the communications industry.
For years, malicious competition and lowest-price bidding have prevailed. Equipment vendors and subcontractors have to win orders and still maintain a profit, so they can only cut costs desperately: product design cost, material cost, construction material cost and, more importantly, staff salaries.
Continually squeezing costs inevitably affects product reliability and engineering quality. Low salaries drive away large numbers of experienced people. To get the work done, subcontractors can only hire fresh graduates and send them to site after brief training (or none at all).
These staff lack the necessary training and practice, and their professional competence and technical ability fall short, which makes them a major risk point.
Some are so poorly prepared that, when pushed too hard, it is not out of the question for them to delete the database and run.
Some years ago, to make sure front-line staff pay was not skimmed, some vendors even signed agreements with subcontractors that set a minimum level of pay for outsourced staff.
Besides low-price competition, another important factor affecting the security of network operation is increasing technical complexity.
The more advanced the technology, the more complex it is and the lower its reliability. As technology evolves, operators' networks grow larger and their topologies more complex, so the probability of problems rises sharply.
The tidal effect in communication networks is very pronounced. Traffic at busy hours can be ten or even a hundred times that at idle hours. If something unexpected happens (a disaster, for example) and traffic surges, the difference may well be a thousandfold.
Operators cannot possibly build in a thousandfold redundancy. So without a sensible bypass design or threshold design, the probability of network congestion is extremely high. (Signaling traffic congestion was a factor in several major failures in recent years.)
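One way to picture a threshold design is a simple admission limit: accept requests up to a fixed capacity per time window and reject the excess quickly, so that a surge degrades service rather than collapsing the node. The sketch below is illustrative only; the class name, the per-window counter and the numbers are assumptions, and real signaling overload control is considerably more elaborate.

```python
# Sketch: threshold-based admission control (load shedding) for one time window.
# Hypothetical names and numbers; not taken from any real network element.


class AdmissionController:
    def __init__(self, capacity_per_window: int):
        self.capacity = capacity_per_window  # requests the node can safely process
        self.admitted = 0
        self.rejected = 0

    def start_window(self):
        """Reset the counters at the start of each time window."""
        self.admitted = 0
        self.rejected = 0

    def admit(self) -> bool:
        """Accept the request if there is capacity left, otherwise shed it."""
        if self.admitted < self.capacity:
            self.admitted += 1
            return True
        self.rejected += 1
        return False


def offer_load(controller: AdmissionController, requests: int) -> int:
    """Offer a burst of requests in one window and return how many were admitted."""
    controller.start_window()
    return sum(1 for _ in range(requests) if controller.admit())


if __name__ == "__main__":
    ctrl = AdmissionController(capacity_per_window=100)
    print("normal load admitted:", offer_load(ctrl, 80))
    print("surge load admitted:", offer_load(ctrl, 1000), "rejected:", ctrl.rejected)
```

The design choice is to fail a bounded share of requests on purpose: the node keeps serving at its rated capacity instead of being overwhelmed by retries and queue growth.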
Very few people fully understand operators' current complex networking. As time passes and staff move on, it becomes even less familiar.
Communication networks are something of a dark art to begin with, with plenty of strange problems. Who dares to claim they have accounted for every possibility?
The third hidden risk to network security, and the one Xiaozao Jun worries about most, is external cyber attack: hackers, viruses and system vulnerabilities, for example.
Today, communication equipment is basically IP-based and cloud-based. Networks are more and more open, some are deployed directly on public clouds, physical isolation from the outside world keeps weakening, and they are more vulnerable than before.
Attackers today are also far more skilled than before, with more diverse methods, and they pose a great threat to the network.
Of course, operators and equipment vendors invest heavily in defending against network attacks.
Nowadays every vendor talks about the concept of "security hardening". As the name suggests, security hardening means closing system vulnerabilities to make the system more robust. Operators use third-party tools, or hire third-party firms, to run security scans on live network equipment, look for vulnerabilities, and then require the equipment vendor to fix them.

Everything for security
This arms race, in which every advance in defense is met by a bigger advance in attack, will go on for a long time.
However, Xiaozao Jun thinks the defending side currently has big problems in both staff security awareness and technical ability. Going forward, we will encounter more and more security incidents.
I hope the relevant organizations and departments do not just pay lip service to security, but really put effort into improving staff competence and strengthening training. Otherwise, once something actually happens, it will be too late to remedy.
█ Final words
The KDDI outage in Japan is not the first of its kind and certainly will not be the last. Communication network failures are like a game of pass the parcel: nobody knows whether they will be next.
Vendors are now proposing to introduce AI and let AI take over the network in order to reduce the failure rate. Some vendors, building on network cloudification, do grayscale upgrades (that is, partial upgrades), which can also significantly reduce network risk. These are all good trends.
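A grayscale upgrade can be sketched as upgrading a small batch of nodes, checking their health, and only then moving on, with a rollback if any batch looks unhealthy. The sketch below is a generic illustration under stated assumptions: the function name, the callback parameters and the 10% default batch size are hypothetical and do not describe any vendor's actual upgrade tooling.

```python
# Sketch: grayscale (batch-by-batch) upgrade with rollback on a failed health check.
# The callbacks upgrade, is_healthy and rollback are supplied by the caller;
# all names here are hypothetical.


def grayscale_upgrade(nodes, upgrade, is_healthy, rollback, batch_fraction=0.1):
    """Upgrade nodes in small batches; roll everything back if a batch is unhealthy."""
    batch_size = max(1, int(len(nodes) * batch_fraction))
    upgraded = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            upgrade(node)
            upgraded.append(node)
        if not all(is_healthy(node) for node in batch):
            # Stop early: only the nodes upgraded so far were ever exposed.
            for node in reversed(upgraded):
                rollback(node)
            return False
    return True


if __name__ == "__main__":
    versions = {f"n{i}": "v1" for i in range(10)}
    ok = grayscale_upgrade(
        list(versions),
        upgrade=lambda n: versions.__setitem__(n, "v2"),
        is_healthy=lambda n: True,  # pretend every node passes its check
        rollback=lambda n: versions.__setitem__(n, "v1"),
        batch_fraction=0.2,
    )
    print("upgrade succeeded:", ok)
```

The point of the batching is that a bad software version is caught while it affects only a fraction of the network, rather than after a network-wide cutover.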
I think we still have a long way to go in the fight against communication network failures. The road ahead is long, and telecom engineers will keep searching high and low for answers.
Okay, that's all for today's article. Thank you for reading so patiently, and see you next time!
Thank you!
