I want to say more about this communication failure
2022-07-06 17:58:00 【Fresh jujube class】
Over the past few days, everyone has been following the large-scale communication failure at the Japanese telecom operator KDDI.
The failure had a major impact: it affected the whole of Japan and a total of 39.15 million users. It also lasted a long time; it took almost two days for service to be basically restored.
Many official accounts have already written about the specific cause of the failure, so I will not repeat the analysis.
In today's article I want to broaden the topic and have an in-depth chat with you: it is already 2022, so why do our communication networks still have so many failures, and do we have an ultimate solution?
█ Communication failure: a game that has lasted a hundred years
Failure is a natural attribute of communication networks. Just as people get sick, communication networks have been accompanied by failures ever since they were born. Put another way, we built the communication network in the course of troubleshooting it.
Bell, the father of the telephone, invented it only after solving countless faults
For more than 100 years, countless communications professionals have fought and sparred with failures without letting up. They have worked hard to develop all kinds of technologies and used all kinds of means to combat communication failures.
At the macro level, the results of this struggle are remarkable. As experience accumulates and technology advances, the probability of communication network failure keeps falling.
Young readers may not know that more than 20 years ago, a landline that could not place calls (not that many families had telephones then) was as common as a water or power cut. More than 10 years ago, it was also common for a mobile phone to be unable to make calls or get online.
In the past ten years, these phenomena have become increasingly rare. When one does occur, people find it strange. When the network goes down, many people's first reaction is that their phone is broken or their account is in arrears, so they hurry to restart or top up. Isn't that so?
We now live in an information society, and the communication network, like water and electricity, is important infrastructure. Our work and life, and the operation of every industry, are inseparable from it.
On this premise, operators, as state-owned enterprises that build and maintain the network, always put network security and stability first.
For network stability, the Ministry of Industry and Information Technology has set strict assessment indicators for operators. If a network failure occurs in a province or city, the top leaders must bear responsibility, and their careers are at risk.
The pressure on operator leaders is passed on to employees, and also to equipment manufacturers and outsourcers.
Market competition is now so fierce that once something goes wrong, the result is either huge compensation or the loss of market share in that province, which is an unbearable loss for equipment manufacturers and outsourcers.
So the entire communication industry certainly pays enough attention to the security and stability of the communication network. The key is still a question of ability and execution.
█ Where exactly are the weak points of the communication network?
First, I want to talk about how the security levels of communication networks are defined.
Depending on the scenario, communication network security is divided into different levels. From low to high, they are home grade, enterprise grade, and carrier grade.
Security levels of communication systems
The routers we use at home, for example, are all home grade. The safety and reliability of such equipment are very low; when it breaks, it breaks, and network interruptions happen easily.
Enterprise grade refers to the network equipment used in organizations. Depending on network size and number of users, enterprise-grade equipment has fairly high safety and reliability, and service is not easily interrupted.
Carrier-grade requirements are higher still. The networks of operators such as China Mobile, China Telecom, and China Unicom serve hundreds of millions of users and absolutely cannot be allowed to fail easily. Generally speaking, carrier-grade reliability must reach five nines (99.999%) or better.
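To make "five nines" concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; the article itself only gives the 99.999% figure.

```python
# Illustrative only: converts "N nines" of availability into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes of downtime per year")
```

By this arithmetic, five nines leaves only a little over five minutes of downtime per year, which puts a two-day outage in perspective.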
The communication network Xiaozao Jun is discussing today is the operators' public communication network serving the general public. It includes the cellular mobile communication network as well as the fixed-line broadband network. Both are carrier grade.
The architectures of the cellular mobile network and the fixed broadband network are similar; the main difference lies in the access network.
The cellular mobile network uses a wireless access network, where the access device is the base station. The fixed broadband network uses a wired access network, where the access devices are PON (passive optical network) equipment, including the optical modem.
Let's take the cellular mobile communication network as an example and analyze it.
A public communication network serves hundreds of millions of users, so it usually adopts a pyramid-style hierarchical architecture: the core network is the core, the transmission network (bearer network) is the backbone, and the access network forms the limbs.
You can see at a glance that the biggest weak points of this architecture lie in the core network and the transmission network (especially the backbone).
The core network is the management center, the heart and brain of the network. Once it goes down, the whole network goes down. That is why the core network engineer (as I once was) holds the post with the greatest risk and pressure.
Core network equipment room
The transmission network (bearer network) is the blood vessels and nerves of the communication network. A break at the extremities is manageable, since it affects only a small area at most. But what if the blood vessels of the heart and brain fail? That also means complete paralysis.
Optical transmission equipment
The KDDI failure this time, the DoCoMo failure in October 2021, the 2020 outage of the four major operators in the UK, and the 2020 CenturyLink failure in the U.S. were all related to core routers. To put it bluntly, something went wrong with the cardiovascular and cerebrovascular system, and the whole person (the network) collapsed.
By comparison, the probability of a major problem in the access network is very low. When an individual base station goes down, it affects at most a few hundred to a thousand or so people in a tiny area, and complaints are manageable.
Base station equipment
If a large-scale failure does occur in the access network, it is most likely a problem with the equipment manufacturer's software version or with a hardware batch. The probability of this is extremely low.
█ What have communications professionals done to prevent failures?
So, to keep the communication network running safely and smoothly and to prevent failures, what methods have communications professionals adopted?
First, sound top-level architecture design.
The network architecture is the foundation of network security. A good architecture has to consider performance and capacity, cost, and also security and redundancy.
Please remember one thing here: communication equipment is a complex product, and no matter how it is designed or how much hardware is stacked up, it can fail. It is only a question of probability and time.
Rather than guarding strictly against every possible fault, it is better to focus on what to do once a fault occurs.
Introducing a backup mechanism is therefore the most effective way to deal with faults.
Backup mechanism
Everyone has studied probability and statistics. If the failure probability of one device is 1%, then the probability of two devices failing at the same time (assuming the failures are independent) is 1% × 1% = 0.01%. Right?
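The arithmetic above generalizes to any number of redundant devices. The sketch below assumes the failures are statistically independent, which is an idealization: a shared software bug or a common power feed can take out several devices at once, so real-world figures will be less favorable. The function name is hypothetical.

```python
# Illustrative only: joint failure probability of redundant devices,
# ASSUMING their failures are independent of one another.
def joint_failure_probability(p_single: float, n_devices: int) -> float:
    """Probability that all `n_devices` devices are down at the same time."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return p_single ** n_devices


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} device(s): {joint_failure_probability(0.01, n):.6%}")
```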
To be as safe as possible, network architecture designs use the pool (POOL) networking mode:
Several devices work together to form a pool (POOL), each handling its share of the business. If one fails, the others immediately take over, so the business is not affected.
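The pool idea can be sketched in a few lines. This is a toy model, not how any real core network element is implemented: the class names, the "least loaded member" assignment rule, and the session IDs are all assumptions made for illustration.

```python
# Toy model of POOL networking: several members share the load, and when one
# fails its sessions are redistributed to the remaining healthy members.
from dataclasses import dataclass, field


@dataclass
class PoolMember:
    name: str
    healthy: bool = True
    sessions: list = field(default_factory=list)


class Pool:
    def __init__(self, names):
        self.members = [PoolMember(n) for n in names]

    def healthy_members(self):
        return [m for m in self.members if m.healthy]

    def assign(self, session_id):
        """Place a session on the least-loaded healthy member."""
        candidates = self.healthy_members()
        if not candidates:
            raise RuntimeError("no healthy member left in the pool")
        target = min(candidates, key=lambda m: len(m.sessions))
        target.sessions.append(session_id)
        return target.name

    def fail(self, name):
        """Mark a member as failed and move its sessions to the others."""
        failed = next(m for m in self.members if m.name == name)
        failed.healthy = False
        orphaned, failed.sessions = failed.sessions, []
        for session_id in orphaned:
            self.assign(session_id)


if __name__ == "__main__":
    pool = Pool(["node-A", "node-B", "node-C"])
    for sid in range(6):
        pool.assign(sid)
    pool.fail("node-A")
    for member in pool.members:
        print(member.name, member.healthy, member.sessions)
```

Note that the surviving members must have enough spare capacity to absorb the failed member's load; a pool that normally runs near full load provides little real protection.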
Core equipment usually comes in two or more sets, placed in different districts of the provincial capital and physically far apart.
In addition, when designing the network architecture, important network elements are usually placed in core equipment rooms with a higher security level.
Core equipment room
For example, the HSS, the most important element in the mobile network, is responsible for storing and managing user data (it is the former HLR and holds each user's mobile phone number, authentication data, service information, and so on). It is housed in the core equipment room of the provincial capital. Maintenance personnel also regularly back up the data to a physically remote, isolated location.
In recent years, because of geological disasters, plus factors such as war or terrorist attacks, operators have even begun to do cross-province backup.
For example, during last year's Zhengzhou flood, the core equipment room was flooded and the HLR went out of service; a backup HLR at another site was urgently brought into use to restore service temporarily.
Different disaster recovery levels
Second, the underlying active/standby mechanism.
What we just discussed is the redundancy mechanism in the top-level design. Down at the level of equipment rooms, subracks, boards, and cables, there are also active/standby designs, which can be called the underlying active/standby mechanism.
If you have been to an equipment room, you will have noticed that the subracks in the cabinets are filled with all kinds of boards, and that these boards basically all come in pairs.
Front view of one manufacturer's 3G equipment
In other words, there are usually two boards of any given type.
The same is true of network cables and optical fibers: you will hardly ever see a single cable; they all come in pairs.
Front view of one manufacturer's 4G equipment
The reason is simply mutual backup. If one board fails, the other can keep working, so the business is not affected. At the same time, the system raises an alarm to remind staff to replace the failed board as soon as possible.
The same goes for power. Every cabinet in a telecom equipment room must have at least two power inputs.
Multiple power inputs (one red and one blue make up one feed)
Besides mains power, important equipment rooms are also equipped with emergency power such as batteries, UPS units, and generators.
Battery pack in the machine room
Third, sound management systems and regulations.
Technology is never the only factor affecting network security and stability. The biggest threat to the communication network is actually people, not technology.
Xiaozao Jun believes every communications professional feels the same on this point.
In management processes and systems, and in engineering specifications, we have learned countless bloody lessons.
Why must upgrade plans be reviewed again and again? Why must engineering specifications be so strict? Why build a spare parts warehouse? Why must cutover steps be double-checked, or even triple-checked? Why must staff be on duty after major operations? Why is the network frozen to changes on important holidays? ...
All of these are lessons summed up by our predecessors.
Always be in awe of network failures
In addition to internal management systems and process standards, the state has also established increasingly strict laws and regulations to punish the deliberate destruction of communication networks, which now happens often.
Cutting optical cables during illegal construction, deliberately damaging base stations, or severing optical fibers will all be punished by law.
Base station feeder cables that were maliciously cut
█ The deep-seated reasons behind communication failures
With reasonable network architecture design, complete active/standby mechanisms, and sound systems and specifications, why do so many failures still occur?
Next, let me talk about some of the deeper reasons.
First and foremost, and probably the point most people agree on, is the internal environment of the communication industry.
Over the years, malicious competition and lowest-price bidding have been rampant. Equipment suppliers and subcontractors have to win orders and still keep a profit, so they can only cut costs desperately: product design costs, material costs, construction material costs, and, more importantly, personnel salary costs.
Continuous cost compression is bound to affect product reliability and engineering quality. Low wages drive away large numbers of experienced people. To get the work done, subcontractors can only recruit fresh graduates and send them to work on site after simple training (or even none).
These people lack the necessary training and practice, and their professionalism and technical ability fall short, which makes them a major risk point.
Some of them have very little professionalism, and if pushed too hard, it is not impossible that they will simply delete the database and run.
A few years ago, to make sure front-line employees' pay was not skimmed, some manufacturers even signed contracts with subcontractors that set a floor on outsourced employees' pay.
Besides low-price competition, another important factor affecting network operation security is increasing technical complexity.
The more advanced a technology is, the more complex it is and the lower its reliability. As technology evolves, operators' networks are becoming larger and their networking more complex, and the probability of problems rises sharply.
The tidal effect in communication networks is very pronounced. Traffic in busy hours can be ten or even a hundred times that in idle hours. If something unexpected happens (a disaster, for example) and traffic surges, the difference may well be a thousandfold.
Operators cannot possibly build in a thousandfold redundancy. So without a reasonable bypass design or threshold design, the probability of network congestion is extremely high. (Signaling traffic congestion was a factor in several major failures in recent years.)
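One way to picture a "threshold design" is simple admission control: once the offered load exceeds a configured share of capacity, the excess is rejected up front instead of being allowed to congest the whole system. The sketch below is a deliberately simplified illustration; the function name, the 80% default, and the per-time-slot model are assumptions, not a description of any operator's actual overload control.

```python
# Illustrative only: a fixed admission threshold for one time slot.
def admit(offered_load: int, capacity: int, threshold_percent: int = 80):
    """Return (accepted, rejected) request counts for one time slot.

    Requests beyond `threshold_percent` of `capacity` are rejected early,
    so the node keeps headroom instead of collapsing under a surge.
    """
    limit = capacity * threshold_percent // 100
    accepted = min(offered_load, limit)
    return accepted, offered_load - accepted


if __name__ == "__main__":
    capacity = 1_000  # hypothetical requests per slot the node can handle
    for load in (500, 1_000, 100_000):  # normal, busy, disaster-level surge
        accepted, rejected = admit(load, capacity)
        print(f"offered={load:>7} accepted={accepted:>4} rejected={rejected:>6}")
```

The design choice here is to fail a bounded share of requests quickly rather than let queues grow until everything times out, which matters most when rejected devices retry and amplify the surge.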
Few people can fully understand operators' complex networks today. As time passes and staff turn over, the network becomes even less well understood.
Communication networks are something of a dark art to begin with, with many strange problems. Who dares to claim they can account for every possibility?
The third potential network security risk, and the one Xiaozao Jun worries about most, is external cyber attacks: hackers, viruses, and system vulnerabilities, for example.
Communication equipment today is basically IP-based and cloud-based. Networks are more and more open, some are deployed directly on public clouds, and physical isolation from the outside world is getting weaker, so they are more vulnerable than before.
Today's attackers are also far more skilled than before, their methods are more diverse, and the threat they pose to the network is great.
Of course, operators and equipment manufacturers invest heavily in preventing network attacks.
All manufacturers now pay attention to the concept of "security hardening". As the name suggests, security hardening means closing system vulnerabilities to make the system more robust. Operators use third-party tools, or hire third-party vendors, to run security scans on live network equipment, look for security holes, and then require the equipment manufacturer to fix them.
All for security
This game of "virtue rises one foot, vice rises ten" will go on for a long time.
However, Xiaozao Jun thinks the defending side currently has big problems in both staff security awareness and technical ability. Going forward, we will encounter more and more security incidents.
I hope the relevant organizations and departments will not just pay lip service to security, but will really spend time improving the quality of their staff and strengthening training. Otherwise, when something really happens, it will be too late to remedy.
█ Final words
KDDI's failure in Japan is not the first of its kind, and it will certainly not be the last. Communication network failure is like a game of pass-the-parcel: no one knows whether they will be next.
Manufacturers have now proposed introducing AI and letting AI take over the network in order to reduce its failure rate. Some manufacturers, building on network cloudification, perform grayscale upgrades (that is, partial upgrades), which can also significantly reduce network risk. These are all good trends.
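A grayscale upgrade can be sketched as a staged rollout: upgrade a small share of nodes, check their health, and only then widen the rollout. The code below is a rough illustration under stated assumptions; the function name, the stage percentages, and the `upgrade` and `is_healthy` callbacks are hypothetical, and real rollouts would also need rollback and soak time.

```python
# Illustrative only: staged ("grayscale") rollout that halts on the first
# unhealthy batch so a bad version never reaches the whole network.
def grayscale_upgrade(nodes, stages_percent, upgrade, is_healthy):
    """Upgrade `nodes` in cumulative stages, e.g. (10, 50, 100) percent.

    Returns a dict with the list of upgraded nodes and whether the rollout
    was halted because a freshly upgraded node failed its health check.
    """
    done = 0
    for percent in stages_percent:
        target = max(done, len(nodes) * percent // 100)
        batch = nodes[done:target]
        for node in batch:
            upgrade(node)
        if not all(is_healthy(node) for node in batch):
            return {"upgraded": nodes[:target], "halted": True}
        done = target
    return {"upgraded": nodes[:done], "halted": False}


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(10)]
    log = []
    # Pretend node-3 misbehaves after the upgrade.
    result = grayscale_upgrade(nodes, (10, 50, 100), log.append,
                               lambda n: n != "node-3")
    print(result)
```

In this run the rollout stops after the second stage, so only half of the nodes ever receive the faulty version.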
I think we still have a long way to go on the road of fighting communication network failures. The road ahead is long, and communications professionals will keep searching high and low.
Okay, that's all for today's article. Thank you for your patience in reading. See you next time!
Thank you!