I want to say more about this communication failure
2022-07-06 17:58:00 【Fresh jujube class】
Over the past few days, everyone has been following the large-scale communication failure at the Japanese telecom operator KDDI.
The failure had a huge impact: it covered the whole of Japan and affected 39.15 million users in total. It also lasted a long time, taking almost two days before service was basically restored.
Many official accounts have already written about the specific cause of the failure, so I will not repeat the analysis.
In today's article, I want to broaden the topic and talk in depth about two questions: it is already 2022, so why do our communication networks still fail so often, and is there an ultimate solution?
█ Communication failures: a contest that has lasted a hundred years
Faults are a natural attribute of communication networks. Just as people get sick, communication networks have been accompanied by faults since the day they were born. Put another way, we built communication networks in the very process of troubleshooting them.
Old man Bell invented the telephone only after solving countless faults
For more than 100 years, generation after generation of telecom people have fought a relentless battle against faults. They have painstakingly developed all kinds of technologies and used every means available to fight communication failures.
At the macro level, the results are remarkable. As experience has accumulated and technology has advanced, the probability of a communication network failure has kept falling.
Younger readers may not know that more than 20 years ago, a landline that could not get through (not many families had telephones then) was as common as a water or power cut. More than 10 years ago, a mobile phone that could not make calls or get online was also common.
Over the past decade, these situations have become increasingly rare. When one does happen, people now find it strange. When the network goes down, many people's first reaction is that their phone is broken or their account is in arrears, so they hurry to restart it or top it up. Isn't that so?
We now live in an information society, where communication networks, like water and electricity, are critical infrastructure. Our work and daily lives, and the operation of every industry, cannot do without them.
Given that, operators, as state-owned enterprises and as the builders and maintainers of the network, always put network security and stability first.
The Ministry of Industry and Information Technology sets strict assessment indicators for operators on network stability. If a network failure occurs in a province or city, the top leaders there must take responsibility, and their careers are at risk.
The pressure on operator leaders is passed on to employees, and on to equipment vendors and outsourcing companies as well.
Market competition is now so fierce that once something goes wrong, a vendor or outsourcer either pays huge compensation or loses its market share in that province. Either is an unbearable loss.
So the communication industry as a whole certainly pays enough attention to network security and stability. What matters is ability and execution.
█ Where exactly are the weak points of a communication network?
First, I want to talk about how the security level of a communication network is defined.
Depending on the scenario, communication network security is divided into levels. From low to high, they are home grade, enterprise grade, and carrier grade.
Security levels of communication systems
The routers we use at home, for example, are all home grade. The safety and reliability of this kind of equipment is very low: it can break at any moment, and it easily causes network outages.
Enterprise grade refers to the network equipment used in organizations. Depending on the network size and number of users, enterprise-grade equipment offers fairly high safety and reliability and does not interrupt service easily.
Carrier-grade requirements are higher still. The networks of China Mobile, China Telecom, and China Unicom serve hundreds of millions of users and absolutely must not fail easily. Generally speaking, carrier-grade reliability has to reach five nines (99.999%) or better.
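To get a feel for what "five nines" means, the availability figure can be turned into a yearly downtime budget. The sketch below is only an illustration: the function name `allowed_downtime_minutes` is made up for this example, and it assumes a 365-day year and treats availability purely as uptime divided by total time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (5 -> 99.999%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.3f} minutes per year")
```

At five nines the budget works out to a little over five minutes per year, which gives a sense of how far a two-day outage is from the target.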
The communication networks Xiaozao Jun is talking about today are the operators' public networks that serve the general public, including cellular mobile networks and fixed-line broadband networks. They are all carrier grade.
Cellular mobile networks and fixed broadband networks have similar architectures. The main difference is in the access network.
A cellular mobile network uses a wireless access network, where the access device is the base station. A fixed broadband network uses a wired access network, where the access device is PON (passive optical network) equipment, including the optical modem.
Let's take the cellular mobile network as an example and analyze it.
A public communication network serves hundreds of millions of users, so it usually adopts a pyramid-like hierarchical architecture, with the core network at the center, the transmission network (bearer network) as the trunk, and the access network as the limbs.
You can see at a glance that the biggest weak points of this architecture are the core network and the transmission network (especially the backbone).
The core network is the management center, the heart and brain of the network. Once it goes down, the whole network goes down. That is why core network engineer (which I once was) is the post with the greatest risk and pressure.
Core network equipment room
The transmission network (bearer network) is the blood vessels and nerves of the communication network. At the extremities it is not so bad: a break affects a small area at most. But what if the vessels feeding the heart and brain fail? That also means total paralysis.
Optical transmission equipment
This KDDI failure, the DoCoMo failure in October 2021, the 2020 outage at the four major UK operators, and the 2020 CenturyLink failure in the US were all related to core routers. To put it bluntly, something went wrong with the vessels feeding the heart and brain, and the whole person (the network) collapsed.
By comparison, the probability of a major problem in the access network is very low. An individual base station going out of service affects hundreds or thousands of people at most, within a tiny area, and the complaints are manageable.
Base station equipment
If a large-scale failure does occur in the access network, it is most likely a problem with a vendor's software version or a bad hardware batch. The probability of this is extremely low.
█ What have telecom people done to prevent failures?
So, to keep communication networks running safely and smoothly and to prevent failures, what methods have telecom people adopted?
The first is sound top-level architecture design.
The architecture of a network is the foundation of its security. A good architecture takes into account performance and capacity, cost, and also safety and redundancy.
Please be sure to remember one thing here: communication equipment is a complex product, and no matter how it is designed or how much hardware is stacked up, it can fail. It is only a question of probability and time.
So rather than only guarding strictly against possible faults, it is better to focus on what to do once a fault has occurred.
That is why introducing a backup mechanism is the most effective way to deal with faults.
Backup mechanism
Everyone has studied probability and statistics. If one device has a failure probability of 1%, then the probability that two such devices fail at the same time, assuming their failures are independent, is 1% × 1% = 0.01%. Right?
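The arithmetic above can be sketched in a few lines. This is a simplified model, not a reliability calculation from any operator: it assumes every device fails independently with the same probability, and the function name `all_fail_probability` is hypothetical. Real devices often share a power feed, a software version, or a configuration change, and in that case the combined failure probability is higher than the product suggests.

```python
def all_fail_probability(p: float, n: int) -> float:
    """Probability that n devices all fail at once.

    Assumes each device fails independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p ** n


if __name__ == "__main__":
    for devices in (1, 2, 3):
        prob = all_fail_probability(0.01, devices)
        print(f"{devices} device(s): {prob:.6%} chance that all fail together")
```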
For maximum safety, the network architecture uses POOL (pool) networking.
Several devices work together to form a pool (POOL), each handling part of the traffic. If one breaks, the others take over immediately, so that service is not affected.
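A toy model of the pooling idea might look like the following. It is a sketch under stated assumptions rather than how any real core network pool works: the class names `PoolMember` and `Pool`, the node names, and the modulo-based distribution are all invented for illustration, and real pools rely on signaling-level load balancing and health detection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolMember:
    name: str
    healthy: bool = True


@dataclass
class Pool:
    members: List[PoolMember]

    def pick(self, subscriber_id: int) -> PoolMember:
        """Choose a healthy member for this subscriber.

        Traffic is spread across whichever members are currently healthy,
        so a failed member is simply skipped and the rest absorb its share.
        """
        healthy = [m for m in self.members if m.healthy]
        if not healthy:
            raise RuntimeError("no healthy pool member left")
        return healthy[subscriber_id % len(healthy)]


if __name__ == "__main__":
    pool = Pool([PoolMember("node-a"), PoolMember("node-b"), PoolMember("node-c")])
    print("before failure:", pool.pick(4).name)
    pool.members[1].healthy = False  # simulate node-b breaking
    print("after failure: ", pool.pick(4).name)
```

The point of the sketch is that no subscriber is tied to a single device: as long as one member stays healthy, a request still finds somewhere to go.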
There are usually two or more sets of core equipment, placed in different districts of the provincial capital and physically far apart.
In addition, when designing the network architecture, important network elements are usually placed in core equipment rooms with a higher security level.
Core equipment room
For example, the HSS, the most important element in a mobile network, which stores and manages subscriber data (it is the former HLR and holds each user's phone number, authentication data, service information, and so on), is housed in the core equipment room in the provincial capital. Maintenance staff also regularly back up its data to a physically separate, remote location.
In recent years, because of geological disasters, plus factors such as war and terrorist attacks, operators have even started to make cross-province backups.
For example, during last year's Zhengzhou flood, the core equipment room was flooded and the HLR went out of service. A backup HLR was urgently brought into use to temporarily restore service.
Different disaster recovery levels
The second method is the low-level active/standby mechanism.
What we just discussed is redundancy in the top-level design. Down at the level of equipment rooms, subracks, boards, and cables, there are also active/standby designs, which can be called the low-level active/standby mechanism.
If you have been in an equipment room, you will have noticed that the subracks in the cabinets are filled with all kinds of boards, and that these boards basically all come in pairs.
Front view of one vendor's 3G equipment
In other words, there are usually two boards of any given type.
The same goes for network cables and optical fibers: you will hardly ever see a single cable, they all come in pairs.
Front view of one vendor's 4G equipment
The reason is so that they can back each other up. If one board breaks, the other keeps working, so that service is not affected. At the same time, the system raises an alarm to remind staff to replace the failed board as soon as possible.
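The switchover-plus-alarm behavior described here can be modeled roughly as follows. This is a minimal sketch, not any vendor's implementation: the class name `BoardPair`, the slot names, and the alarm text are assumptions made for the example, and real equipment detects failures through hardware and heartbeat mechanisms that are not modeled here.

```python
from typing import List, Optional


class BoardPair:
    """Two boards of the same type: one carries traffic, the other waits."""

    def __init__(self, active: str, standby: str) -> None:
        self.active: Optional[str] = active
        self.standby: Optional[str] = standby
        self.alarms: List[str] = []

    def report_failure(self, board: str) -> None:
        """Handle a board failure: switch over if needed and raise an alarm."""
        if board == self.active:
            # Promote the standby; active becomes None if no standby is left.
            self.active, self.standby = self.standby, None
        elif board == self.standby:
            self.standby = None
        else:
            return  # not one of our boards, nothing to do
        self.alarms.append(f"board {board} failed, replace it as soon as possible")

    @property
    def in_service(self) -> bool:
        """Service continues as long as some board is active."""
        return self.active is not None


if __name__ == "__main__":
    pair = BoardPair(active="slot-1", standby="slot-2")
    pair.report_failure("slot-1")
    print(pair.active, pair.in_service, pair.alarms)
```

The alarm matters as much as the switchover: after the first failure the pair is running without protection, and a second failure before the replacement arrives takes the service down.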
Power supply is the same: every cabinet in a telecom equipment room must have at least two power inputs.
Multiple power inputs (one red cable and one blue cable make up one feed)
In addition to mains power, important equipment rooms are also equipped with batteries, UPS units, generators, and other emergency power equipment.
Battery bank in an equipment room
The third is a sound management system and set of rules.
Technology is never the only factor that affects network security and stability. The biggest threat to a communication network is actually people, not technology.
Xiaozao Jun believes every telecom person feels the same way about this.
In management processes and systems, and in engineering specifications, we have learned countless lessons written in blood.
Why must an upgrade plan be reviewed again and again? Why must engineering specifications be so strict? Why build a spare parts warehouse? Why must cutover steps be double-checked, or even triple-checked? Why must someone stay on duty after a major operation? Why is the network frozen against changes on important holidays? ...
These are all lessons summed up by our predecessors.
Always stay in awe of network failures
Beyond internal management systems and process standards, the state has also established increasingly strict laws and penalties against the deliberate sabotage of communication networks that now happens so often.
Acts such as reckless construction that severs optical cables, deliberately damaging base stations, and cutting optical fibers are all punished by law.
A base station feeder that was maliciously cut
█ The deeper reasons behind communication failures
With a reasonable network architecture, a complete active/standby mechanism, and sound systems and rules, why do so many failures still occur?
Next, let me talk about some of the deeper reasons.
The first, and probably the one most people agree on, is the internal environment of the communication industry.
For years, malicious competition and lowest-price bidding have prevailed. Equipment vendors and subcontractors have to win orders and still maintain profits, so they can only cut costs desperately: product design costs, material costs, construction material costs, and, more importantly, staff salary costs.
Continually squeezing costs inevitably affects product reliability and engineering quality. Low pay drives away large numbers of experienced people. To get the work done, subcontractors can only recruit fresh graduates, give them brief training (or none at all), and send them to the site to work.
These staff lack the necessary training and practice, and their competence and technical ability fall short, which makes them a major risk point.
Some of them have very poor professionalism, and if pushed too hard, it is not impossible that they will simply delete the database and run.
A few years ago, to make sure front-line employees' pay was not skimmed, some vendors even signed contracts with subcontractors that set a floor on outsourced employees' pay.
Besides low-price competition, another important factor affecting the security of network operation is growing technical complexity.
The more advanced a technology is, the more complex it is, and the lower its reliability. As technology evolves, operators' networks keep getting larger and their topologies more complex, so the probability of problems rises sharply.
The tidal effect in communication networks is very pronounced. Traffic at busy hours can be ten or even a hundred times that at idle hours. If something unexpected happens (a disaster, for instance) and traffic surges, the difference can reach a thousand times.
Operators cannot build in a thousand times redundancy. So without sensible bypass or threshold design, the probability of network congestion is extremely high. (Signaling congestion was a factor in several of the major failures of recent years.)
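One simple form of threshold design is admission control: accept requests up to a fixed rate and reject the rest, so that a surge degrades service instead of bringing the element down. The sketch below illustrates the idea only; the class name `AdmissionControl`, the per-second window, and the limit are assumptions, not a description of any vendor's overload control.

```python
from typing import Optional


class AdmissionControl:
    """Accept at most `max_per_second` requests in any one-second window."""

    def __init__(self, max_per_second: int) -> None:
        if max_per_second < 1:
            raise ValueError("max_per_second must be at least 1")
        self.max_per_second = max_per_second
        self.current_second: Optional[int] = None
        self.count = 0

    def allow(self, now_second: int) -> bool:
        """Return True if the request may proceed, False if it is shed."""
        if now_second != self.current_second:
            # A new window has started, so reset the counter.
            self.current_second = now_second
            self.count = 0
        if self.count >= self.max_per_second:
            return False  # over the threshold: reject instead of queueing
        self.count += 1
        return True


if __name__ == "__main__":
    gate = AdmissionControl(max_per_second=3)
    surge = [gate.allow(now_second=0) for _ in range(5)]
    print(surge)
```

Rejecting early is the design choice that matters here. A rejected request costs almost nothing, while an accepted request that the element cannot finish tends to be retried, which feeds the very signaling storm the threshold is meant to prevent.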
Few people can fully understand the complex networks operators run today. As time passes and staff move on, the network becomes even less familiar to those who remain.
Communication networks have always been something of a black art, with plenty of strange problems. Who would dare claim to have accounted for every possibility?
The third potential risk to network security, and the one Xiaozao Jun worries about most, is external cyber attacks: hackers, viruses, and system vulnerabilities, for example.
Communication equipment is now basically IP-based and cloud-based, and networks are more and more open. Some are deployed directly on public clouds. Physical isolation from the outside world keeps getting weaker, so networks are more vulnerable to attack than before.
Today's attackers are also far more skilled than before and use more varied methods, so the threat to the network is serious.
Of course, operators and equipment vendors have invested a lot in preventing network attacks.
All vendors now pay attention to the concept of "security hardening". As the name suggests, security hardening means closing system vulnerabilities to make the system more robust. Operators use third-party tools, or hire third-party vendors, to run security scans on live network equipment, look for vulnerabilities, and then require the equipment vendor to fix and close them.
All for security
This contest, in which every advance in defense is met by a bigger advance in attack, will go on for a long time.
However, Xiaozao Jun thinks today's defenders have big problems in both staff security awareness and technical ability. Going forward, we will run into more and more security incidents.
I hope the relevant organizations and departments will not just pay lip service to security, but will really put effort into improving the competence of their staff and strengthening training. Otherwise, once something really happens, it will be too late to fix.
█ Closing words
This KDDI failure in Japan is not the first, and it will certainly not be the last. Communication network failures are like a game of pass the parcel: no one knows whether they will be next.
Vendors have now proposed introducing AI and letting it take over the network to reduce the failure rate. Some vendors, building on network cloudification, perform grayscale upgrades (that is, partial upgrades), which can also reduce network risk significantly. These are all good trends.
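The idea behind a grayscale upgrade can be sketched as a batched rollout that stops at the first sign of trouble. This is a simplified illustration, not a description of any vendor's upgrade tooling: the function name `grayscale_upgrade`, the `is_healthy` callback, and the node names are hypothetical, and the actual upgrade and rollback steps are left as placeholders.

```python
from typing import Callable, List, Tuple


def grayscale_upgrade(
    nodes: List[str],
    batch_size: int,
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Upgrade nodes batch by batch, stopping at the first unhealthy batch.

    Returns the nodes touched so far and whether the rollout completed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    upgraded: List[str] = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        upgraded.extend(batch)  # placeholder for the real upgrade step
        if not all(is_healthy(node) for node in batch):
            return upgraded, False  # caller should roll back `upgraded`
    return upgraded, True


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    touched, completed = grayscale_upgrade(nodes, 1, lambda n: n != "node-b")
    print(touched, completed)
```

Because the rollout halts after the first bad batch, a faulty software version reaches only a small part of the network instead of every element at once.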
I think we still have a long way to go in the fight against communication network failures. The road ahead is long, and telecom people will keep searching high and low.
Okay, that's all for today's article. Thank you for reading so patiently. See you next time!
Thank you!