I want to say more about this communication failure
2022-07-06 17:58:00 【Fresh jujube class】
Over the past few days, everyone has been following the large-scale communication outage at the Japanese telecom operator KDDI.
The outage had a major impact: it covered all of Japan and affected 39.15 million users in total. It also lasted a long time, and service was only basically restored after nearly two days.
Many official accounts have already written about the specific cause of the failure, so I will not repeat that analysis.
In today's article I want to broaden the topic and have an in-depth chat with you: it is already 2022, so why do our communication networks still fail so often, and do we have an ultimate solution?
█ Communication failures: a contest that has lasted a hundred years
Failure is a natural attribute of communication networks. Just as people get sick, communication networks have been accompanied by failures ever since they were born. Put another way, we built communication networks in the very process of troubleshooting them.
Bell invented the telephone only after solving countless problems
For more than 100 years, countless telecom engineers have fought a relentless contest against faults. They have developed all kinds of technologies and used every available means to combat communication failures.
At the macro level, the results of this struggle are remarkable. As experience accumulates and technology advances, the probability of communication network failures keeps declining.
Younger readers may not know that some 20 years ago, being unable to make a call on a landline (and not many families had a telephone then) was as common as a water or power outage. Some 10 years ago, mobile phones that could not make calls or get online were also a common sight.
Over the past ten years these problems have become increasingly rare. When one does occur, people find it strange. When the network goes down, many people's first reaction is that their phone is broken or their account is in arrears, so they hurry to restart it or top it up. Isn't that so?
We now live in an information society. Like water and electricity, the communication network is critical infrastructure. Our work and daily life, and the operation of every industry, depend on it.
Against this background, operators, as state-owned enterprises and as the builders and maintainers of the network, always put network security and stability first.
To ensure network stability, the Ministry of Industry and Information Technology sets strict assessment indicators for operators. If a network failure occurs in a province or city, the top leader there must take responsibility, and their career is at risk.
The pressure on operator leadership is passed down to employees, and on to equipment vendors and outsourcing contractors.
Market competition is now fierce. Once something goes wrong, the result is either huge compensation or the loss of market share in that province, which is an unbearable loss for vendors and contractors.
So the whole industry certainly pays enough attention to the security and stability of the communication network. The key issue is still capability and execution.
█ Where exactly are the weak points of a communication network?
First, I want to talk about how the security level of a communication network is defined.
Depending on the scenario, communication network security is divided into different levels. From low to high, they are home grade, enterprise grade, and carrier grade.
Security levels of communication systems
The router we use at home is home grade. The safety and reliability of such equipment are very low: when it breaks, it breaks, and it easily causes a network interruption.
Enterprise grade refers to the network equipment used in organizations. Depending on the network size and number of users, enterprise-grade equipment has relatively high safety and reliability, and service is not easily interrupted.
Carrier-grade requirements are higher still. Operators such as China Mobile, China Telecom, and China Unicom serve hundreds of millions of users, so their networks absolutely cannot be allowed to go down easily. Generally speaking, carrier-grade reliability must reach five nines (99.999%) or better.
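To put a number on what "five nines" means in practice, here is a small Python sketch that converts an availability target into a yearly downtime budget. The function name and the 365-day year are my own simplifying assumptions, not anything taken from an operator's assessment rules.

```python
# Illustrative only: turn "N nines" of availability into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (5 means 99.999%)."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines leaves 0.00001 of the year
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    minutes = allowed_downtime_minutes(n)
    print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a budget of only about 5.3 minutes of downtime per year, which gives a sense of how far a two-day outage is from the carrier-grade target.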
The communication network Xiaozao Jun is talking about today is the operators' public communication network, which includes both the cellular mobile network and the fixed broadband network. Both are carrier grade.
The architectures of the cellular mobile network and the fixed broadband network are similar. The main difference lies in the access network.
The cellular mobile network uses a wireless access network, whose access devices are base stations. The fixed broadband network uses a wired access network, whose access devices are PON (passive optical network) equipment, including the optical modem.
Let's take the cellular mobile network as an example.
Because a public communication network serves hundreds of millions of users, it usually adopts a pyramid-style hierarchical architecture, with the core network as the center, the transmission network (bearer network) as the trunk, and the access network as the limbs.
You can see at a glance that the biggest weak points of this architecture are the core network and the transmission network (especially the backbone).
The core network is the management center, the heart and brain of the network. Once it goes down, the whole network goes down. That is why core network engineers (as I once was) hold the post with the greatest risk and pressure.
Core network machine room
The transmission network (bearer network) is the blood vessels and nerves of the communication network. At the extremities things are manageable, since a break affects a small area at most. But what if the blood vessels of the heart and brain fail? That also means total paralysis.
Optical transmission equipment
The current KDDI outage, the DoCoMo outage in October 2021, the 2020 outage affecting the UK's four major operators, and the 2020 CenturyLink outage in the US were all related to core routers. To put it bluntly, when something goes wrong with the heart and brain vessels, the whole body (the network) collapses.
By comparison, the probability of a major problem in the access network is very low. When an individual base station goes down, it affects at most a few hundred to a thousand or so people, so the impact is limited and the complaints are manageable.
Base station equipment
If a large-scale failure does occur in the access network, it is most likely caused by a vendor software version or a hardware batch problem. The probability of this is extremely low.
█ What have telecom engineers done to prevent failures?
So, to keep the communication network running safely and smoothly and to prevent failures, what methods have telecom engineers adopted?
The first is sound top-level architecture design.
The architecture is the foundation of network security. A good architecture has to consider performance and capacity, cost, and also safety and redundancy.
Please remember one thing here, everyone: communication equipment is a complex product, and no matter how you design it or stack up hardware, it can still fail. It is only a question of probability and time.
So rather than only guarding strictly against possible faults, it is better to focus on what to do once a failure happens.
That is why introducing backup mechanisms is the most effective way to deal with faults.
Backup mechanism
Everyone has studied probability and statistics. If the failure probability of one device is 1%, then the probability that two devices fail at the same time is 1% × 1% = 0.01% (assuming the two failures are independent). Right?
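The same arithmetic can be written as a short Python sketch. It assumes that device failures are independent, which is an idealization: shared power, a shared software version, or a shared operator mistake can make devices fail together. The function name is hypothetical.

```python
def all_fail_probability(p_single: float, n_devices: int) -> float:
    """Probability that all n devices are down at once, assuming independent failures."""
    return p_single ** n_devices


# The example from the text: each device fails with probability 1%.
for n in (1, 2, 3):
    p = all_fail_probability(0.01, n)
    print(f"{n} device(s): probability that all are down = {p:.6%}")
```

Each extra independent device multiplies the risk by another factor of 1%, which is why backup is so effective on paper. It is also why failures that hit every device at once, such as a bad software version, matter so much more than the simple product suggests.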
To keep the network as safe as possible, network architecture designs use the POOL networking mode.
Several devices work together to form a pool (POOL), each handling part of the traffic. If one breaks, the others take over immediately so that service is not affected.
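The pool idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the class names, the node names, and the modulo-based user assignment are invented for illustration. Real core network pools use more careful load distribution than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PoolMember:
    name: str
    healthy: bool = True


@dataclass
class Pool:
    members: List[PoolMember] = field(default_factory=list)

    def pick(self, user_id: int) -> PoolMember:
        """Choose a healthy member for this user; fail only if none is left."""
        healthy = [m for m in self.members if m.healthy]
        if not healthy:
            raise RuntimeError("no healthy pool member available")
        # Toy assignment rule: spread users over whatever members are still up.
        return healthy[user_id % len(healthy)]


pool = Pool([PoolMember("node-a"), PoolMember("node-b"), PoolMember("node-c")])
print("before failure:", pool.pick(7).name)

pool.members[1].healthy = False  # simulate node-b breaking
print("after failure: ", pool.pick(7).name)
```

As long as at least one member is healthy, the user is still served. The price is that the surviving members must have enough spare capacity to absorb the load of the failed one.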
Core equipment usually comes in two or more sets, placed in different districts of the provincial capital and physically far apart.
In addition, when designing the network architecture, important network elements are usually placed in core equipment rooms with a higher security level.
Core equipment room
For example, the HSS, the most important element in the mobile network, which stores and manages subscriber data (it is the former HLR and holds each subscriber's phone number, authentication data, service information, and so on), is housed in the core equipment room in the provincial capital. Maintenance staff also make regular backups of the data that are physically isolated at a remote site.
In recent years, because of geological disasters and factors such as war or terrorist attacks, operators have even begun to do cross-province backup.
For example, in last year's Zhengzhou flood the core equipment room was flooded and the HLR went out of service. A backup HLR was urgently brought into service to temporarily restore service.
Different disaster recovery levels
The second method is the low-level active/standby mechanism.
What we just discussed is redundancy in the top-level design. Down at the level of equipment rooms, subracks, boards, and cables there are also active and standby designs, which can be called the low-level active/standby mechanism.
If you have ever been in an equipment room, you will have noticed that the subracks in the cabinets are filled with all kinds of boards, and these boards almost always come in pairs.
Front view of one vendor's 3G equipment
In other words, there are usually two boards of any given type.
The same goes for network cables and optical fibers. You will hardly ever see a single cable; they all come in pairs.
Front view of one vendor's 4G equipment
The reason is mutual backup. If one board fails, the other can keep working so that service is not affected. At the same time, the system raises an alarm to remind staff to replace the failed board as soon as possible.
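The board-level behavior described above (switch over, keep serving, raise an alarm) can be modeled with a small Python sketch. The class, slot names, and alarm texts are hypothetical and do not correspond to any vendor's software.

```python
from typing import List, Optional


class BoardPair:
    """Toy model of two boards of the same type backing each other up."""

    def __init__(self, active: str, standby: str) -> None:
        self.active: Optional[str] = active
        self.standby: Optional[str] = standby
        self.alarms: List[str] = []

    def report_failure(self, board: str) -> None:
        if board == self.active and self.standby is not None:
            # Switchover: the standby board takes over, service continues.
            self.active, self.standby = self.standby, None
            self.alarms.append(f"{board} failed: switched over, replace it soon")
        elif board == self.standby:
            # Service is unaffected, but there is no backup left.
            self.standby = None
            self.alarms.append(f"{board} failed: redundancy lost, replace it soon")
        elif board == self.active:
            # Both boards are gone: this is the case redundancy tries to avoid.
            self.active = None
            self.alarms.append(f"{board} failed with no standby: service interrupted")

    @property
    def in_service(self) -> bool:
        return self.active is not None


pair = BoardPair(active="slot-1", standby="slot-2")
pair.report_failure("slot-1")
print(pair.active, pair.in_service, pair.alarms)
```

The alarm matters as much as the switchover. After the first failure the pair is running without a backup, so the window until the replacement arrives is exactly when a second failure would interrupt service.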
Power supply works the same way. Every cabinet in a telecom equipment room must have at least two power inputs.
Multiple power inputs (one red and one blue cable make up one feed)
In addition to mains power, important equipment rooms are also equipped with emergency power such as batteries, UPS units, and generators.
Battery pack in the machine room
The third is a sound set of management systems and regulations.
Technology is never the only factor affecting network security and stability. The biggest threat to a communication network is actually people, not technology.
Xiaozao Jun believes every telecom engineer feels the same way about this.
In management processes and systems, and in engineering specifications, we have learned countless painful lessons.
Why must upgrade plans be reviewed again and again? Why must engineering specifications be so strict? Why build a spare parts warehouse? Why must cutover steps be double-checked, or even triple-checked? Why must staff stay on duty after major operations? Why is the network frozen for changes on important holidays?
These are all lessons summed up by our predecessors.
Always stay in awe of network failures
Beyond internal management systems and process standards, the state has also established increasingly strict laws and penalties in response to the deliberate damage to communication networks that now happens frequently.
Digging up optical cables during illegal construction, deliberately damaging base stations, and cutting optical fibers are all punishable by law.
A base station feeder cable that was maliciously cut
█ The deeper reasons behind communication failures
With reasonable network architecture, a complete active/standby mechanism, and sound systems and specifications, why do so many failures still occur?
Next, let me talk about some of the deeper reasons.
The first, and probably the one most people agree on, is the internal environment of the communication industry.
For years, cutthroat competition and lowest-price bidding have prevailed. Equipment vendors and subcontractors have to win orders and still keep a profit, so they can only cut costs desperately: product design costs, material costs, construction material costs, and above all, staff salaries.
Continuous cost cutting inevitably affects product reliability and engineering quality. Low wages drive away large numbers of experienced people. To get the work done, subcontractors can only hire fresh graduates and send them to site after brief training, or even none at all.
These staff lack the necessary training and practice, and their skills and technical ability fall short, which makes them a major risk point.
Some of them have poor professionalism, and if pushed too hard, it is not impossible that they would simply delete the database and run.
A few years ago, to make sure front-line employees' pay was not skimmed, some vendors even signed contracts with subcontractors that set a minimum pay level for outsourced staff.
Besides low-price competition, another important factor affecting operational security is increasing technical complexity.
The more advanced a technology is, the more complex it tends to be, and the lower its reliability. As technology evolves, operators' networks keep growing and their topologies keep getting more complex, so the probability of problems rises sharply.
The tidal effect in communication networks is very pronounced. Traffic in busy hours can be ten or even a hundred times that of idle hours. If an emergency such as a disaster occurs and traffic surges, the difference can reach a thousand times.
Operators cannot possibly design in thousandfold redundancy. So without a reasonable bypass design or threshold design, the probability of network congestion is extremely high. (Signaling traffic congestion was a factor in several major failures in recent years.)
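A threshold design can be as simple as refusing work beyond a fixed rate, so that a surge is shed instead of overwhelming the equipment. The Python sketch below shows one such gate. The class name, the per-second window, and the numbers are my own assumptions for illustration and are not how any particular operator implements overload control.

```python
from typing import Optional


class ThresholdGate:
    """Admit at most max_per_second requests in any one-second window."""

    def __init__(self, max_per_second: int) -> None:
        self.max_per_second = max_per_second
        self.current_second: Optional[int] = None
        self.count = 0

    def admit(self, now_second: int) -> bool:
        if now_second != self.current_second:
            # A new window starts: reset the counter.
            self.current_second = now_second
            self.count = 0
        if self.count >= self.max_per_second:
            return False  # over the threshold: reject early instead of queueing
        self.count += 1
        return True


gate = ThresholdGate(max_per_second=1000)
# Simulate a surge: 5000 signaling requests arrive within the same second.
admitted = sum(gate.admit(now_second=0) for _ in range(5000))
print(f"admitted {admitted} of 5000 requests, rejected {5000 - admitted}")
```

Rejecting early is unpleasant for the users who are turned away, but it keeps the node alive for everyone else. Without a gate, rejected or timed-out requests tend to be retried, and the retries add to the very congestion that caused them.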
Few people can fully understand the complex networking that operators run today. As time passes and staff move on, it becomes even less familiar.
Communication networks are something of a dark art to begin with, full of strange problems. Who would dare claim to have accounted for every possibility?
The third potential security risk, and the one Xiaozao Jun worries about most, is external cyber attacks, for example hackers, viruses, and system vulnerabilities.
Today communication equipment has largely become IP-based and cloud-based. Networks are more and more open, and some are deployed directly on public clouds. Physical isolation from the outside world is weakening, so networks are more vulnerable than before.
Today's attackers are also far more skilled than before and use more varied methods, so the threat to the network is great.
Of course, operators and equipment vendors invest heavily in preventing network attacks.
Vendors now all talk about the concept of "security hardening". As the name suggests, security hardening means closing system vulnerabilities to make the system more robust. Operators use third-party tools, or hire third-party vendors, to run security scans on live network equipment, look for vulnerabilities, and then require the equipment vendor to fix them.
All for safety
This contest, in which every advance in defense is met by a bigger advance in attack, will go on for a long time.
However, Xiaozao Jun thinks the defending side currently has big problems in both staff security awareness and technical ability. Going forward, we will run into more and more security incidents.
I hope the relevant organizations and departments will not just pay lip service to security, but will really invest in improving staff competence and strengthening training. Otherwise, once something really happens, it will be too late to remedy.
█ Final words
This KDDI outage in Japan is not the first, and it will certainly not be the last. Communication network failures are like a game of pass-the-parcel: nobody knows whether they will be next.
Vendors have now proposed introducing AI and letting it take over the network in order to reduce the failure rate. Some vendors, building on network cloudification, perform grayscale upgrades (that is, partial upgrades), which can also significantly reduce network risk. These are all good trends.
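To make the grayscale upgrade idea concrete, here is a minimal Python sketch: upgrade a small fraction of nodes, check their health, and only then widen the rollout, rolling everything back if a check fails. The function, the stage fractions, and the simulated faulty node are all hypothetical and are not taken from any vendor's tooling.

```python
from typing import Callable, List


def grayscale_upgrade(
    nodes: List[str],
    stages: List[float],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Upgrade nodes stage by stage; roll back all upgraded nodes on any failure."""
    upgraded: List[str] = []
    for fraction in stages:
        target = max(1, int(len(nodes) * fraction))
        for node in nodes[len(upgraded):target]:
            upgrade(node)
            upgraded.append(node)
        if not all(is_healthy(n) for n in upgraded):
            for n in reversed(upgraded):
                rollback(n)
            return False  # stop before touching the rest of the network
    return True


nodes = [f"node-{i}" for i in range(10)]
versions = {n: "v1" for n in nodes}
touched: List[str] = []


def do_upgrade(n: str) -> None:
    versions[n] = "v2"
    touched.append(n)


def do_rollback(n: str) -> None:
    versions[n] = "v1"


# Pretend node-3 misbehaves on the new version.
ok = grayscale_upgrade(nodes, [0.1, 0.5, 1.0], do_upgrade,
                       lambda n: n != "node-3", do_rollback)
print("upgrade succeeded:", ok)
print("nodes that were ever touched:", touched)
```

In this simulated run the problem shows up in the second stage, so half of the nodes are never touched. That limited blast radius is the whole point of upgrading in stages instead of all at once.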
I think we still have a long way to go in the fight against communication network failures. The road ahead is long, and telecom engineers will keep searching high and low.
Okay, that's all for today's article. Thank you for reading patiently, and see you next time!
Thank you!