
Suning's practice of large-scale alarm convergence and root cause localization based on knowledge graphs

2020-11-09 12:09:00 InfoQ

I. Overview

Knowledge graphs offer strong knowledge representation, an intuitive way to present information, and interpretable reasoning. They are therefore widely used in recommendation systems, question answering systems, search engines, healthcare, biopharmaceuticals, and other fields.

Compared with building knowledge graphs in other fields, building an operations and maintenance (O&M) knowledge graph has a natural advantage. The topology of network devices and the call relationships between applications can quickly be turned into the entities and relations of a software/hardware knowledge graph. Historical alarm data also contains many correlations and causal relationships, so causal discovery algorithms can be used to build an alarm knowledge graph effectively. By searching weighted paths on the knowledge graph, we can show the path along which a root cause propagates, which helps O&M staff make intervention decisions quickly.

Suning builds a software/hardware knowledge graph from CMDB, call-chain, and other data. On top of it, Suning builds an alarm knowledge graph from historical alarm data. Finally, it uses the knowledge graphs for alarm convergence and root cause localization. This article covers:
- the construction of the O&M knowledge graph;
- knowledge graph storage;
- alarm convergence;
- root cause localization.

II. Pain points and the evolution of product countermeasures

Pain points

1. Complexity of Suning's internal systems and services:
   - 6,000+ systems, and the number is still growing;
   - Complex inter-system calls: most use RSF, but HTTP, Hessian, and other protocols are also used;
   - Complexity of Suning's business: a mix of new and legacy offline businesses, with extensive interdependencies among these business systems.

2. Complexity of the underlying environment:
   - Multiple data centers, each divided into multiple logical server rooms and deployment environments;
   - More than 300,000 servers; the cache tier alone, for example, can run on thousands of servers;
   - Heterogeneous equipment: multi-vendor switches, routers, and load balancers, plus OpenStack, KVM, Docker under k8s, Docker under Swarm, and so on.
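The summary says that searching weighted paths on the knowledge graph reveals how a root cause propagates. It also says this is the basis for alarm convergence. The article does not spell out the algorithm here, so the following is only a minimal Python sketch built on several assumptions:
- An edge `src -> dst` means "a fault in `src` can propagate to `dst`" and carries a made-up likelihood in (0, 1].
- Hops are treated as independent, so likelihoods multiply along a path.
- The root cause is picked among alarmed entities by a hypothetical scoring rule: the sum of best-path probabilities to the other alarms.

All entity names, weights, and the functions `build_graph`, `best_paths`, and `locate_root_cause` are illustrative, not Suning's implementation. The path search is plain Dijkstra on costs `-log(w)`.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical propagation edges (src, dst, likelihood). In the article such
# relations would come from CMDB topology, call chains and alarm history;
# these names and weights are invented for illustration.
EDGES = [
    ("switch-01", "host-a", 0.9),
    ("switch-01", "host-b", 0.9),
    ("host-a", "cache-svc", 0.8),
    ("host-b", "order-svc", 0.7),
    ("cache-svc", "order-svc", 0.6),
    ("order-svc", "web-portal", 0.8),
]


def build_graph(edges):
    """Adjacency list with cost -log(w); w in (0, 1] keeps costs non-negative,
    which Dijkstra requires."""
    graph = defaultdict(list)
    for src, dst, w in edges:
        # Assuming independent hops, the minimum total cost is the
        # maximum-likelihood propagation path.
        graph[src].append((dst, -math.log(w)))
    return graph


def best_paths(graph, source):
    """Dijkstra from `source`; returns {node: (path_probability, path)}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    result = {}
    for node, d in dist.items():
        path = [node]
        while path[-1] != source:
            path.append(prev[path[-1]])
        result[node] = (math.exp(-d), path[::-1])
    return result


def locate_root_cause(graph, alarmed):
    """Pick the alarmed entity whose best propagation paths explain the other
    alarms most strongly (hypothetical score: sum of path probabilities)."""
    best_root, best_score, best_explained = None, -1.0, {}
    for candidate in sorted(alarmed):
        reach = best_paths(graph, candidate)
        explained = {t: reach[t] for t in alarmed if t != candidate and t in reach}
        score = sum(prob for prob, _ in explained.values())
        if score > best_score:
            best_root, best_score, best_explained = candidate, score, explained
    return best_root, best_explained


if __name__ == "__main__":
    graph = build_graph(EDGES)
    alarms = {"host-a", "cache-svc", "order-svc", "web-portal"}
    root, explained = locate_root_cause(graph, alarms)
    print("root cause:", root)  # root cause: host-a
    for target, (prob, path) in sorted(explained.items()):
        print(f"{target}: p={prob:.3f} via {' -> '.join(path)}")
```

In this toy run `switch-01` raises no alarm, so it is never a candidate. The three downstream alarms converge into one incident rooted at `host-a`, each with its propagation path. Restricting candidates to alarmed entities keeps the search small, but it would miss a silent upstream fault. A fuller version might also score unalarmed ancestors or use weights learned from historical alarms.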

Copyright notice
This article was created by InfoQ. Please include a link to the original when reprinting. Thank you.