Understanding the core SRE system
2022-07-05 06:45:00 【Dream of finding flowers~~】
Contents
- The origin of SRE
- What is SRE?
- What is not SRE?
- The five foundations of the SRE architecture
- What is an SLO?
- How many nines of uptime does the system need?
- SLO level distribution
- Converting SLI measurements to SLO percentiles
- Error budget logic
- Case study: a "chicken dinner" (battle royale) game
- Implementing SLO-oriented system monitoring
- Collecting load balancer metrics
- Metric calculation
- Calculating the initial SLO from four weeks of data
- Establishing SLO documents and a communication process
- Making decisions based on the SLO and the error budget
- SRE working principles
Reference: Bilibili video material
https://www.bilibili.com/video/BV1ak4y1975Z?spm_id_from=333.1007.top_right_bar_window_custom_collection.content.click
The origin of SRE
What is SRE?
- SRE stands for "Site Reliability Engineering"; it originated in 2003
- A reliability framework for operating large-scale systems
- It has software engineers design the operations functions
- It is responsible for running production systems at the operations level
- A universally applicable set of best practices for building and operating highly reliable systems
What is not SRE?
- The view that SRE principles sound good but do not transplant well: like an orange tree that changes with its soil, SRE can supposedly grow only in a specific culture and makes sense only at very large scale
- The view that SRE and DevOps conflict, so that one must ask which is better and which direction to choose
- The view that a traditional engineer, team, or department can simply be renamed an SRE engineer, team, or department
The five foundations of the SRE architecture
What is an SLO?
- The system's service level objective (SLO) defines what normal behavior of the system looks like
- It focuses on tracking the experience of customers (people or machines)
- If customers are satisfied, the SLO is being met
How many nines of uptime does the system need?
- Two nines: 99%
- Three nines: 99.9%
- Four nines: 99.99%
- Five nines: 99.999%
- Six nines: 99.9999%
- Seven nines: 99.99999%
SLA uptime online calculator:
https://www.xarg.org/tools/sla-uptime-calculator/
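To make the list of nines concrete, here is a minimal Python sketch that converts a number of nines into the downtime it allows per year. It is not the code behind the calculator linked above; the function names and the 365-day year are assumptions made for illustration.

```python
# Allowed downtime for a given number of nines (illustrative sketch only).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap, 365-day year


def availability_for_nines(nines: int) -> float:
    """Return the availability fraction for N nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def allowed_downtime_seconds(nines: int, period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the downtime, in seconds, that N nines permits over the period."""
    # Dividing by 10^nines directly avoids rounding error from subtracting near 1.0.
    return period_seconds / 10 ** nines


if __name__ == "__main__":
    for n in range(2, 8):
        decimals = n - 2  # 99% has no decimals, 99.9% has one, and so on
        pct = 100 - 100 / 10 ** n
        print(f"{n} nines ({pct:.{decimals}f}%): "
              f"{allowed_downtime_seconds(n):,.2f} seconds of downtime per year")
```

Each extra nine divides the permitted downtime by ten, which is why the cost of a target rises so quickly as nines are added.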
SLO level distribution
Converting SLI measurements to SLO percentiles
SLI measurements come from monitoring metrics whose units are inconsistent:
- Network traffic in MB/s, disk writes in write/s, HTTP response time in ms, home page load time in s, and so on
Measure the SLI values continuously and convert the collected values into values at different percentiles:
- Over the last 10 minutes, SLI home page load time: the P90 (90%) average is 259 ms
- Over the last 10 minutes, SLI home page load time: the P99 (99%) average is 589 ms
- Over the last 10 minutes, SLI disk writes: the P90 (90%) average is 45 write/s
- Over the last 10 minutes, SLI disk writes: the P99 (99%) average is 12 write/s
Reflection
With the SLI measurements distributed like this at P90 and P99, are customers satisfied?
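The conversion from raw SLI measurements to percentiles described above can be sketched in Python as follows. This is only one possible approach: it uses the nearest-rank percentile definition, and the function names and the sample latencies in the demo are made up for illustration, not taken from the video.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def latency_sli(samples_ms: list[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within threshold_ms."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    good = sum(1 for s in samples_ms if s <= threshold_ms)
    return good / len(samples_ms)


if __name__ == "__main__":
    # Hypothetical home page load times (ms) collected over a 10-minute window.
    window = [120, 135, 150, 180, 210, 240, 255, 259, 400, 589]
    print("P90:", percentile(window, 90), "ms")
    print("P99:", percentile(window, 99), "ms")
    print("Share of requests within 300 ms:", latency_sli(window, 300))
```

Monitoring systems often interpolate between samples or estimate percentiles from histogram buckets, so their values can differ slightly from the nearest-rank result.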
Error budget logic
Case study: a "chicken dinner" (battle royale) game
Implementing SLO-oriented system monitoring
Collecting load balancer metrics
CloudWatch can provide the data collection.
GitHub address: https://github.com/prometheus/cloudwatch_exporter
The SLI is expressed in the notation of the Prometheus monitoring tool; part of the sample code is as follows:
Metric calculation
Calculating the initial SLO from four weeks of data
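As a rough illustration of deriving a starting point from four weeks of load balancer data, the Python sketch below computes the measured availability and the number of failed requests a given SLO would tolerate. The `WeeklyCounts` type, the function names, and the request counts are hypothetical. Treating the measured availability as the baseline for the first SLO is one common starting point, not necessarily the method used in the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyCounts:
    """Request counts for one week (hypothetical structure)."""
    total_requests: int
    failed_requests: int  # for example, HTTP 5xx responses seen at the load balancer


def availability(weeks: list[WeeklyCounts]) -> float:
    """Measured availability: good requests divided by all requests."""
    total = sum(w.total_requests for w in weeks)
    failed = sum(w.failed_requests for w in weeks)
    if total == 0:
        raise ValueError("no requests observed in the given weeks")
    return (total - failed) / total


def error_budget_requests(slo: float, total_requests: int) -> int:
    """Number of failed requests the SLO tolerates for this traffic volume."""
    return round((1 - slo) * total_requests)


if __name__ == "__main__":
    # Made-up counts for four consecutive weeks.
    four_weeks = [
        WeeklyCounts(1_000_000, 800),
        WeeklyCounts(1_200_000, 1_500),
        WeeklyCounts(900_000, 400),
        WeeklyCounts(1_100_000, 1_100),
    ]
    measured = availability(four_weeks)
    print(f"Measured availability over four weeks: {measured:.4%}")
    print("Failures tolerated by a 99.9% SLO per 1,000,000 requests:",
          error_budget_requests(0.999, 1_000_000))
```

A target set slightly below the measured value leaves some error budget from day one, while a target above it means the budget starts out overspent.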
Establishing SLO documents and a communication process
- Create a formal "SLO document" for the mobile game application system
– Gain sign-off from all stakeholders: product managers, developers, operations staff
- Create an "error budget policy" document
– Consequence-oriented: with authorization from management, SRE has the right to stop the delivery of features and the right to hand operation of the system back to the development team
- Build SLO monitoring dashboards, reports, and error budget burn-down charts
- Keep optimizing the SLO targets and the monitoring approach
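The consequence-oriented error budget policy in the list above can be sketched as a simple check: compute how much of the budget remains and decide whether feature delivery continues. The function names and the single "freeze when the budget is spent" rule are assumptions for illustration; a real policy document would define its own thresholds and consequences.

```python
def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative when overspent."""
    allowed_failures = (1 - slo) * total_requests
    if allowed_failures <= 0:
        raise ValueError("this SLO and traffic volume leave no error budget")
    return 1 - failed_requests / allowed_failures


def release_decision(remaining: float) -> str:
    """Apply a hypothetical one-rule policy: stop features once the budget is spent."""
    if remaining <= 0:
        return "freeze feature releases"
    return "continue feature releases"


if __name__ == "__main__":
    # Made-up numbers: a 99% SLO over 1000 requests tolerates about 10 failures.
    for failed in (5, 20):
        remaining = budget_remaining(0.99, 1000, failed)
        print(f"{failed} failures -> {remaining:.0%} of budget left: "
              f"{release_decision(remaining)}")
```

Plotting `budget_remaining` over time gives the error budget burn-down chart mentioned in the list.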
Making decisions based on the SLO and the error budget
SRE working principles
SRE needs to design and implement consequence-oriented SLOs.
Any organization, even one that has not hired a single SRE, can design an error budget policy.
This means identifying and using any means that can keep customers from experiencing pain points.
You can start implementing now: measure, take responsibility, act.
SRE needs time to optimize and improve.
Once SRE staff are in place, make sure they know that their job is not to keep enduring operational toil, but to improve the operations work every day.
"Working smarter" may mean doing different things: it depends on which work items the SREs find most useful and valuable.
SRE needs to be able to regulate its own workload.
The SRE team needs to be able to prioritize its work.
Maintaining each new system carries a labor cost.
The team must be able to curb unreliable working practices and push back on unreliable systems.