Recommended reading: how to write a high-quality issue, the way a top student does
2022-06-26 23:56:00 【Shengsi mindspire】

Introduction
This post presents a high-quality issue from the Shengsi MindSpore community as a reference to learn from. The original text is on Gitee; more issues are listed at the following link:
https://gitee.com/mindspore/mindspore/issues
Overview
The UB fusion process is as follows:
Match the operators to be fused through a Pass, and set the fusion id as an attribute of each node.
Fuse the matched small operators into a FusionOp.
Initialize the fusion information, including determining the nodes contained in each fusion scope and its input and output nodes.
Check whether each fusion scope would form a cycle (by calling the CheckCircle function).
Compile the fusion operators; a fusion scope that fails to compile is not UB-fused.
For each fusion scope, first check whether it forms a cycle (by calling the CheckCircle function); if it does not, create the fusion operator and substitute it into the original graph.
Measurements show that the CheckCircle function accounts for 86% of the time spent in UB fusion, so CheckCircle needs to be optimized.
The graph contains no cycle before UB fusion, but a cycle may be formed after fusion, as shown in the figure below:

After nodes A, B, and C are fused into E, E and D form a cycle.
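The figure is not reproduced here, so the following is only a minimal sketch of how fusing nodes of an acyclic graph can create a cycle. The edge list is an assumption chosen to match the description (A feeds B and D, and both B and D feed C), and `fuse` and `has_cycle` are hypothetical helpers, not MindSpore functions.

```python
from collections import defaultdict

def fuse(edges, scope, fused_name):
    """Return the edge set after collapsing every node in `scope` into `fused_name`."""
    def rename(node):
        return fused_name if node in scope else node
    fused = set()
    for src, dst in edges:
        s, d = rename(src), rename(dst)
        if s != d:  # edges between two scope nodes become internal and disappear
            fused.add((s, d))
    return fused

def has_cycle(edges):
    """Detect a cycle with a three-colour depth-first search."""
    succ = defaultdict(list)
    nodes = set()
    for s, d in edges:
        succ[s].append(d)
        nodes.update((s, d))
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in succ[n]:
            if colour[m] == GREY:  # back edge: m is still on the current path
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in sorted(nodes))

# Assumed topology (the original figure is unavailable).
edges = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")}
fused_edges = fuse(edges, {"A", "B", "C"}, "E")

print(has_cycle(edges))        # graph before fusion
print(has_cycle(fused_edges))  # graph after fusing A, B, C into E
```

In this assumed graph, D sits between two members of the scope, so after fusion it is both a consumer and a producer of E, leaving the edges E->D and D->E.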

Optimization point
Optimization point 1
From the process above, the cycle check is performed twice:
The first check happens before the operators are compiled. Its purpose is to keep compilation from taking too long: the cycle check weeds out unnecessary operator compilations in advance, which improves performance. In the test data, CheckCircle() was called 1719 times at this stage and found a cycle only 51 times, a cycle ratio of just 3%, so this check can be deleted.
The second check happens before each FusionOp is substituted into the graph. This check cannot be deleted.
Optimization: delete the first cycle check.

Optimization point 2
The current cycle check starts from the input nodes of the fusion scope (C and D in the figure below) and traverses their predecessor nodes (in the figure, the predecessor of both C and D is B).
As shown in the figure below, the traversal first starts from input C and visits C->B->A, then starts from input D and visits D->B->A, so B and A are visited repeatedly.

So when checking a fusion scope, the visited nodes can be recorded to avoid repeated visits.
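A minimal sketch of this idea follows. It is one reading of the description, not MindSpore's CheckCircle code: the function `check_circle`, its parameters, and the scope node `F` are hypothetical. It walks the predecessors of each input node, reports a cycle if the walk reaches a node inside the fusion scope, and counts node visits so the effect of a shared visited set is visible.

```python
def check_circle(preds, scope, inputs, share_visited=True):
    """Return (forms_cycle, node_visits) for a fusion scope.

    preds  : dict mapping a node to the list of its predecessor nodes
    scope  : set of nodes that will be fused together
    inputs : nodes outside the scope that feed into it
    """
    visits = 0
    visited = set()
    for start in inputs:
        if not share_visited:
            visited = set()  # old behaviour: each input starts from scratch
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue  # already explored from an earlier input
            visited.add(node)
            visits += 1
            if node in scope:
                # An input depends on a scope node, so fusing would close a cycle.
                return True, visits
            stack.extend(preds.get(node, ()))
    return False, visits

# Graph from the text: C and D both have predecessor B, and B has predecessor A.
preds = {"C": ["B"], "D": ["B"], "B": ["A"]}
scope = {"F"}  # hypothetical fused node(s) fed by C and D

print(check_circle(preds, scope, ["C", "D"], share_visited=False))  # (False, 6)
print(check_circle(preds, scope, ["C", "D"], share_visited=True))   # (False, 4)
```

Without the shared set, B and A are each visited twice (6 visits in total); with it, every node is visited once (4 visits).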
Optimization point 3
The patterns currently handled by UB fusion are basically single-input, single-output structures:

This property allows some optimizations that avoid unnecessary checks.
If all inputs are passed to the first entry node in the fusion scope (node C in the two scenarios in the figure below), the fusion will not form a cycle and the cycle check can be skipped.

If the inputs are passed to different entry nodes (C and D in the left graph), or to a node that is not the first entry node (the nodes inside a fusion scope are topologically sorted, and the topological order shows that C in the right graph is not the first entry node), a cycle may be formed.

Considering the characteristics of UB fusion patterns and the implementation complexity, the following rule is used: if all inputs are passed to the first node in the fusion scope, skip the cycle check.
Because the nodes have been topologically sorted, the first node must be an entry node.
This condition is stricter than necessary: some other scenarios do not need the check either. Since those scenarios make up a very small share of UB fusion, they are not supported for now.
Measurements show that fusion scopes meeting this condition account for 1616/1719 = 94%.
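The rule above can be sketched as a cheap pre-check that runs before the full cycle check. This is an illustration under assumptions, not the MindSpore implementation: the function name, the `preds` dictionary, and the example node names are hypothetical.

```python
def inputs_only_feed_first_node(scope_topo, preds):
    """True when every edge entering the scope from outside lands on its first node.

    scope_topo : nodes of the fusion scope, already in topological order
    preds      : dict mapping a node to the list of its predecessor nodes
    """
    inside = set(scope_topo)
    first = scope_topo[0]
    for node in scope_topo:
        for pred in preds.get(node, ()):
            if pred not in inside and node != first:
                return False  # an external input reaches a later node
    return True

# Single-input chain X -> C -> D -> E, fusing C, D and E.
chain = {"C": ["X"], "D": ["C"], "E": ["D"]}
# Same chain, but an extra external node Y also feeds D.
side_input = {"C": ["X"], "D": ["C", "Y"], "E": ["D"]}

print(inputs_only_feed_first_node(["C", "D", "E"], chain))       # True: skip the cycle check
print(inputs_only_feed_first_node(["C", "D", "E"], side_input))  # False: run the cycle check
```

The shortcut is safe because a cycle after fusion needs a path that leaves the scope and re-enters it. If the only way in is the topologically first node, that path would make some scope node an ancestor of the first node, which the topological order rules out.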
Optimization point 4
If all outputs are produced by the last exit node in the fusion scope (node B in the left graph and node C in the right graph of the figure below), the fusion will not form a cycle and the cycle check can be skipped.

If the outputs are produced by different exit nodes (B and D in the left graph) or by an intermediate node (B in the right graph), a cycle may be formed.

Considering the characteristics of UB fusion patterns and the implementation complexity, the following rule is used: if all outputs are produced by the last exit node in the fusion scope, skip the cycle check.
Measurements show that fusion scopes meeting this condition account for 1617/1719 = 94%, essentially the same proportion as for optimization point 3.
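The output-side rule mirrors the input-side one and can be sketched the same way. Again this is an illustration under assumptions: the function name, the `succs` dictionary, and the node names are hypothetical, and the scope is assumed to be topologically sorted as the text states.

```python
def outputs_only_leave_last_node(scope_topo, succs):
    """True when every edge leaving the scope originates from its last node.

    scope_topo : nodes of the fusion scope, already in topological order
    succs      : dict mapping a node to the list of its successor nodes
    """
    inside = set(scope_topo)
    last = scope_topo[-1]
    for node in scope_topo:
        for succ in succs.get(node, ()):
            if succ not in inside and node != last:
                return False  # an earlier node also produces an external output
    return True

# Scope A -> B where only B produces an output (consumed by X).
single_exit = {"A": ["B"], "B": ["X"]}
# Scope A -> B where the intermediate node A also feeds an external node Y.
early_exit = {"A": ["B", "Y"], "B": ["X"]}

print(outputs_only_leave_last_node(["A", "B"], single_exit))  # True: skip the cycle check
print(outputs_only_leave_last_node(["A", "B"], early_exit))   # False: run the cycle check
```

The argument is symmetric to the input case: if the only way out of the scope is its topologically last node, a path that leaves and re-enters the scope would make the last node an ancestor of another scope node, which the topological order rules out.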
Verification results
Times in the table below are in seconds.



MindSpore official information
GitHub: https://github.com/mindspore-ai/mindspore
Gitee: https://gitee.com/mindspore/mindspore
Official QQ group: 486831414