Recommended! How do you write a high-quality issue? Here is how a top contributor does it!
2022-06-26 23:56:00 【Shengsi MindSpore】

Introduction
This post showcases a high-quality issue from the Shengsi MindSpore community for your reference. Click "Read the original" below to jump to the original on Gitee. More issues are available at:
https://gitee.com/mindspore/mindspore/issues
Summary
The UB fusion process is as follows:
1. Passes match the operators to be fused and store the fusion id as a node attribute.
2. The matched small operators are fused into a FusionOp.
3. Fusion information is initialized. This includes determining which nodes each fusion scope contains and which are its input and output nodes.
4. Each fusion scope is checked for whether it would form a cycle (by calling the CheckCircle function).
5. The fused operators are compiled. Fusion scopes that fail to compile are not UB-fused.
6. Each fusion scope is first checked again for a cycle (by calling CheckCircle). If no cycle would form, the fusion operator is created and replaces the original nodes in the graph.
Measurements show that CheckCircle accounts for 86% of the time spent in UB fusion, so CheckCircle needs to be optimized.
The graph contains no cycle before UB fusion, but fusion may create one, as shown in the figure below:

After nodes A, B and C are fused into E, E and D form a cycle.
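The figure is not reproduced here, so the following sketch assumes a hypothetical topology consistent with the description: edges A->B, B->C, A->D and D->C. It contracts a fusion scope into one node and then runs Kahn's topological sort to detect a cycle. This is a brute-force illustration of why fusion can create a cycle. It is not MindSpore's CheckCircle implementation, and `contract` and `has_cycle` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def contract(edges: List[Edge], scope: Set[str], fused: str) -> Set[Edge]:
    """Merge every node in `scope` into the single node `fused`."""

    def rename(node: str) -> str:
        return fused if node in scope else node

    # Edges inside the scope vanish; edges crossing its boundary are redirected.
    return {
        (rename(u), rename(v))
        for u, v in edges
        if not (u in scope and v in scope)
    }


def has_cycle(edges: Set[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be dequeued."""
    nodes = {n for edge in edges for n in edge}
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    succs: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    dequeued = 0
    while queue:
        node = queue.popleft()
        dequeued += 1
        for m in succs[node]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return dequeued < len(nodes)


# Hypothetical graph consistent with the text: A->B->C plus A->D->C.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
print(has_cycle(set(edges)))                             # False
print(has_cycle(contract(edges, {"A", "B", "C"}, "E")))  # True: E->D->E
```

Contracting and re-sorting the whole graph costs time proportional to the graph size for every scope. That cost is why a targeted check starting from the scope boundary is preferable in practice.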

Optimization points
Optimization point 1
As the flow above shows, the cycle check is performed twice:
The first check runs before the operators are compiled. Its purpose is to keep compilation time down: scopes that would form a cycle are eliminated up front so they are never compiled. However, test data show that CheckCircle() is called 1719 times at this stage and detects a cycle only 51 times, a ratio of just 3%. This check can therefore be removed.
The second check runs before each FusionOp replaces the original nodes. This check cannot be removed.
Optimization: remove the first cycle check.
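The effect of optimization point 1 on the number of checks can be seen in a minimal sketch of a fusion driver. All names here (`run_ub_fusion`, `compile_scope`, `check_circle`, `replace_scope`) are hypothetical stand-ins, and the real MindSpore code is structured differently. The `precheck` flag switches between the original flow, which checks before compilation and again before replacement, and the optimized flow, which checks only before replacement.

```python
from typing import Callable, List, Set

Scope = Set[str]


def run_ub_fusion(
    scopes: List[Scope],
    compile_scope: Callable[[Scope], bool],
    check_circle: Callable[[Scope], bool],
    replace_scope: Callable[[Scope], None],
    precheck: bool = False,
) -> List[Scope]:
    """Hypothetical UB-fusion driver; returns the scopes actually fused.

    precheck=True mimics the original flow (extra cycle check before compiling);
    precheck=False is the flow after optimization point 1.
    """
    if precheck:
        scopes = [s for s in scopes if not check_circle(s)]
    # Scopes that fail to compile are simply not fused.
    compiled = [s for s in scopes if compile_scope(s)]
    fused: List[Scope] = []
    for s in compiled:
        # Kept in both flows: the last guard before the graph is modified.
        if not check_circle(s):
            replace_scope(s)
            fused.append(s)
    return fused


calls: List[Scope] = []


def check(s: Scope) -> bool:
    calls.append(s)
    return s == {"X"}  # pretend only scope {"X"} would form a cycle


print(run_ub_fusion([{"A"}, {"X"}, {"B"}], lambda s: True, check, lambda s: None))
print(len(calls))  # 3 checks without the precheck
```

With `precheck=False`, each scope is checked once instead of up to twice. The cost is that the small fraction of scopes that would form a cycle (about 3% in the measurements above) are compiled and then discarded.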

Optimization point 2
The current cycle check starts from the input nodes of the fusion scope (C and D in the figure below) and traverses their predecessors (both C and D have predecessor B).
As the figure below shows, the traversal from input C visits C->B->A, and the traversal from input D then visits D->B->A, so B and A are visited twice.

Therefore, when checking a fusion scope, the visited nodes can be recorded so that no node is visited twice.
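The difference can be sketched in Python, assuming a hypothetical `preds` map from each node to its direct predecessors. The example graph follows the description: C and D are inside the scope, both fed by B, which is fed by A. Both functions walk backwards from the scope's outside predecessors and report a cycle as soon as the walk reaches a scope node again, because that means some path leaves the scope and comes back into it. The naive version restarts for every input. The memoized version shares one `visited` set per scope. Both also return the number of node visits so the saving is visible.

```python
from typing import Dict, List, Set, Tuple

Preds = Dict[str, List[str]]  # node -> direct predecessors (hypothetical format)


def check_circle_naive(preds: Preds, scope: Set[str]) -> Tuple[bool, int]:
    """Walk back from each scope input separately, with no shared bookkeeping."""
    visits = 0
    for entry in sorted(scope):
        stack = [p for p in preds.get(entry, []) if p not in scope]
        while stack:
            node = stack.pop()
            visits += 1
            for p in preds.get(node, []):
                if p in scope:
                    return True, visits  # a path leaves the scope and comes back
                stack.append(p)
    return False, visits


def check_circle_memo(preds: Preds, scope: Set[str]) -> Tuple[bool, int]:
    """Same walk, but every outside node is expanded at most once per scope."""
    visits = 0
    visited: Set[str] = set()
    stack = [p for n in sorted(scope) for p in preds.get(n, []) if p not in scope]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        visits += 1
        for p in preds.get(node, []):
            if p in scope:
                return True, visits
            stack.append(p)
    return False, visits


preds = {"E": ["C", "D"], "C": ["B"], "D": ["B"], "B": ["A"], "A": []}
print(check_circle_naive(preds, {"C", "D", "E"}))  # (False, 4)
print(check_circle_memo(preds, {"C", "D", "E"}))   # (False, 2)
```

On this graph, the naive walk visits B and A once per input (4 visits). The memoized walk visits each of them once (2 visits). In general, the naive walk can re-expand shared ancestors once for every input that reaches them.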
Optimization point 3
The current UB fusion patterns are almost all single-input, single-output structures:

This property allows some optimizations that avoid unnecessary checks.
If all inputs feed into the first entry node of the fusion scope (node C in both scenarios in the figure below), the fusion cannot form a cycle and the cycle check can be skipped.

If the inputs feed into different entry nodes (C and D on the left), or into a node that is not the first entry node, a cycle may form. The nodes inside a fusion scope are topologically sorted, and by that order, C on the right is not the first entry node.

Given the characteristics of the UB fusion patterns and the implementation complexity, the following rule is used: if all inputs feed into the first node of the fusion scope, the cycle check is skipped.
Because the scope has been topologically sorted, its first node must be an entry node.
This condition is stricter than necessary, since some other scenarios also need no check. However, those scenarios account for a very small share of UB fusion, so they are not supported for now.
Measurements show that fusion scopes meeting this condition account for 1616/1719 = 94%.
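One way to see why this rule is safe: fusion creates a cycle only if some path leaves the scope at a node s and re-enters it at a node t through an outside predecessor of t. If only the first node in topological order has outside predecessors, t must be that first node. Yet s would have to come before t in topological order, which is impossible. The test itself is cheap. The following sketch assumes a hypothetical `preds` map (node to direct predecessors), with the scope given as a list in topological order. `can_skip_by_inputs` is a hypothetical name, not a MindSpore function.

```python
from typing import Dict, List

Preds = Dict[str, List[str]]  # node -> direct predecessors (hypothetical format)


def can_skip_by_inputs(topo_scope: List[str], preds: Preds) -> bool:
    """True if only topo_scope[0] receives inputs from outside the scope.

    `topo_scope` must list the scope's nodes in topological order.
    """
    scope = set(topo_scope)
    return all(p in scope for node in topo_scope[1:] for p in preds.get(node, []))


# Single-input chain: the external input feeds only the first node C.
print(can_skip_by_inputs(["C", "X", "Y"], {"C": ["in"], "X": ["C"], "Y": ["X"]}))  # True
# Inputs reach two entry nodes C and D: the full check is still needed.
print(can_skip_by_inputs(["C", "D", "E"], {"C": ["B"], "D": ["B"], "E": ["C", "D"]}))  # False
```

When the function returns False, the sketch makes no claim that a cycle exists. It only means the full cycle check still has to run.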
Optimization point 4
If all outputs are produced by the last exit node of the fusion scope (node B on the left and node C on the right in the figure below), the fusion cannot form a cycle and the cycle check can be skipped.

If the outputs come from different exit nodes (B and D on the left) or from an intermediate node (B on the right), a cycle may form.

Given the characteristics of the UB fusion patterns and the implementation complexity, the following rule is used: if all outputs are produced by the last node of the fusion scope, the cycle check is skipped.
Measurements show that fusion scopes meeting this condition account for 1617/1719 = 94%, about the same share as for optimization point 3.
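The argument mirrors the input case. A cycle needs a path that leaves the scope at some node s, which therefore has an outside successor, and later re-enters at a node t. If only the last node in topological order has outside successors, s must be that last node. Yet t would have to come after s, which is impossible. The sketch below assumes a hypothetical `succs` map (node to direct successors), with the scope listed in topological order. `can_skip_by_outputs` is a hypothetical name.

```python
from typing import Dict, List

Succs = Dict[str, List[str]]  # node -> direct successors (hypothetical format)


def can_skip_by_outputs(topo_scope: List[str], succs: Succs) -> bool:
    """True if only topo_scope[-1] sends outputs outside the scope.

    `topo_scope` must list the scope's nodes in topological order.
    """
    scope = set(topo_scope)
    return all(s in scope for node in topo_scope[:-1] for s in succs.get(node, []))


# Only the last node C produces the scope's output.
print(can_skip_by_outputs(["A", "B", "C"], {"A": ["B"], "B": ["C"], "C": ["out"]}))  # True
# Intermediate node B also feeds an outside consumer: run the full check.
print(can_skip_by_outputs(["A", "B", "C"], {"A": ["B"], "B": ["C", "out"], "C": ["out2"]}))  # False
```

Each condition is sufficient on its own. A driver could therefore skip the cycle check when either the input condition of optimization point 3 or this output condition holds.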
Results
Times in the table below are in seconds.



MindSpore official resources
GitHub: https://github.com/mindspore-ai/mindspore
Gitee: https://gitee.com/mindspore/mindspore
Official QQ group: 486831414