[TA-Frost Wolf_may "Hundred People Plan"] Graphics 3.7: Mobile TB(D)R Architecture
2022-07-28 02:40:00 【zczplus】
- 3.7.1 Overview of current mobile GPUs
- 3.7.2 IMR
- 3.7.3 TBDR
- 3.7.4 TBR diagram
- 3.7.5 TBR advantages and disadvantages
- Defer implementations under different strategies
- 3.7.6 Mobile TBR optimization methods
- Homework
3.7.1 Overview of current mobile GPUs
Android market share by SoC (CPU) vendor

Overall, Qualcomm holds 49.2%, a large lead over other vendors. Huawei is second with 28.6%, followed by MediaTek.
Android GPU market share

Among GPUs, Qualcomm Adreno and Mali hold 49.2% and 48.2% of the market respectively, while PowerVR GPUs are rarely used on Android, with a share of only 2.6%.
Power consumption of different classes of device
- Desktop: 300 W
- Game console: 150-200 W
- High-end gaming laptop: 100 W
- Mainstream gaming laptop: 50-60 W
- Ultrabook: 15-25 W
- Flagship tablet: 8-15 W
- Flagship phone: 5-8 W; mainstream phone: 3-5 W
As a rough estimate, the power consumption gap between desktop and mobile is about 100x.
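The 100x figure can be checked against the list above. The following is a minimal sketch that divides the desktop figure by each end of the phone ranges. The helper name `power_ratio` is made up for illustration, and the wattages are simply the ones listed above.

```python
def power_ratio(desktop_watts: float, mobile_watts: float) -> float:
    """Return how many times more power the desktop draws."""
    return desktop_watts / mobile_watts


DESKTOP_WATTS = 300
# (low, high) power ranges in watts, taken from the list above.
PHONE_RANGES = {"flagship phone": (5, 8), "mainstream phone": (3, 5)}

for name, (low, high) in PHONE_RANGES.items():
    smallest_gap = power_ratio(DESKTOP_WATTS, high)
    largest_gap = power_ratio(DESKTOP_WATTS, low)
    print(f"{name}: {smallest_gap:.1f}x to {largest_gap:.1f}x")
```

The 100x estimate corresponds to the low end of the mainstream phone range (300 W against 3 W); against a flagship phone the gap is smaller.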
Desktop and mobile bandwidth comparison

Terminology
- SoC: system on chip, which integrates the CPU, GPU, and other components.
- System Memory: on an SoC, the GPU and CPU share the same LPDDR physical memory, typically several GB in size.
- On-chip Memory: in addition, the CPU and GPU each have their own high-speed SRAM, used as cache. Its size ranges from hundreds of KB to a few MB. Reading on-chip memory is much faster than reading system memory.
- In a TB(D)R architecture, on-chip memory stores the tile's color, depth, and stencil buffers, which makes reading, writing, and modifying them faster.
- Stall: when two computations on a GPU core depend on each other and must run serially, the time spent waiting is called a stall.
- FillRate: pixel fill rate = ROP clock frequency x number of ROPs x number of pixels each ROP can process per clock.
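The fill rate formula above can be written out directly. This is a minimal sketch; the function name and the example GPU figures (16 ROPs at 800 MHz, one pixel per ROP per clock) are hypothetical and do not describe any specific chip.

```python
def pixel_fill_rate(rop_clock_hz: float, rop_count: int,
                    pixels_per_rop_per_clock: int) -> float:
    """Theoretical pixel fill rate in pixels per second."""
    return rop_clock_hz * rop_count * pixels_per_rop_per_clock


# Hypothetical GPU: 16 ROPs running at 800 MHz, 1 pixel per ROP per clock.
rate = pixel_fill_rate(800e6, 16, 1)
print(f"{rate / 1e9:.1f} Gpixels/s")
```

This is a theoretical peak; real throughput also depends on bandwidth and stalls.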
3.7.2 IMR
IMR (Immediate Mode Rendering) is the rendering mode used by desktop GPUs.

It interacts directly with system memory.
3.7.3 TBDR
Tile-Based Deferred Rendering
In short, TB(D)R means the screen is rendered tile by tile.
TBR: VS - Defer - RS - PS
TBDR: VS - Defer - RS - Defer - PS
Defer: literally "delay". From the perspective of render data, defer means holding back and batching multiple pieces of data from one GPU frame, then processing them together.
TBDR
- The first phase performs all geometry-related processing, generates the Primitive List, and determines which primitives fall on each tile;
- The second phase performs rasterization and subsequent processing tile by tile, and on completion writes the Frame Buffer from the Tile Buffer back to System Memory.
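The two phases can be sketched in a few lines. This is an illustrative model only, not how any particular GPU implements binning: the 16-pixel tile size, the conservative bounding-box test, and the function names `bin_triangles` and `render_tiles` are all assumptions made for the example.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels (assumed; real tile sizes vary by GPU)


def bin_triangles(triangles, width, height, tile=TILE):
    """Phase 1: build a per-tile primitive list from screen-space triangles.

    Each triangle is three (x, y) pixel coordinates. A conservative
    bounding-box test is used, so a tile may list a triangle whose
    bounding box touches the tile even if the triangle itself does not.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    primitive_list = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Convert the bounding box to tile indices, clamped to the screen.
        x0 = max(int(min(xs)) // tile, 0)
        x1 = min(int(max(xs)) // tile, tiles_x - 1)
        y0 = max(int(min(ys)) // tile, 0)
        y1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                primitive_list[(tx, ty)].append(index)
    return primitive_list


def render_tiles(primitive_list):
    """Phase 2: visit tiles one at a time.

    A real GPU would rasterize and shade each tile in on-chip memory and
    write it back to system memory once; here we only yield the work list.
    """
    for tile_xy in sorted(primitive_list):
        yield tile_xy, primitive_list[tile_xy]


triangles = [
    [(2, 2), (10, 3), (5, 12)],       # fits inside a single tile
    [(10, 10), (40, 12), (20, 30)],   # bounding box spans several tiles
]
bins = bin_triangles(triangles, width=64, height=32)
for tile_xy, prims in render_tiles(bins):
    print(tile_xy, prims)
```

The second triangle appears in several tile lists, so it is processed once per tile it touches.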
Differences between TBR and IMR:
3.7.4 TBR diagram
Rendering process diagram:
Out-of-order execution on actual GPU hardware:
3.7.5 TBR advantages and disadvantages
Advantages:
- TBR creates opportunities to eliminate overdraw. HSR (PowerVR) and Forward Pixel Killing (Mali) both aim to minimize the texturing and shading of occluded pixels.
- TBR is cache friendly: reading and writing in the cache is much faster than in global memory, which reduces bandwidth consumption and saves power.
Disadvantages:
1. After the vertex stage, binning first writes the geometry data out to DDR, and the fragment shader then reads it back. Pipelines with a lot of geometry data easily hit a performance bottleneck here.
2. A triangle that overlaps several tiles has to be drawn several times, which means the total rendering time will be higher than in immediate mode rendering.
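The trade-off between the bandwidth advantage and the geometry disadvantage can be illustrated with a deliberately simplified traffic model. Everything here is an assumption for illustration: the byte sizes, the overdraw factor, the vertex counts, and the function names. The model ignores caches, framebuffer compression, texture traffic, and the duplication of triangles across tiles.

```python
def imr_framebuffer_bytes(pixels, overdraw, color_bytes=4, depth_bytes=4):
    """IMR: every fragment reads depth and writes depth + color in system memory."""
    per_fragment = depth_bytes + depth_bytes + color_bytes  # depth read, depth write, color write
    return pixels * overdraw * per_fragment


def tbr_framebuffer_bytes(pixels, vertex_count, color_bytes=4, vertex_bytes=32):
    """TBR: depth/color traffic stays on chip; the final color is written back once.

    Binned geometry is written to system memory and read back once.
    """
    resolve = pixels * color_bytes
    geometry = vertex_count * vertex_bytes * 2  # write after binning + read before shading
    return resolve + geometry


pixels = 1920 * 1080
imr = imr_framebuffer_bytes(pixels, overdraw=3)
tbr_light = tbr_framebuffer_bytes(pixels, vertex_count=500_000)
tbr_heavy = tbr_framebuffer_bytes(pixels, vertex_count=2_000_000)
print(f"IMR:                 {imr / 1e6:.1f} MB per frame")
print(f"TBR, light geometry: {tbr_light / 1e6:.1f} MB per frame")
print(f"TBR, heavy geometry: {tbr_heavy / 1e6:.1f} MB per frame")
```

Under these assumed numbers, TBR moves less data than IMR when geometry is light, but the binned geometry traffic dominates once the vertex count grows.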
Binning

Defer implementations under different strategies
1. Adreno's LRZ module
- Uses a separate module, LRZ
- The normal rendering pipeline runs one extra VS pass to generate a low-precision depth texture, which is used to cull invisible triangles
- In principle this is occlusion culling implemented in hardware, similar in function to occlusion culling in a software rasterizer
2. FPK, used by Arm Mali
3. HSR, used by PowerVR on iOS
HSR: Hidden Surface Removal
Sorting is done with an imaginary ray:
- Sort each object that the projected ray intersects
- Use tiling to reduce the size of the data set
- Only the nearest opaque object and the nearest transparent object need to be rendered
- Hidden primitives are culled
Example scene:
Opaque objects in the picture: characters, statues, walls, and so on;
Transparent objects: lasers, UI, and so on;
Without HSR culling (similar to the Overdraw view in Unity):
After HSR optimization:
Black areas represent opaque objects, so the objects behind them are completely invisible.
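The per-pixel sorting idea behind HSR can be sketched as follows. This is a simplified software model, not PowerVR's hardware algorithm. The tuple layout and the function name `visible_fragments` are made up for illustration, and the sketch assumes every transparent fragment in front of the nearest opaque one is kept for blending.

```python
def visible_fragments(fragments):
    """Return the fragments that still need shading for one pixel.

    `fragments` is a list of (depth, is_opaque, name) tuples, where a
    smaller depth is closer to the camera. Everything behind the nearest
    opaque fragment is culled before shading.
    """
    ordered = sorted(fragments, key=lambda f: f[0])  # front to back along the ray
    kept = []
    for depth, is_opaque, name in ordered:
        kept.append((depth, is_opaque, name))
        if is_opaque:
            break  # nothing behind an opaque fragment can be seen
    return kept


# One pixel of the example scene: opaque wall, statue, and character,
# plus a transparent laser and UI element in front.
pixel = [
    (0.9, True, "wall"),
    (0.5, True, "statue"),
    (0.2, False, "laser"),
    (0.7, True, "character"),
    (0.1, False, "UI"),
]
for depth, is_opaque, name in visible_fragments(pixel):
    print(name, "opaque" if is_opaque else "transparent")
```

For this pixel the character and the wall are never shaded, which is the overdraw saving shown in the comparison above.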
3.7.6 Mobile TBR optimization methods
- Remember to clear or discard a framebuffer when its contents are no longer needed
- Do not switch framebuffer bindings frequently within one frame
- On mobile platforms, Alpha Blend is recommended over Alpha Test (decide based on the actual requirements). In production, when Alpha Blend is used for transparency, try to reduce the coverage of the blended area (shrink the translucent region)
- If Alpha Test is unavoidable on mobile, do a depth prepass first
- Compress textures wherever possible
- Use mipmaps wherever possible
- Sample textures with UV values passed from the vertex shader as varying variables (contiguous), rather than computing UV values dynamically in the fragment shader (non-contiguous), which causes cache misses
- In deferred rendering, make full use of the Tile Buffer
- If adjusting texture quality or overall resolution changes the frame rate, the bottleneck is most likely bandwidth
- MSAA is very fast on TBDR, because it is done in hardware, on chip
- Use discard sparingly in the fragment shader, and avoid writing gl_FragDepth, because both break Early-DT (the function is clip in HLSL and discard in GLSL)
- Choose floating-point precision in shaders deliberately (float or half)
- On mobile TBDR architectures, vertex processing easily becomes a bottleneck. Avoid tessellation shaders, displacement mapping, and other operations that work against optimization
- Use model LODs, which essentially reduces the pressure on FrameData. In Unity, run Umbra occlusion culling as early as possible, in the application stage
Homework
To learn Unity's UPR (Unity Performance Reporting) test service, run the test on an emulator or on a cloud real device.
The cloud real device is more convenient:
Frame rates at different tessellation levels are compared:
First, the very dense case (Tessellation uniform = 64):
The frame rate is very low, averaging 12.78 fps;
Reducing the tessellation level to 16
directly doubles the average frame rate:
As you can see, at a tessellation level of 16 the grass is already dense. Perhaps because the area is small, the frame rate does not decline linearly as described in other students' homework.
When trying to debug performance on an emulator, I found that this tessellation shader does not work there and reports an error.
I was using the Nox emulator that Unity recommends. Changing the graphics rendering mode did not eliminate the bug. This is strange, so I am noting it here.