[TA - Frost Wolf_may - Hundred People Plan] Graphics 3.7: Mobile TB(D)R architecture
2022-07-28 02:40:00 【zczplus】
- 3.7.1 Overview of current mobile GPUs
- 3.7.2 IMR
- 3.7.3 TBDR
- 3.7.4 TBR diagrams
- 3.7.5 TBR advantages and disadvantages
- Defer implementations under different strategies
- 3.7.6 Mobile TBR optimization methods
- References
- Homework
3.7.1 Overview of current mobile GPUs
Android SoC (CPU) market share

Overall, Qualcomm holds 49.2%, a large lead over the other vendors. Huawei is second at 28.6%, followed by MediaTek.
Android GPU market share

Among GPUs, Qualcomm's Adreno and Arm's Mali hold 49.2% and 48.2% respectively. PowerVR GPUs are rarely used on Android, with a share of only 2.6%.
Power consumption of various devices
- Desktop: 300 W
- Game console: 150-200 W
- Gaming laptop: 100 W
- Mainstream gaming laptop: 50-60 W
- Ultrabook: 15-25 W
- Flagship tablet: 8-15 W
- Flagship phone: 5-8 W; mainstream phone: 3-5 W
As a rough estimate, a desktop draws up to about 100 times the power of a mobile phone.
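The "100 times" figure can be checked directly from the list. The sketch below only restates the quoted wattages; the dictionary keys and the function name `power_ratio` are illustrative, not from the course material.

```python
# Power figures quoted in the list above, in watts, as (low, high) ranges.
POWER_W = {
    "desktop": (300, 300),
    "flagship_phone": (5, 8),
    "mainstream_phone": (3, 5),
}


def power_ratio(big, small):
    """Return (min_ratio, max_ratio) of one device's power draw over another's."""
    big_lo, big_hi = POWER_W[big]
    small_lo, small_hi = POWER_W[small]
    return big_lo / small_hi, big_hi / small_lo


lo, hi = power_ratio("desktop", "mainstream_phone")
print(f"desktop vs mainstream phone: {lo:.0f}x to {hi:.0f}x")
```

Against a mainstream phone the ratio lands between 60x and 100x, so "100 times" is the upper end of the range.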
Desktop and mobile bandwidth comparison

Terminology
- SoC: system on chip, which integrates the CPU, GPU, and other components.
- System memory: the GPU and CPU in an SoC share one pool of LPDDR physical memory, typically several GB in size.
- On-chip memory: in addition, the CPU and GPU each have their own high-speed SRAM, called cache. Its size ranges from a few hundred KB to a few MB, and reading it is much faster than reading system memory.
- In a TB(D)R architecture, on-chip memory holds each tile's color, depth, and stencil buffers, so reads, writes, and modifications are faster.
- Stall: when two computations on a GPU core depend on each other and must run serially, the time spent waiting is called a stall.
- FillRate: pixel fill rate = ROP clock frequency x number of ROPs x pixels each ROP can process per clock.
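As a worked instance of the fill rate formula, here is a minimal sketch. The GPU numbers (800 MHz, 8 ROPs, 1 pixel per ROP per clock) are hypothetical values chosen for illustration, not the specification of any real part.

```python
def pixel_fill_rate(rop_clock_hz, rop_count, pixels_per_rop_per_clock):
    """Pixel fill rate in pixels per second, as the product of the three factors."""
    return rop_clock_hz * rop_count * pixels_per_rop_per_clock


# Hypothetical GPU: 800 MHz ROP clock, 8 ROPs, 1 pixel per ROP per clock.
rate = pixel_fill_rate(800_000_000, 8, 1)
print(f"{rate / 1e9:.1f} Gpixel/s")
```

Doubling any one of the three factors doubles the theoretical fill rate, which is why the figure is only an upper bound: bandwidth or shading cost usually limits real throughput first.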
3.7.2 IMR
IMR (Immediate Mode Rendering) is the rendering mode used on desktop GPUs.

The GPU reads and writes system memory directly.
3.7.3 TBDR
Tile-Based Deferred Rendering
In short, TB(D)R means the screen is rendered tile by tile.
TBR: VS - Defer - RS - PS
TBDR: VS - Defer - RS - Defer - PS
Defer: literally "delay". From the point of view of the rendering data, defer means holding back ("blocking + batching") the data of one GPU frame and then processing it all together.
TBDR works in two stages:
- The first stage performs all geometry-related processing, generates the primitive list, and records which primitives fall on each tile.
- The second stage performs rasterization and the subsequent processing tile by tile and, on completion, writes the frame buffer from the tile buffer back to system memory.
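The two stages can be sketched as a toy software renderer. This is a simplified illustration under stated assumptions, not how any real GPU is implemented: tiles are 4x4 pixels, each triangle has a single flat depth `z`, binning uses bounding boxes, and the names `bin_primitives`, `render`, and `covers` are made up for this example.

```python
TILE = 4  # tile edge in pixels; an illustrative value, real hardware uses larger tiles


def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri, px, py):
    """True if the center of pixel (px, py) lies inside the triangle."""
    a, b, c = tri["verts"]
    p = (px + 0.5, py + 0.5)
    w = (edge(a, b, p), edge(b, c, p), edge(c, a, p))
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def bin_primitives(tris, width, height):
    """Stage 1: build the per-tile primitive list from bounding boxes."""
    bins = {}
    for idx, tri in enumerate(tris):
        xs = [v[0] for v in tri["verts"]]
        ys = [v[1] for v in tri["verts"]]
        tx0 = max(int(min(xs)) // TILE, 0)
        tx1 = min(int(max(xs)) // TILE, (width - 1) // TILE)
        ty0 = max(int(min(ys)) // TILE, 0)
        ty1 = min(int(max(ys)) // TILE, (height - 1) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins


def render(tris, width, height):
    """Stage 2: shade tile by tile in a small tile buffer, then write back."""
    framebuffer = [[None] * width for _ in range(height)]  # stands in for system memory
    bins = bin_primitives(tris, width, height)
    for (tx, ty), prim_ids in bins.items():
        color = [[None] * TILE for _ in range(TILE)]  # tile buffer ("on-chip")
        depth = [[float("inf")] * TILE for _ in range(TILE)]
        for idx in prim_ids:
            tri = tris[idx]
            for ly in range(TILE):
                for lx in range(TILE):
                    px, py = tx * TILE + lx, ty * TILE + ly
                    if covers(tri, px, py) and tri["z"] < depth[ly][lx]:
                        depth[ly][lx], color[ly][lx] = tri["z"], tri["color"]
        for ly in range(TILE):  # one write-back per tile, color only
            for lx in range(TILE):
                px, py = tx * TILE + lx, ty * TILE + ly
                if px < width and py < height:
                    framebuffer[py][px] = color[ly][lx]
    return framebuffer, bins


tris = [
    {"verts": [(0, 0), (8, 0), (0, 8)], "z": 0.5, "color": "red"},
    {"verts": [(0, 0), (3, 0), (0, 3)], "z": 0.2, "color": "blue"},
]
fb, bins = render(tris, 8, 8)
print(bins[(0, 0)], fb[0][0], fb[1][5])
```

Note that the depth buffer never leaves the tile buffer, and the large triangle appears in the primitive list of four tiles, which is the duplication cost that binning pays.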
Differences between TBR and IMR:
3.7.4 TBR diagrams
Rendering process diagram:
Out-of-order execution on actual GPU hardware:
3.7.5 TBR advantages and disadvantages
Advantages:
- TBR creates opportunities to eliminate overdraw. Both HSR (PowerVR) and Forward Pixel Killing (Mali) aim to minimize the texturing and shading of occluded pixels.
- TBR is cache-friendly. Reads and writes in the on-chip cache are much faster than in global memory, which reduces bandwidth consumption and saves power.
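To give a feel for the bandwidth argument, the sketch below compares frame buffer traffic under a deliberately crude model. The assumptions are mine, not the course's: in the IMR model every shaded fragment reads depth, writes depth, and writes color in system memory with no caching; in the TBR model depth and color stay in the tile buffer and only the final color is written back once. Geometry traffic is ignored in both.

```python
def imr_framebuffer_traffic(width, height, overdraw, bytes_color=4, bytes_depth=4):
    """Crude IMR model: per fragment, one depth read, one depth write, one color write."""
    fragments = width * height * overdraw
    return fragments * (2 * bytes_depth + bytes_color)


def tbr_framebuffer_traffic(width, height, bytes_color=4):
    """Crude TBR model: only the resolved color of each pixel reaches system memory."""
    return width * height * bytes_color


imr = imr_framebuffer_traffic(1920, 1080, overdraw=3)
tbr = tbr_framebuffer_traffic(1920, 1080)
print(f"IMR: {imr / 1e6:.1f} MB/frame, TBR: {tbr / 1e6:.1f} MB/frame")
```

Real desktop GPUs use caches and compression to narrow this gap, and TBR pays extra for writing and re-reading the binned geometry, so the numbers show only the direction of the effect.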
Disadvantages:
1. Binning happens after the vertex stage: the geometry data is first written out to DDR and then read back for the fragment shader. A pipeline with a large amount of geometry easily hits a performance bottleneck here.
2. A triangle that overlaps several tiles has to be drawn several times, which means the total rendering time can be higher than with immediate mode rendering.
Binning

Defer implementations under different strategies
1. Adreno's LRZ module
- Uses an external LRZ (Low Resolution Z) module.
- An extra VS-only step runs ahead of the normal rendering pipeline and generates a low-precision depth texture, which is used to cull invisible triangles.
- In principle this is occlusion culling implemented in hardware, with a function similar to occlusion culling in a software rasterizer.
2. Arm Mali's FPK (Forward Pixel Killing)
3. HSR on PowerVR (iOS)
HSR: Hidden Surface Removal
An imaginary ray is cast through each pixel and used for sorting:
- Sort the objects that the cast ray intersects
- Use tiling to keep the data set small
- Only the nearest opaque object, plus any transparent objects in front of it, needs to be rendered
- Occluded primitives are culled
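The per-pixel idea behind HSR can be sketched as follows. This is an illustration of the principle only, not PowerVR's hardware algorithm. It assumes that smaller depth means nearer to the camera and that every transparent fragment in front of the nearest opaque one must still be shaded; the function name and the fragment fields are hypothetical, and the object names are borrowed from the example scene that follows.

```python
def hsr_visible_fragments(fragments):
    """Return the names of the fragments that still need shading at one pixel.

    fragments: list of dicts with 'name', 'depth' (smaller = nearer), 'opaque'.
    """
    opaque = [f for f in fragments if f["opaque"]]
    nearest_opaque = min(opaque, key=lambda f: f["depth"], default=None)
    limit = nearest_opaque["depth"] if nearest_opaque else float("inf")
    # Transparent fragments survive only if they are in front of the nearest
    # opaque fragment; they are kept back to front so they can be blended.
    translucent = sorted(
        (f for f in fragments if not f["opaque"] and f["depth"] < limit),
        key=lambda f: f["depth"],
        reverse=True,
    )
    kept = ([nearest_opaque] if nearest_opaque else []) + translucent
    return [f["name"] for f in kept]


pixel = [
    {"name": "wall", "depth": 10.0, "opaque": True},
    {"name": "statue", "depth": 6.0, "opaque": True},
    {"name": "laser", "depth": 5.0, "opaque": False},
    {"name": "character", "depth": 4.0, "opaque": True},
    {"name": "ui", "depth": 1.0, "opaque": False},
]
print(hsr_visible_fragments(pixel))
```

Of the five fragments covering this pixel, only two are shaded: the wall, the statue, and the laser all sit behind the character and are dropped before any texturing or shading work is done.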
Example scene:
Opaque objects in the frame: characters, statues, walls, and so on.
Transparent objects: lasers, UI, and so on.
Without HSR culling (similar to Unity's Overdraw view):
After HSR optimization:
Black areas are opaque objects, so whatever lies behind them is completely invisible.
3.7.6 Mobile TBR optimization methods
- Remember to clear or discard a framebuffer when its contents are no longer needed
- Do not switch framebuffer bindings frequently within a frame
- On mobile platforms, prefer alpha blending over alpha testing (decide based on the actual requirement). When transparency is implemented with alpha blending in production, keep the blended coverage as small as possible (reduce the area of the translucent regions)
- When alpha testing is unavoidable on mobile, do a depth prepass first
- Compress textures where possible
- Use mipmaps where possible
- Sample textures with the (continuous) UV values passed from the vertex shader as varying variables rather than UVs computed dynamically in the fragment shader (discontinuous), which cause cache misses
- In deferred rendering, make full use of the tile buffer
- If changing texture quality or the overall resolution changes the frame rate, the bottleneck is most likely bandwidth
- MSAA is very fast on TBDR: it is done in hardware, on chip
- Use discard sparingly in the fragment shader and avoid writing gl_FragDepth, since both interrupt the early depth test (Early-DT); the operation is called clip in HLSL and discard in GLSL
- Pay attention to floating-point precision in shaders and choose between float and half deliberately
- On mobile TBDR architectures the vertex processing stage easily becomes the bottleneck, so avoid costly operations such as tessellation shaders and displacement mapping
- Use model LOD, which in essence reduces the pressure on the frame data; in Unity, run Umbra occlusion culling as early as possible, in the application stage
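The first tip in the list can be put into numbers with a rough estimate. The model below is an assumption made for illustration: a render pass that is not cleared at the start forces the previous color and depth to be loaded into the tile buffer, and a pass that does not discard depth at the end forces depth to be stored along with color. The function name, the 4-byte formats, and the 1920x1080 at 60 fps target are all hypothetical.

```python
def frame_traffic_bytes(width, height, clear_at_start, discard_depth_at_end,
                        bytes_color=4, bytes_depth=4):
    """Estimated system-memory traffic of one render pass, in bytes."""
    pixels = width * height
    # Without a clear, the old contents must be loaded into the tile buffer.
    load = 0 if clear_at_start else pixels * (bytes_color + bytes_depth)
    # Color is always stored; depth is stored only if it was not discarded.
    store = pixels * bytes_color
    if not discard_depth_at_end:
        store += pixels * bytes_depth
    return load + store


naive = frame_traffic_bytes(1920, 1080, clear_at_start=False, discard_depth_at_end=False)
tuned = frame_traffic_bytes(1920, 1080, clear_at_start=True, discard_depth_at_end=True)
print(f"naive: {naive * 60 / 1e9:.2f} GB/s, tuned: {tuned * 60 / 1e9:.2f} GB/s at 60 fps")
```

Under these assumptions the clear-and-discard pass moves a quarter of the data, and the cost repeats for every framebuffer switch, which is also why frequent rebinding within a frame is discouraged.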
References
- [GPU performance metrics] https://www.gpuinsight.com/gpu_performance/
- [Samsung GPU framebuffer guide] https://developer.samsung.com/galaxy-gamedev/resources/articles/gpu-framebuffer.html
- [NVIDIA TBR tutorial article] https://www.techpowerup.com/231129/on-nvidias-tile-based-rendering
- [ARM's TBR tutorial article] https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/tile-based-rendering/single-page
- [Apple OpenGL ES Programming Guide] https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/Performance/Performance.html
- [OpenGL Insights] https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-TileBasedArchitectures.pdf
- [Zhihu: advantages and disadvantages of tile-based versus full-screen rasterization] https://www.zhihu.com/question/49141824
- [Summary of mobile GPU architecture knowledge] https://zhuanlan.zhihu.com/p/112120206
- [Revisiting the efficiency of AlphaTest on mobile platforms] https://zhuanlan.zhihu.com/p/33127345
- [Learning and understanding mobile GPU hardware] https://zhuanlan.zhihu.com/p/347001411
- [PowerVR developer guide] http://cdn.imgtec.com/sdk-documentation/Introduction_to_PowerVR_for_Developers.pdf
- [Performance Tuning for Tile-Based Architecture] https://www.cnblogs.com/gameknife/p/3515714.html
- [Details of the HSR process in TBDR and how much AlphaBlend improves efficiency] https://www.zhihu.com/question/49141824
- [When we talk about optimization, what are we talking about] https://zhuanlan.zhihu.com/p/68158277 and https://edu.uwa4d.com/course-intro/1/179
- [A two-pass optimization idea for Alpha Test] https://zhuanlan.zhihu.com/p/58017068
- [Personal collection] https://github.com/killop/anything_about_game#gpu-architecture
- [Adreno Hardware Tutorial 3: Tile Based Rendering] https://www.youtube.com/watch?v=SeySx0TkluE&pbjreload=101
- [Unique characteristics of Mali GPUs] https://www.cnblogs.com/hamwj1991/p/12404551.html
- [Mali-T880] http://grmanet.sogang.ac.kr/ihm/cs170/20/HC27.25.531-Mali-T880-Bratt-ARM-2015_08_23.pdf
- [Xiong Da's optimization suggestions] http://www.xionggf.com/post/unity3d/shader/u3d_shader_optimization/
- [In what order does a GPU draw pixels] https://zhuanlan.zhihu.com/p/22232448
- [Tile-based Rasterization in Nvidia GPUs with David Kanter of Real World Tech] https://www.youtube.com/watch?v=Nc6R1hwXhL8&t=973s&pbjreload=101
Homework
Learn to use Unity's UPR (Unity Performance Reporter) testing service, testing on a virtual machine or a cloud real device.
A cloud real device is more convenient:
Frame rate compared across different tessellation levels:
First, the very dense case (Tessellation uniform = 64):
The frame rate is very low, averaging 12.78 fps.
Reducing the tessellation level to 16,
the average frame rate doubles:
At a tessellation level of 16 the grass is in fact already dense. Possibly because the area is small, the frame rate did not show the linear decline reported in other students' homework.
When I tried to use an emulator for performance debugging, I found that this tessellation shader does not run and reports an error.
Even with the Nox emulator recommended by Unity, changing the graphics rendering mode did not eliminate the bug. This is strange, so I am noting it here.