Game optimization performance (11) - Zhihu
2020-11-08 08:54:00 【osc_eoqljui5】
After the vertex shader (VS) comes the rasterization stage. It is a fixed-function (non-programmable) stage that is usually assumed to execute very efficiently, so it is often overlooked.
In my experience, however, it is not uncommon for this stage to become the bottleneck. That is what happened during the development of Genshin Impact, for example.
Here is what happened. When a character climbs a tree in the game, the canopy is made to look semi-transparent so that it does not hide the character. Ordinary alpha-blended transparency is a well-known performance killer, so the developers instead used the stencil buffer to cut out some of the pixels, a technique known as dithering. If you are not familiar with it, think of a photograph printed in a newspaper, which is made up entirely of dots.
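A minimal sketch of what such a cut-out mask could look like, assuming a plain checkerboard pattern. The article does not say which pattern the game used, and `dither_mask` is a hypothetical helper, not an engine API.

```python
def dither_mask(width, height):
    """Return rows of 0/1 values: 1 = pixel kept, 0 = pixel cut out.

    Assumption: a simple checkerboard; real dither patterns may differ.
    """
    return [[(x + y) % 2 for x in range(width)] for y in range(height)]


mask = dither_mask(8, 8)
kept = sum(sum(row) for row in mask)
print(f"{kept} of {8 * 8} pixels kept")
for row in mask:
    # '#' marks a kept pixel, '.' marks a hole
    print("".join("#" if v else "." for v in row))
```

Half of the pixels survive, which is why the canopy reads as roughly 50% transparent from a distance without any blending.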
In theory, cutting out pixels reduces the number of pixels that have to be rendered, that is, the pixel shader (PS) workload. But the development team found the opposite: rendering time went up rather than down. Even more puzzling, comparing GPU trace files captured with the effect on and off showed that the PS workload had indeed dropped, yet the rendering time was unchanged or even slightly longer.
The cause lies in the rasterizer. The triangles output by the VS are rasterized by the rasterizer module, and the result forms the PS workload. Before rasterization, several triangle-level steps run: back-face/front-face culling, frustum culling and clipping, and zero-area/small-triangle culling. Rejection by the stencil test, however, does not happen at the triangle level. It happens at the fragment level, after rasterization. In other words, dithering reduces the number of fragments that reach the PS stage, but it does not reduce the work done by the rasterizer.
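The ordering described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: the helpers `signed_area`, `edge`, `rasterize`, and `draw` are hypothetical, culling is reduced to a signed-area test, and the stencil test is modelled as a per-pixel predicate.

```python
def signed_area(tri):
    """Signed area of a 2D triangle; positive means counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def edge(a, b, p):
    """Edge function: >= 0 when p is on the inner side of edge a->b (CCW)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri):
    """Yield the integer pixels whose centers fall inside a CCW triangle."""
    a, b, c = tri
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            center = (x + 0.5, y + 0.5)
            if (edge(a, b, center) >= 0 and edge(b, c, center) >= 0
                    and edge(c, a, center) >= 0):
                yield x, y


def draw(triangles, stencil_pass):
    """Return (fragments rasterized, fragments shaded)."""
    rasterized = shaded = 0
    for tri in triangles:
        # Triangle-level culling: back-facing or zero-area triangles
        # are dropped before any rasterization work is done.
        if signed_area(tri) <= 0:
            continue
        for x, y in rasterize(tri):
            rasterized += 1          # rasterizer work is already spent here
            if stencil_pass(x, y):   # fragment-level stencil test
                shaded += 1          # only survivors cost PS work
    return rasterized, shaded


front = [(0, 0), (8, 0), (0, 8)]   # counter-clockwise: kept
back = [(0, 0), (0, 8), (8, 0)]    # clockwise: culled
no_dither = draw([front, back], lambda x, y: True)
with_dither = draw([front, back], lambda x, y: (x + y) % 2 == 1)
print("no dither  :", no_dither)
print("with dither:", with_dither)
```

In this model the culled triangle costs nothing, while the dithered draw shades fewer fragments but rasterizes exactly as many as the undithered one.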
If that were the whole story, though, enabling dithering should still be faster: the rasterization work is the same while the PS workload is smaller. Yet the measurements show it is slower. Why?
The reason is that contemporary desktop GPUs use tile-based rasterization. Note that this is not the tile-based (deferred) rendering, TB(D)R, found on mobile platforms, because it is confined to the rasterization stage.
Specifically, the GPU does not expand a triangle into fragments in a single step. It first rasterizes at a lower resolution, for example 1/8 of the target resolution. If the render target is 1920x1080, the GPU first rasterizes at 240x135, and then rasterizes each coarse result (an 8x8-pixel block) more finely. (The exact method and tile size may differ significantly between GPU models.)
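The coarse-grid size follows directly from the tile size. A small sketch of the arithmetic, assuming square 8x8 tiles and rounding up for partial tiles at the edges (`coarse_grid` is a hypothetical helper; actual hardware tile sizes vary):

```python
import math


def coarse_grid(width, height, tile=8):
    """Number of tiles needed to cover a width x height render target."""
    return math.ceil(width / tile), math.ceil(height / tile)


print(coarse_grid(1920, 1080))  # (240, 135)
```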
One advantage of this approach is that it greatly improves the efficiency of the pre-Z and pre-stencil tests. If a low-resolution unit (a tile) is rejected as a whole by the pre-Z or pre-stencil test, there is no need to rasterize it any more finely.
In our case, the stencil mask used for the cut-out has a hole pattern that is not aligned with these tiles. So when the pre-stencil test is performed per tile, a tile can never be rejected: the stencil values inside the tile differ, so part of it passes and part of it fails. Compared with having dithering disabled, the GPU performs an extra stencil test for nothing. The rasterization workload is not reduced at all, and rasterization now includes an additional step that queries the stencil buffer. The rasterizer therefore becomes less efficient.
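The effect can be reproduced in a toy model. The sketch below assumes 8x8 tiles, a checkerboard dither, and a rule that a tile is rejected only when every pixel in it fails the stencil test. All names (`count_rejected_tiles`, `dither`, `tile_aligned`) are hypothetical, and real hardware behaviour is more involved.

```python
def count_rejected_tiles(passes, width, height, tile=8):
    """Return (tiles rejected as a whole, total tiles).

    A tile is rejected only if no pixel inside it passes the stencil test.
    """
    rejected = total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            total += 1
            any_pass = any(
                passes(x, y)
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            )
            if not any_pass:
                rejected += 1
    return rejected, total


def dither(x, y):
    """Per-pixel checkerboard: every tile contains both values."""
    return (x + y) % 2 == 1


def tile_aligned(x, y):
    """Checkerboard of whole 8x8 blocks: each tile is uniform."""
    return ((x // 8) + (y // 8)) % 2 == 1


print("dither      :", count_rejected_tiles(dither, 64, 64))
print("tile-aligned:", count_rejected_tiles(tile_aligned, 64, 64))
```

Both masks let half of the pixels through, but only the tile-aligned one allows whole tiles to be skipped. With the per-pixel dither no tile is ever rejected, so the coarse pre-stencil test is pure overhead.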
Copyright notice
This article was written by [osc_eoqljui5]. Please include a link to the original when reposting. Thank you.