Intra prediction and transform kernel selection based on Neural Network
2022-07-28 13:27:00 【Dillon2015】
This article summarizes the JVET-T0073 proposal, "Neural network-based intra prediction with transform selection in VVC".
Introduction
The proposal uses a neural network to generate the prediction block directly from the reference pixels to the left of and above the block, and to predict the index of the LFNST transform kernel and whether transposition is needed. On VTM-8.0 under the all intra configuration, the BD-rates for Y, U and V are -3.36%, -2.95% and -2.97% respectively, with encoding and decoding times of 395% and 3575%. Under the random access configuration, the BD-rates for Y, U and V are -1.52%, -1.00% and -1.26% respectively, with encoding and decoding times of 159% and 723%.
Overall framework
There are 8 models in total, one for each of the block sizes {4x4, 8x4, 16x4, 32x4, 8x8, 16x8, 16x16, 32x32}.

Each block size wxh has its own model with its own set of parameters. For a given wxh block Y, its neighbouring reference pixels are collectively denoted X, as shown in Fig.1; X comprises pixels above Y and pixels to its left. The whole process follows Fig.1: X is preprocessed and fed to the network, the network outputs a prediction together with grpIdx1 and grpIdx2, and post-processing then produces the prediction block of Y. The outputs grpIdx1 and grpIdx2 give the predicted LFNST transform kernel index and whether to transpose.
Intra prediction based on Neural Network
Preprocessing
The preprocessing of X in Fig.1 consists of the following 4 steps:

Divide the reference pixels in X by 2^(b-8), where b is the bit depth.
Subtract the mean u from the available (already reconstructed) reference pixels.
Set the unavailable reference pixels to 255.
If min(h,w)<=8, flatten the result of the previous step, because blocks with min(h,w)<=8 are handled by a fully connected network. If min(h,w)>8, split the result of the previous step into two rectangular parts, the reference pixels above Y (X0) and those to its left (X1), because blocks with min(h,w)>8 are handled by a convolutional network. The preprocessed output is therefore a single vector when min(h,w)<=8, and the pair (X0, X1) otherwise.
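The four preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the proposal's implementation: the function name `preprocess_context`, the array layout of the context (one 2-D array above the block, one to its left), the boolean availability masks and the fallback of u = 0 when no pixel is available are all assumptions made for the example.

```python
import numpy as np

def preprocess_context(above, left, avail_above, avail_left, w, h, bit_depth=10):
    """Hypothetical sketch of the 4 preprocessing steps.

    above, left  : 2-D arrays of reference pixels above / left of the block
                   (the layout is an assumption, not taken from the proposal)
    avail_*      : boolean masks, True where the pixel is already reconstructed
    Returns the network input and the mean u (needed again in post-processing).
    """
    scale = 2.0 ** (bit_depth - 8)

    # Step 1: divide the reference pixels by 2^(b-8).
    x0 = above.astype(np.float64) / scale
    x1 = left.astype(np.float64) / scale

    # Step 2: mean u over the available pixels only.
    n_avail = int(avail_above.sum() + avail_left.sum())
    if n_avail > 0:
        u = (x0[avail_above].sum() + x1[avail_left].sum()) / n_avail
    else:
        u = 0.0  # assumption: behaviour with no available pixel is not given

    # Steps 2 and 3: subtract u where available, set 255 where unavailable.
    x0 = np.where(avail_above, x0 - u, 255.0)
    x1 = np.where(avail_left, x1 - u, 255.0)

    # Step 4: flatten for the fully connected case, keep two parts otherwise.
    if min(h, w) <= 8:
        return np.concatenate([x0.ravel(), x1.ravel()]), u
    return (x0, x1), u
```

Returning u alongside the network input keeps the sketch symmetric with post-processing, which adds the same mean back.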
Network structure
If min(h,w)<=8, the network is fully connected, as shown in Table 1,

For 16x16 blocks a convolutional network is used; it is composed of 3 sub-networks, as in Fig.3,

The structure of the 3 sub-networks is given in Tables 2, 3 and 4,

32x32 blocks also use a convolutional network, again composed of 3 sub-networks as in Fig.3; the structure of each sub-network is given in Tables 5, 6 and 7,

Post-processing
The post-processing in Fig.1 reshapes the network output to size wxh, adds the mean u of the available reference pixels, and then multiplies the result by 2^(b-8),
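Post-processing mirrors preprocessing, which a short NumPy sketch makes concrete. The function name `postprocess_prediction` is hypothetical, and the final rounding and clipping to the valid sample range is an assumption added for the example; the text above only mentions the reshape, the mean and the scaling.

```python
import numpy as np

def postprocess_prediction(net_out, u, w, h, bit_depth=10):
    """Hypothetical sketch: turn the raw network output into a w x h prediction."""
    scale = 2.0 ** (bit_depth - 8)

    # Reshape the output to the block size (h rows, w columns).
    pred = np.asarray(net_out, dtype=np.float64).reshape(h, w)

    # Add back the mean u of the available reference pixels, then rescale.
    pred = (pred + u) * scale

    # Assumption (not stated in the text): round and clip to [0, 2^b - 1].
    max_val = (1 << bit_depth) - 1
    return np.clip(np.rint(pred), 0, max_val).astype(np.int32)
```

For example, an all-zero network output with u = 100 and b = 10 gives a flat block of value 400, which undoes the division by 4 and the mean subtraction applied to the context.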

LFNST selection
As in Fig.1, the output of the network also includes grpIdx1 and grpIdx2. These two indices determine which LFNST kernel is used and whether the transform coefficients are transposed, as shown in Table 8.

Signalling
Signalling the flag for luma blocks
In the bitstream, the flag nnFlag indicates whether the neural network is used for intra prediction. If the size wxh of the luma block belongs to the set T and its reference pixels do not extend beyond the picture boundary, nnFlag is transmitted in the bitstream; otherwise only the traditional intra prediction modes are used.
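The luma condition can be written as a small predicate. This is only a sketch under stated assumptions: the contents of T are not listed in the text, so the set below reuses the eight model sizes purely as a placeholder, and the function name `nn_flag_is_signalled` and the bounding-box representation of the reference pixels are hypothetical.

```python
# Placeholder for T: the eight model sizes from the text, as (w, h) pairs.
# The real contents of T are an assumption here.
PLACEHOLDER_T = {(4, 4), (8, 4), (16, 4), (32, 4), (8, 8), (16, 8), (16, 16), (32, 32)}

def nn_flag_is_signalled(block_size, context_rect, picture_size, allowed_sizes=PLACEHOLDER_T):
    """Return True if nnFlag would be written for this luma block.

    block_size   : (w, h) of the luma block
    context_rect : (x0, y0, x1, y1) bounding box of the reference pixels,
                   with x1 and y1 exclusive (hypothetical representation)
    picture_size : (width, height) of the picture
    """
    x0, y0, x1, y1 = context_rect
    pic_w, pic_h = picture_size

    # The reference pixels must lie entirely inside the picture.
    inside = x0 >= 0 and y0 >= 0 and x1 <= pic_w and y1 <= pic_h

    # The block size must belong to T.
    return block_size in allowed_sizes and inside
```

When the predicate is False, no flag is written and the block falls back to the traditional intra prediction modes.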

Signalling the flag for chroma blocks
If the luma block corresponding to the chroma block uses the neural network for intra prediction and the size of the chroma block belongs to T, then DM is used to indicate whether the chroma block uses the neural network; otherwise DM denotes the PLANAR mode.
Context handling
In the neural network processing flow, the preprocessing stage may downsample the context vertically, downsample it horizontally, or transpose it. The proposal specifies how these operations are applied to the context.
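The three context operations named above can be illustrated on a NumPy array. This is a sketch only: the function names are hypothetical, and plain decimation by a factor of 2 is an assumption, since the text does not describe the downsampling filter or which block sizes use which operation.

```python
import numpy as np

def downsample_vertically(ctx, factor=2):
    """Keep every `factor`-th row (plain decimation; the filter is an assumption)."""
    return ctx[::factor, :]

def downsample_horizontally(ctx, factor=2):
    """Keep every `factor`-th column (plain decimation; the filter is an assumption)."""
    return ctx[:, ::factor]

def transpose_context(ctx):
    """Swap rows and columns of the context."""
    return ctx.T

# Example on a 4-row, 8-column context part.
ctx = np.arange(32).reshape(4, 8)
print(downsample_vertically(ctx).shape)    # (2, 8)
print(downsample_horizontally(ctx).shape)  # (4, 4)
print(transpose_context(ctx).shape)        # (8, 4)
```

Operations of this kind change the shape of the context before it reaches the network.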

Predictive coding of LFNST
grpIdx can be coded predictively; the encoder and decoder sides are shown in Fig.5 and Fig.6 respectively,


Experimental results
The details of model training are shown in Table 10,

Model inference details are shown in Table 11,

The experimental results are as follows,

With predictive coding of the LFNST parameters added, the results are,

