
Intra prediction and transform kernel selection based on Neural Network

2022-07-28 13:27:00 Dillon2015

This article is based on proposal JVET-T0073, "Neural network-based intra prediction with transform selection in VVC".

 

Introduction


This proposal uses a neural network to generate the prediction block directly from the left and above reference pixels, and to predict the LFNST transform kernel index and whether a transpose is needed. On top of VTM-8.0, under the all-intra configuration the Y, U and V BD-rates are -3.36%, -2.95% and -2.97% respectively, with encoding and decoding times of 395% and 3575%; under the random-access configuration the Y, U and V BD-rates are -1.52%, -1.00% and -1.26%, with encoding and decoding times of 159% and 723%.

The overall framework


There are 8 models in total, one per block size in {4x4, 8x4, 16x4, 32x4, 8x8, 16x8, 16x16, 32x32}.

The processing of a w×h block by a model is denoted f_{h,w}(., \theta_{h,w}), where \theta_{h,w} are the model parameters. For a given w×h block Y, its neighboring pixels are collectively denoted X, as shown in Fig.1: X consists of the n_a × (n_l + 2w) pixels above Y and the n_l × 2h pixels to its left. The overall flow is shown in Fig.1: X is preprocessed and fed into the network, the network outputs \tilde{Y}, grpIdx1 and grpIdx2, and post-processing produces the prediction block \hat{Y} of Y. The outputs grpIdx1 and grpIdx2 are the predicted LFNST transform kernel index and whether to transpose.
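As a concrete illustration, the reference context X of Fig.1 could be gathered from a reconstructed frame roughly as follows. This is a minimal sketch: the function name, the array layout, and the omission of padding/availability handling are my assumptions, not part of the proposal.

```python
import numpy as np

def gather_reference_region(frame, y0, x0, h, w, n_a, n_l):
    """Illustrative extraction of the reference context X for a w x h
    block whose top-left corner is at (y0, x0) in `frame`."""
    # The n_a x (n_l + 2w) region above the block, starting n_l samples
    # to the left of the block's left edge.
    above = frame[y0 - n_a : y0, x0 - n_l : x0 + 2 * w]
    # The 2h x n_l strip to the left of the block.
    left = frame[y0 : y0 + 2 * h, x0 - n_l : x0]
    return above, left
```

The two parts correspond to X0 and X1 used later in the preprocessing step for large blocks.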

Intra prediction based on Neural Network


Preprocessing


The preprocessing of X in Fig.1 consists of the following 4 steps:

  • Divide the reference pixels in X by 2^(b-8), where b is the bit depth.

  • Subtract the mean u from the available (reconstructed) reference pixels.

  • Set the unavailable reference pixels to 255.

  • If min(h,w) <= 8, flatten the result of the previous step, because blocks with min(h,w) <= 8 are processed by a fully connected network. If min(h,w) > 8, split the result into two rectangular parts: X0, the reference pixels above Y, and X1, those to its left, because blocks with min(h,w) > 8 are processed by a convolutional network. Thus if min(h,w) <= 8 the preprocessed output \tilde{X} is a vector of dimension n_a(n_l + 2w) + 2h·n_l; otherwise \tilde{X} = X_0 ∪ X_1.
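The four steps above can be sketched in NumPy as follows, for the flattened (fully connected) case. Names and the boolean-mask representation of availability are illustrative assumptions; the real proposal's availability handling is more involved.

```python
import numpy as np

def preprocess(refs, available, h, w, bit_depth):
    """Sketch of the 4 preprocessing steps; `refs` holds the reference
    samples of X and `available` is a boolean mask of the same shape."""
    x = refs.astype(np.float64)
    # Step 1: divide by 2^(b-8) to normalize toward an 8-bit range.
    x = x / (1 << (bit_depth - 8))
    # Step 2: subtract the mean u of the available (reconstructed) samples.
    u = x[available].mean() if available.any() else 0.0
    x = np.where(available, x - u, x)
    # Step 3: set unavailable reference samples to 255.
    x = np.where(available, x, 255.0)
    # Step 4: flatten when min(h, w) <= 8 (fully connected network);
    # otherwise the split into X0 and X1 would be kept instead.
    if min(h, w) <= 8:
        x = x.reshape(-1)
    return x, u
```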

Network structure


If min(h,w) <= 8, the network is fully connected, as shown in Table 1.

For 16x16 blocks a convolutional network is used, composed of 3 sub-networks, as in Fig.3; the structure of each sub-network is given in Tables 2, 3 and 4.

For 32x32 blocks a convolutional network is likewise used, also composed of 3 sub-networks as in Fig.3; the structure of each sub-network is given in Tables 5, 6 and 7.

Post-processing


The post-processing in Fig.1 reshapes the network output to w×h, adds back the mean u of the available reference pixels, and then multiplies by 2^(b-8).
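A sketch of this post-processing follows. Names are assumed, and the final clipping to the valid sample range is my addition, not stated in the text above.

```python
import numpy as np

def postprocess(y_tilde, u, h, w, bit_depth):
    # Reshape the network output to the h x w block.
    y = np.asarray(y_tilde, dtype=np.float64).reshape(h, w)
    # Add back the mean u of the available reference pixels,
    # then rescale by 2^(b-8).
    y = (y + u) * (1 << (bit_depth - 8))
    # Clip to the valid range for bit depth b (assumed extra step).
    return np.clip(y, 0.0, float((1 << bit_depth) - 1))
```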

LFNST Selection


As shown in Fig.1, the network also outputs grpIdx1 and grpIdx2. From grpIdx1 and grpIdx2, the LFNST kernel can be selected, along with whether the transform coefficients are transposed, as shown in Table 8.

Model Signaling


Flag signaling for luma blocks


VVC signals whether neural-network intra prediction is used via an nnFlag in the bitstream. If the size of the luma block w×h satisfies the constraint T and the reference pixels do not extend beyond the picture boundary, nnFlag is transmitted in the bitstream; otherwise only the traditional intra prediction modes are used.

Flag signaling for chroma blocks


If the luma block co-located with the chroma block uses neural-network intra prediction and the size of the chroma block satisfies T, then DM indicates whether the chroma block uses the neural network; otherwise DM denotes the PLANAR mode.
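The two signaling rules above can be summarized in a small sketch. The size constraint T and the reference-availability check are abstracted into booleans, and the function names are illustrative, not from the proposal.

```python
def nn_flag_signaled(size_in_T, refs_inside_picture):
    """nnFlag is written for a luma block only when its size satisfies
    the constraint T and its reference pixels stay inside the picture;
    otherwise only traditional intra modes are used."""
    return size_in_T and refs_inside_picture

def chroma_dm_meaning(luma_uses_nn, chroma_size_in_T):
    """DM refers to the neural-network mode only if the co-located luma
    block used it and the chroma block size satisfies T; otherwise DM
    denotes PLANAR."""
    return "NN" if (luma_uses_nn and chroma_size_in_T) else "PLANAR"
```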

Context information signaling


In the neural-network processing flow, the preprocessing stage may involve vertical downsampling, horizontal downsampling, transposition, etc. This context information is specified as follows:

Predictive coding of the LFNST index


grpIdx can be predictively coded; the encoder and decoder are shown in Fig.5 and Fig.6 respectively.

Experimental Results


The details of model training are shown in Table 10.

The details of model inference are shown in Table 11.

The experimental results are as follows:

With predictive coding of the LFNST parameters added, the results are:


Copyright notice
This article was written by [Dillon2015]; please include a link to the original when reposting.
https://yzsam.com/2022/209/202207281216004070.html