
Intra prediction and transform kernel selection based on Neural Network

2022-07-28 13:27:00 Dillon2015

This article is based on proposal JVET-T0073, "Neural network-based intra prediction with transform selection in VVC".

 

Introduction


This proposal uses a neural network to generate the prediction block directly from the left and above reference pixels, and to predict the LFNST transform kernel index and whether a transpose is needed. On top of VTM-8.0, under the all-intra configuration the Y, U and V BD-rates are -3.36%, -2.95% and -2.97% respectively, with encoding and decoding times of 395% and 3575%; under the random-access configuration the Y, U and V BD-rates are -1.52%, -1.00% and -1.26%, with encoding and decoding times of 159% and 723%.

The overall framework


There are 8 models in total, one per block size in {4x4, 8x4, 16x4, 32x4, 8x8, 16x8, 16x16, 32x32}.

The processing of a w×h block by a model is denoted f_{h,w}(., \theta_{h,w}), where \theta_{h,w} are the model parameters. For a given w×h block Y, its neighboring pixels are collectively denoted X, as shown in Fig.1: X consists of the n_a × (n_l + 2w) pixels above Y and the n_l × 2h pixels to its left. The overall flow is shown in Fig.1: X is preprocessed and fed into the network, the network outputs \tilde{Y}, grpIdx1 and grpIdx2, and post-processing produces the prediction block \hat{Y} of Y. The outputs grpIdx1 and grpIdx2 are the predicted LFNST transform kernel index and whether to transpose.
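As a concrete illustration, the reference context X of Fig.1 could be gathered from a reconstructed frame roughly as follows. This is a minimal sketch: the function name, the array layout, and the omission of padding/availability handling are my assumptions, not part of the proposal.

```python
import numpy as np

def gather_reference_region(frame, y0, x0, h, w, n_a, n_l):
    """Illustrative extraction of the reference context X for a w x h
    block whose top-left corner is at (y0, x0) in `frame`."""
    # The n_a x (n_l + 2w) region above the block, starting n_l samples
    # to the left of the block's left edge.
    above = frame[y0 - n_a : y0, x0 - n_l : x0 + 2 * w]
    # The 2h x n_l strip to the left of the block.
    left = frame[y0 : y0 + 2 * h, x0 - n_l : x0]
    return above, left
```

The two parts correspond to X0 and X1 used later in the preprocessing step for large blocks.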

Intra prediction based on Neural Network


Preprocessing


The preprocessing of X in Fig.1 consists of the following 4 steps:

  • Divide the reference pixels in X by 2^(b-8), where b is the bit depth.

  • Subtract the mean u from the available (reconstructed) reference pixels.

  • Set the unavailable reference pixels to 255.

  • If min(h,w) <= 8, flatten the result of the previous step, because blocks with min(h,w) <= 8 are processed by a fully connected network. If min(h,w) > 8, split the result into two rectangular parts: X0, the reference pixels above Y, and X1, those to its left, because blocks with min(h,w) > 8 are processed by a convolutional network. Thus if min(h,w) <= 8 the preprocessed output \tilde{X} is a vector of dimension n_a(n_l + 2w) + 2h·n_l; otherwise \tilde{X} = X_0 ∪ X_1.
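The four steps above can be sketched in NumPy as follows, for the flattened (fully connected) case. Names and the boolean-mask representation of availability are illustrative assumptions; the real proposal's availability handling is more involved.

```python
import numpy as np

def preprocess(refs, available, h, w, bit_depth):
    """Sketch of the 4 preprocessing steps; `refs` holds the reference
    samples of X and `available` is a boolean mask of the same shape."""
    x = refs.astype(np.float64)
    # Step 1: divide by 2^(b-8) to normalize toward an 8-bit range.
    x = x / (1 << (bit_depth - 8))
    # Step 2: subtract the mean u of the available (reconstructed) samples.
    u = x[available].mean() if available.any() else 0.0
    x = np.where(available, x - u, x)
    # Step 3: set unavailable reference samples to 255.
    x = np.where(available, x, 255.0)
    # Step 4: flatten when min(h, w) <= 8 (fully connected network);
    # otherwise the split into X0 and X1 would be kept instead.
    if min(h, w) <= 8:
        x = x.reshape(-1)
    return x, u
```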

Network structure


If min(h,w) <= 8, the network is fully connected, as shown in Table 1.

For 16x16 blocks a convolutional network is used, composed of 3 sub-networks, as in Fig.3; the structure of each sub-network is given in Tables 2, 3 and 4.

For 32x32 blocks a convolutional network is likewise used, also composed of 3 sub-networks as in Fig.3; the structure of each sub-network is given in Tables 5, 6 and 7.

Post-processing


The post-processing in Fig.1 reshapes the network output to w×h, adds back the mean u of the available reference pixels, and then multiplies by 2^(b-8).
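A sketch of this post-processing follows. Names are assumed, and the final clipping to the valid sample range is my addition, not stated in the text above.

```python
import numpy as np

def postprocess(y_tilde, u, h, w, bit_depth):
    # Reshape the network output to the h x w block.
    y = np.asarray(y_tilde, dtype=np.float64).reshape(h, w)
    # Add back the mean u of the available reference pixels,
    # then rescale by 2^(b-8).
    y = (y + u) * (1 << (bit_depth - 8))
    # Clip to the valid range for bit depth b (assumed extra step).
    return np.clip(y, 0.0, float((1 << bit_depth) - 1))
```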

LFNST Selection


As shown in Fig.1, the network also outputs grpIdx1 and grpIdx2. From grpIdx1 and grpIdx2, the LFNST kernel can be selected, along with whether the transform coefficients are transposed, as shown in Table 8.

Model Signaling


Flag signaling for luma blocks


VVC signals whether neural-network intra prediction is used via an nnFlag in the bitstream. If the size of the luma block w×h satisfies the constraint T and the reference pixels do not extend beyond the picture boundary, nnFlag is transmitted in the bitstream; otherwise only the traditional intra prediction modes are used.

Flag signaling for chroma blocks


If the luma block co-located with the chroma block uses neural-network intra prediction and the size of the chroma block satisfies T, then DM indicates whether the chroma block uses the neural network; otherwise DM denotes the PLANAR mode.
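The two signaling rules above can be summarized in a small sketch. The size constraint T and the reference-availability check are abstracted into booleans, and the function names are illustrative, not from the proposal.

```python
def nn_flag_signaled(size_in_T, refs_inside_picture):
    """nnFlag is written for a luma block only when its size satisfies
    the constraint T and its reference pixels stay inside the picture;
    otherwise only traditional intra modes are used."""
    return size_in_T and refs_inside_picture

def chroma_dm_meaning(luma_uses_nn, chroma_size_in_T):
    """DM refers to the neural-network mode only if the co-located luma
    block used it and the chroma block size satisfies T; otherwise DM
    denotes PLANAR."""
    return "NN" if (luma_uses_nn and chroma_size_in_T) else "PLANAR"
```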

Context information signaling


In the neural-network processing flow, the preprocessing stage may involve vertical downsampling, horizontal downsampling, transposition, etc. This context information is specified as follows:

Predictive coding of the LFNST index


grpIdx can be predictively coded; the encoder and decoder are shown in Fig.5 and Fig.6 respectively.

Experimental Results


The details of model training are shown in Table 10.

The details of model inference are shown in Table 11.

The experimental results are as follows:

With predictive coding of the LFNST parameters added, the results are:


Copyright notice
This article was written by [Dillon2015]; please include a link to the original when reposting.
https://yzsam.com/2022/209/202207281216004070.html