2D Human Pose Estimation: Simple Baseline (SBL)
2022-06-10 15:48:00 【light169】
Paper: Simple Baselines for Human Pose Estimation and Tracking
Code: GitHub - leoxiaobin/pose.pytorch: Simple Baselines for Human Pose Estimation and Tracking
Simple Baselines is a 2018 work from MSRA; the network structure is shown in the figure below. The name fits, because the network really is simple. It adds a head on top of ResNet, and this head is just a few deconvolutional layers that raise the resolution of the feature map output by ResNet. As noted many times before, pose estimation needs high resolution. "Deconvolutional layer" is a loose term here: in the source code, each one wraps a transposed convolution, BatchNorm and ReLU into a single block. The key component is the transposed convolution. It can be thought of as reversing the shape change of a convolution, mapping a low-resolution feature map to a higher-resolution one, although it is not a true mathematical inverse.
From the figure, the Simple Baselines network looks a little like a single module of Hourglass, with two differences: (1) it has no skip connections like those in Hourglass; (2) it is single-stage, whereas Hourglass is multi-stage. Surprisingly, it still performs better than Hourglass. Personally, I see two likely reasons. First, Simple Baselines uses ResNet as its backbone, which is a stronger feature extractor than Hourglass. Second, Hourglass upsamples with plain nearest-neighbor upsampling, whereas this network uses deconvolutional layers, which work better (the same structure is still used in MSRA's later Higher-HRNet).
SBL Network structure
SBL (Simple Baseline) [7] provides a baseline framework for human pose estimation. It attaches a deconvolution module after the backbone network to predict heatmaps: a few deconvolution layers are added after ResNet and generate the heatmaps directly. Compared with other models, the difference is that deconvolution replaces the upsampling structure. Upsampling and convolution parameters are combined into the deconvolution layers in a simpler way, without skip-layer connections.
What Hourglass, CPN and SBL have in common is that each uses three upsampling steps and three levels of nonlinearity (from the deepest features) to obtain high-resolution feature maps and heatmaps.

- In the figure above, (a) is the Hourglass network, (b) is CPN, and (c) is SimplePose from this paper; the difference in structural complexity is visible at a glance.
- The first two structures need to build a pyramid feature structure, such as FPN, or build one from ResNet.
- SimplePose does not need a pyramid feature structure. It adds a deconvolution module directly on ResNet and outputs the result, which is the simplest way to generate heatmaps from deep, low-resolution features.
- Specific structure. First, take the feature map output by the last residual stage of ResNet (named C5). Then attach three deconvolution modules, each consisting of Deconv + BatchNorm + ReLU, with 256 channels, a 4x4 kernel, stride 2 and padding 1. Finally, a 1x1 convolution layer generates the output heatmaps for the k keypoints.
- Mean squared error (MSE) is used as the loss between the predicted heatmaps and the target heatmaps.
- The target heatmap for the k-th joint is generated by applying a 2D Gaussian centered on the ground-truth location of the k-th joint.
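The target heatmap and MSE loss from the list above can be sketched in a few lines of plain Python. This is only an illustration: the function names, the Gaussian width `sigma=2.0` and the 48x64 heatmap size are assumptions, not values taken from the text.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Return a height x width grid holding a 2D Gaussian peaked at (cx, cy).

    sigma is an assumed value chosen for illustration.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def mse(pred, target):
    """Mean squared error over all pixels of two equally sized grids."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# One target heatmap per joint; here a single joint at (x=20, y=30).
target = gaussian_heatmap(48, 64, cx=20, cy=30)
blank_prediction = [[0.0] * 48 for _ in range(64)]
loss = mse(blank_prediction, target)
```

In training, one such target would be built per joint and the loss averaged over all k heatmaps.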
In all these models, how to generate high-resolution feature maps is key to pose estimation. SimplePose uses Deconv to enlarge the feature-map resolution, while Hourglass and CPN use upsampling plus skip connections. Of course, this comparison alone is not enough to judge which approach is better.
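How much resolution the deconvolution head recovers can be checked with simple arithmetic. The sketch below assumes a transposed convolution with no output padding and dilation 1, a ResNet C5 feature map at 1/32 of the input size, and a 256x192 input; the input size and function names are assumptions made for illustration.

```python
def deconv_out_size(size, kernel=4, stride=2, pad=1):
    """Output length of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * pad + kernel


def head_output_size(height, width, num_deconv=3, backbone_stride=32):
    """Heatmap size after the backbone and `num_deconv` deconvolution modules."""
    h, w = height // backbone_stride, width // backbone_stride
    for _ in range(num_deconv):
        h, w = deconv_out_size(h), deconv_out_size(w)
    return h, w


print(head_output_size(256, 192))  # (64, 48)
```

With a 4x4 kernel, stride 2 and padding 1, each module exactly doubles the resolution, so three modules turn the 8x6 C5 map into a 64x48 heatmap, a quarter of the input size.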
The pose tracking problem
The winner [11] of the ICCV'17 PoseTrack Challenge [2] tackles multi-person pose tracking by first using Mask R-CNN [12] to estimate the human poses in each frame, then tracking online, frame by frame, with a greedy bipartite matching algorithm.
In brief, the greedy matching algorithm works as follows. Every person detected in the first frame of the video is given an id. For each subsequent frame, a similarity is computed between every person detected in that frame and every person detected in the previous frame (the metric mentioned in the paper is the IoU of the detection boxes). The pair with the highest similarity above a threshold is given the same id and removed from the candidates. This is repeated until no remaining instance in the current frame is similar enough to a previous one, at which point the remaining instances are assigned new ids.
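The greedy matching just described can be sketched as follows, using box IoU as the similarity. The function names, the `(x1, y1, x2, y2)` box format and the threshold of 0.3 are assumptions for illustration, not details from the paper.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(prev, curr_boxes, next_id, threshold=0.3):
    """Assign ids to curr_boxes given prev, a dict {id: box} from the last frame.

    Returns ({id: box} for the current frame, updated next_id).
    """
    # Score every (previous, current) pair and visit the best pairs first.
    pairs = sorted(
        ((iou(pbox, cbox), pid, ci)
         for pid, pbox in prev.items()
         for ci, cbox in enumerate(curr_boxes)),
        key=lambda t: t[0],
        reverse=True,
    )
    assigned, used_ids, used_curr = {}, set(), set()
    for score, pid, ci in pairs:
        if score < threshold:
            break  # all remaining pairs are too dissimilar
        if pid in used_ids or ci in used_curr:
            continue  # one side of this pair is already matched
        assigned[pid] = curr_boxes[ci]
        used_ids.add(pid)
        used_curr.add(ci)
    # Unmatched detections start new tracks.
    for ci, cbox in enumerate(curr_boxes):
        if ci not in used_curr:
            assigned[next_id] = cbox
            next_id += 1
    return assigned, next_id


prev = {0: (0, 0, 10, 10), 1: (20, 20, 30, 30)}
curr = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
tracks, next_id = greedy_match(prev, curr, next_id=2)
```

In this example the first two detections inherit ids 1 and 0, and the third, which overlaps nothing, receives the new id 2.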

The method proposed in this paper keeps the main pipeline of that approach and makes two improvements:
First, in addition to the detection network, optical flow is used to supplement the detection boxes, which addresses missed detections (for example, in Figure 2 the person on the far left is not detected by the detection network).
Second, Object Keypoint Similarity (OKS) replaces detection-box IoU for computing similarity, because IoU can be unreliable when people move quickly.
OKS is a measure of keypoint distance. In its standard COCO form it is calculated as follows:
$$\mathrm{OKS} = \frac{\sum_i \exp\left(-d_i^2 / (2 s^2 \kappa_i^2)\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$
where $d_i$ is the distance between the two positions of keypoint $i$, $s$ is the object scale ($s^2$ is the object area), $\kappa_i$ is a per-keypoint constant, and $v_i$ is the visibility flag of keypoint $i$.
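As a rough illustration, the standard COCO definition of OKS averages exp(-d_i^2 / (2 s^2 kappa_i^2)) over the visible keypoints. The sketch below follows that definition; the function signature and the `kappas` values in the example are placeholders, not the official COCO constants.

```python
import math


def oks(pred, gt, area, kappas, visible=None):
    """Object Keypoint Similarity between two poses.

    pred, gt: lists of (x, y) keypoints in the same order.
    area: object area, used as s^2 in the formula.
    kappas: per-keypoint constants.
    visible: optional list of booleans; invisible keypoints are skipped.
    """
    if visible is None:
        visible = [True] * len(gt)
    total, count = 0.0, 0
    for (px, py), (gx, gy), k, v in zip(pred, gt, kappas, visible):
        if not v:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


gt_pose = [(10.0, 10.0), (20.0, 15.0)]
shifted = [(11.0, 10.0), (20.0, 17.0)]
same_score = oks(gt_pose, gt_pose, area=100.0, kappas=[0.1, 0.1])
shifted_score = oks(shifted, gt_pose, area=100.0, kappas=[0.1, 0.1])
```

Identical poses score 1, and the score falls toward 0 as keypoints drift apart relative to the object's size.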
The new similarity measure proposed in the paper uses optical flow to estimate where the keypoints from one frame will appear in another frame, then computes the OKS between those estimated positions and the keypoints detected in that frame. This value serves as the similarity between people in the two frames.
Joint Propagation using Optical Flow
If a single-image detector (such as Faster R-CNN [27] or R-FCN [16]) is simply applied to video, the motion blur and occlusion present in video frames can lead to missed and false detections. As shown in Figure 2, the detector misses the person in black on the left because of fast motion. Temporal information is often used to produce more robust detections [36,35].
The paper proposes using temporal information, expressed as optical flow, to generate boxes for the frame being processed from nearby frames.
Specifically, given a human instance in frame $I^{k-1}$ with joint coordinate set $J_{k-1}^i$, and the optical flow field $F_{k-1 \to k}$ between frames $I^{k-1}$ and $I^k$, we can estimate the corresponding joint coordinate set $\hat{J}_k^i$ in frame $I^k$ by propagating $J_{k-1}^i$ according to $F_{k-1 \to k}$. That is, for each joint location $(x, y)$ in $J_{k-1}^i$, the propagated location is $(x + \delta x, y + \delta y)$, where $\delta x$ and $\delta y$ are the flow field values at $(x, y)$. We then compute the bounding box of $\hat{J}_k^i$ and expand it to obtain the candidate box. The expansion used in the experiments is 15%.
When the person detector misses someone in the current frame because of motion blur or occlusion, the boxes propagated from the previous frame can still cover that person. As shown in Figure 2(c), for the person in black on the left, the tracking result from the previous frame in Figure 2(a) is available, so the propagated box successfully contains this person.
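The joint propagation and box expansion can be sketched as below. The flow field is assumed to be a dense grid indexed as `flow[y][x] = (dx, dy)`, and the 15% expansion is assumed to be split evenly between the two sides of each axis; both are illustrative choices, since the text does not specify them.

```python
def propagate_joints(joints, flow):
    """Shift each (x, y) joint by the flow vector stored at flow[y][x]."""
    moved = []
    for x, y in joints:
        dx, dy = flow[int(round(y))][int(round(x))]
        moved.append((x + dx, y + dy))
    return moved


def expanded_box(joints, expand=0.15):
    """Bounding box of the joints, enlarged by `expand` of its width and height."""
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    pad_x = (x2 - x1) * expand / 2.0
    pad_y = (y2 - y1) * expand / 2.0
    return (x1 - pad_x, y1 - pad_y, x2 + pad_x, y2 + pad_y)


# Toy flow field: every pixel moves 2 px right and 1 px up.
flow = [[(2.0, -1.0) for _ in range(20)] for _ in range(20)]
prev_joints = [(5, 5), (15, 10)]
curr_joints = propagate_joints(prev_joints, flow)
candidate_box = expanded_box(curr_joints)
```

The resulting box would then be pooled with the detector's boxes, so a person the detector missed can still be cropped and passed to the pose network.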
Flow-based Pose Similarity
Using bounding-box IoU (Intersection-over-Union) as the similarity metric ($S_{Bbox}$) to link instances can be problematic in two cases: when an instance moves fast, the boxes do not overlap; and in crowded scenes, boxes that lie close together do not necessarily belong to the same instance. A finer-grained metric is pose similarity ($S_{Pose}$), which uses Object Keypoint Similarity (OKS) to compute the body-joint distance between two instances. Pose similarity has its own problem, since the pose of the same person can change across frames. The paper therefore proposes a flow-based pose similarity metric.
Given an instance $J_k^i$ in frame $I^k$ and an instance $J_l^j$ in frame $I^l$, the flow-based pose similarity is expressed as:
$$S_{Flow}(J_k^i, J_l^j) = OKS(\hat{J}_l^i, J_l^j)$$
where OKS denotes the Object Keypoint Similarity between two body poses, and $\hat{J}_l^i$ denotes the joints of instance $J_k^i$ propagated from frame $I^k$ to frame $I^l$ using the optical flow field $F_{k \to l}$.
Because of occlusion by other people or objects, people often disappear and then reappear, so considering only two consecutive frames is not enough. The paper therefore also uses a flow-based pose similarity over multiple frames, denoted $S_{Multi-flow}$, in which the propagated $\hat{J}_k$ comes from multiple previous frames. This makes it possible to relink instances even if they disappear in intermediate frames.
Flow-based Pose Tracking Algorithm


- First, the pose estimation problem is solved.
- For the frame being processed, the boxes from the human detector and the boxes generated by propagating joints from previous frames with optical flow are unified using bounding-box non-maximum suppression (NMS). The boxes generated from propagated joints compensate for detections the detector missed.
- The images cropped and resized from these boxes are then fed to the proposed pose estimation network to estimate human poses.
- Second, the tracking problem is solved. The tracked instances are stored in a double-ended queue (Deque) with fixed length $L_Q$, denoted $Q = [\mathcal{P}_{k-1}, \mathcal{P}_{k-2}, \ldots, \mathcal{P}_{k-L_Q}]$, where $\mathcal{P}_{k-i}$ is the set of tracked instances in the previous frame $I^{k-i}$, and the length $L_Q$ of Q indicates how many previous frames are considered when matching.
- Q can capture the linking relationships across multiple previous frames and is initialized in the first frame of the video. For the k-th frame $I^k$, the flow-based pose similarity matrix $M_{sim}$ is computed between the untracked set of body-joint instances $\mathcal{J}^k$ (id is none) and the previous instance sets in Q. An id is then assigned to each body-joint instance J in $\mathcal{J}^k$ using greedy matching and $M_{sim}$, giving the assigned instance set $\mathcal{P}^k$. Finally, Q is updated by adding the k-th frame's instance set $\mathcal{P}^k$.
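The tracking stage can be sketched with `collections.deque`, whose `maxlen` argument gives the fixed-length queue Q. This is a simplified sketch under stated assumptions: the similarity is passed in as a callable instead of implementing the flow-based metric, instances are arbitrary objects, and the function name and threshold of 0.5 are made up for illustration.

```python
from collections import deque


def track_frame(queue, instances, similarity, next_id, threshold=0.5):
    """Assign ids to `instances` by greedy matching against all frames in `queue`.

    queue: deque of dicts {id: instance}, newest frame first.
    similarity: callable(prev_instance, curr_instance) -> float.
    Returns ({id: instance} for this frame, updated next_id).
    """
    # Similarity between every stored instance and every current instance.
    candidates = []
    for frame in queue:
        for pid, prev in frame.items():
            for ci, curr in enumerate(instances):
                candidates.append((similarity(prev, curr), pid, ci))
    candidates.sort(key=lambda t: t[0], reverse=True)

    assigned, used_ids, used_curr = {}, set(), set()
    for score, pid, ci in candidates:
        if score < threshold:
            break
        if pid in used_ids or ci in used_curr:
            continue
        assigned[pid] = instances[ci]
        used_ids.add(pid)
        used_curr.add(ci)
    # Instances that matched nothing get new ids.
    for ci, inst in enumerate(instances):
        if ci not in used_curr:
            assigned[next_id] = inst
            next_id += 1
    queue.appendleft(assigned)  # maxlen drops the oldest frame automatically
    return assigned, next_id


# Toy example: instances are 1D positions, similarity decays with distance.
def toy_similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))


Q = deque(maxlen=2)
frame1, nid = track_frame(Q, [0.0, 10.0], toy_similarity, next_id=0)
frame2, nid = track_frame(Q, [10.2], toy_similarity, nid)       # person 0 is missed
frame3, nid = track_frame(Q, [0.1, 10.4], toy_similarity, nid)  # person 0 returns
```

Because Q still holds the first frame when the third is processed, the person who vanished in the second frame gets the original id back, which is the relinking behaviour the multi-frame queue is meant to provide.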