SiamFC: Fully-Convolutional Siamese Networks for Object Tracking
2022-07-26 14:45:00 【The way of code】
The SiamFC Network

In the figure, z denotes the template image; the algorithm uses the ground truth from the first frame. x denotes the search region, i.e. the candidate search area in each subsequent frame to be tracked. ϕ denotes a feature-mapping operation that embeds the raw image into a particular feature space; in this paper it consists of the convolution and pooling layers of a CNN. 6×6×128 is the feature obtained from z after ϕ: a 128-channel feature map of size 6×6. Likewise, 22×22×128 is the feature of x after ϕ. The × that follows denotes the convolution operation: the 22×22×128 feature is convolved with the 6×6×128 feature acting as the kernel, yielding a 17×17 score map in which each value represents the similarity between a position in the search region and the template.
At its core, the algorithm compares the similarity between the search region and the target template and produces a score map over the search region. In principle this is very close to correlation filtering: the target template is matched point by point against the search region, this point-by-point translation matching is implemented as a convolution, and the point with the largest similarity value in the convolution result is taken as the centre of the new target position.
The ϕ in the figure above is in fact part of a CNN, and the two ϕ branches share the same network structure, which makes this a typical Siamese neural network. Moreover, the whole model contains only conv and pooling layers, so it is also a typical fully-convolutional neural network.
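To make the pipeline concrete, here is a minimal sketch of the cross-correlation step, assuming PyTorch; the random tensors stand in for the real embeddings ϕ(z) and ϕ(x):

```python
import torch
import torch.nn.functional as F

# Feature maps with the shapes from the figure: phi(z) is 6x6x128 and
# phi(x) is 22x22x128 (batch, channels, height, width in PyTorch order).
# Random tensors stand in for the real embeddings here.
z_feat = torch.randn(1, 128, 6, 6)    # embedded template
x_feat = torch.randn(1, 128, 22, 22)  # embedded search region

# Cross-correlation: use the template feature as the convolution kernel
# and slide it over the search feature. (22 - 6) + 1 = 17, giving the
# 17x17 score map described above.
score_map = F.conv2d(x_feat, z_feat)
print(score_map.shape)  # torch.Size([1, 1, 17, 17])

# The position of the maximum response becomes the new target centre.
peak_index = score_map.flatten().argmax()
```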
A loss function is needed to train the model, and the optimal model is obtained by minimizing it. The paper constructs an effective loss by dividing the positions of the search region into positive and negative samples: points within a certain radius of the target are positive samples, and points beyond that radius are negative. For example, in the score map generated at the far right of Figure 1, the red points are positive samples and the blue points are negative samples; they correspond to the red and blue rectangular regions of the search region. The paper uses the logistic loss, whose specific form is as follows.
For each point of the score map, the loss is

$$\ell(y, v) = \log\bigl(1 + \exp(-yv)\bigr)$$

where v is the real-valued score of the point in the score map and y ∈ {+1, −1} is the label corresponding to that point.
The above is the loss at a single point of the score map. For the score map as a whole, the loss is the average of the per-point losses:

$$L(y, v) = \frac{1}{|\mathcal{D}|}\sum_{u \in \mathcal{D}} \ell\bigl(y[u], v[u]\bigr)$$

where u ∈ 𝒟 ranges over the positions of the score map.
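A hedged PyTorch sketch of this loss follows; the function name and the toy centre-based label map are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def siamfc_logistic_loss(v, y):
    """Mean logistic loss over the score map:
    l(y, v) = log(1 + exp(-y * v)), averaged over all positions u in D.
    softplus(-y * v) computes log(1 + exp(-y * v)) stably."""
    return F.softplus(-y * v).mean()

# Toy example: a random 17x17 score map and a +1/-1 label map in which
# points within a small radius of the centre (the target) are positive.
v = torch.randn(17, 17)
ys, xs = torch.meshgrid(torch.arange(17), torch.arange(17), indexing="ij")
dist = ((ys - 8) ** 2 + (xs - 8) ** 2).float().sqrt()
y = torch.where(dist <= 2, torch.tensor(1.0), torch.tensor(-1.0))

loss = siamfc_logistic_loss(v, y)
```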
The overall network structure is similar to AlexNet, but without the final fully connected layers: only the preceding convolution and pooling layers remain.

The full network structure is shown in the table above. The pooling layers use max-pooling, and every convolution layer is followed by a ReLU nonlinearity except the fifth. In addition, during training, batch normalization is applied before every ReLU layer. (Batch normalization is a technique often seen in deep learning: when training a DNN with gradient descent, the data of each mini-batch is normalized layer by layer so that its mean becomes 0 and its variance becomes 1. Its main purpose is to alleviate vanishing/exploding gradients and to speed up training.) Here it also serves to reduce the risk of overfitting.
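Since the table itself is an image, here is a sketch of that backbone, assuming PyTorch; kernel and channel sizes follow the SiamFC paper's published architecture, and the paper's grouped convolutions are omitted for simplicity:

```python
import torch
import torch.nn as nn

# AlexNet-like embedding phi: five conv layers, max-pooling after the
# first two, batch normalization before every ReLU, no ReLU after conv5,
# and no fully connected layers at all.
phi = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),
    nn.BatchNorm2d(96), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5),
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 192, kernel_size=3),
    nn.BatchNorm2d(192), nn.ReLU(),
    nn.Conv2d(192, 192, kernel_size=3),
    nn.BatchNorm2d(192), nn.ReLU(),
    nn.Conv2d(192, 128, kernel_size=3),  # conv5: no BN/ReLU after it
)

# The two input sizes from the figure map to the feature sizes it shows.
print(phi(torch.randn(1, 3, 127, 127)).shape)  # [1, 128, 6, 6]
print(phi(torch.randn(1, 3, 255, 255)).shape)  # [1, 128, 22, 22]
```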
AlexNet

AlexNet is an 8-layer structure: the first 5 layers are convolutional and the last 3 are fully connected. It has roughly 60 million learned parameters and 650,000 neurons, and it runs across two GPUs. In layers 2, 4 and 5, the kernels connect only to the previous layer's feature maps on the same GPU; layer 3 connects to both halves of the previous layer, and the fully connected layers are fully connected across the two GPUs.
LRN layers sit after the 1st and 2nd convolutions; max-pooling layers sit after the LRN layers and after the 5th convolution; ReLU sits after every convolution layer and fully connected layer.
Convolution kernel counts and sizes (count, then height × width × depth):

- conv1: 96 kernels of 11×11×3
- conv2: 256 kernels of 5×5×48
- conv3: 384 kernels of 3×3×256
- conv4: 384 kernels of 3×3×192
- conv5: 256 kernels of 3×3×192
- ReLU and dual-GPU training: speed up training (applied to all convolution and fully connected layers).
- Overlapping pooling: improves accuracy and makes overfitting less likely (applied after the first, second and fifth layers).
- Local response normalization (LRN): improves accuracy (applied after the first and second layers).
- Dropout: reduces overfitting (applied in the first two fully connected layers).
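To make the list concrete, here is a single-GPU PyTorch sketch of where these pieces sit in an AlexNet-style network; the padding values and the single-GPU layout are assumptions (the original split channels across two GPUs):

```python
import torch.nn as nn

# AlexNet-style stack: ReLU after every conv/FC layer, LRN after conv1
# and conv2, overlapping 3x3/stride-2 max pooling, and dropout in the
# first two fully connected layers. Expects 3x227x227 input.
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),      # overlapping pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
classifier = nn.Sequential(
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)
```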
Fine-tuning (fine-tune)
You see someone else's good model, and although your specific problem is different, you want to try it and see whether it gives good results, but you do not have much data. What do you do? No problem: take the other person's ready-made trained model, swap in your own data, adjust the parameters, and train again. This is fine-tuning (fine-tune).
Freeze part of the convolution layers of the pretrained model (usually the convolution layers closest to the input) and train the remaining convolution layers (usually those closest to the output) together with the fully connected layers. In a sense, fine-tuning is a form of transfer learning.
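A minimal sketch of this recipe, assuming PyTorch and torchvision's pretrained AlexNet as the donor model (any pretrained CNN works the same way):

```python
import torchvision.models as models

# Start from a pretrained model.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Freeze the convolutional layers closest to the input...
for param in model.features.parameters():
    param.requires_grad = False

# ...then unfreeze the later conv blocks so the layers closest to the
# output can adapt to the new data; model.classifier (the fully
# connected layers) stays trainable throughout.
for param in model.features[8:].parameters():
    param.requires_grad = True
```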
Perceptron: PLA
The multilayer perceptron is a generalization of the perceptron. The perceptron learning algorithm (PLA: Perceptron Learning Algorithm) describes the structure of a single, standalone neuron.
The neural network structure of the perceptron is represented as follows:

Multilayer Perceptron: MLP
An important feature of the multilayer perceptron is, of course, its multiple layers. We call the first layer the input layer, the last layer the output layer, and the layers in between hidden layers. The MLP does not prescribe the number of hidden layers, so an appropriate number can be chosen as needed, and it places no limit on the number of neurons in the output layer.
The structure of the MLP neural network is shown below. Only one hidden layer is involved here; the input has just three variables [x1, x2, x3] plus a bias b, and the output layer has three neurons. Compared with the single-neuron model of the perceptron algorithm, the MLP integrates many such neurons. A minimal PyTorch sketch of such an MLP follows.
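In this sketch the hidden width of 4 is an arbitrary choice, not from the text:

```python
import torch
import torch.nn as nn

# MLP matching the description above: three inputs [x1, x2, x3], one
# hidden layer, and an output layer with three neurons. nn.Linear adds
# the bias b automatically.
mlp = nn.Sequential(
    nn.Linear(3, 4),  # input layer -> hidden layer
    nn.ReLU(),        # nonlinearity between the layers
    nn.Linear(4, 3),  # hidden layer -> three output neurons
)

out = mlp(torch.randn(1, 3))  # one sample with features [x1, x2, x3]
print(out.shape)              # torch.Size([1, 3])
```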

ReLU function
The ReLU function is given by

$$\mathrm{ReLU}(x) = \max(0, x)$$

Its graph is shown below:

sigmoid function
As its input approaches positive or negative infinity, the sigmoid function flattens toward a smooth asymptote. Because its output range is (0, 1), it is often used to express probabilities in binary classification.

The sigmoid function is given by

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Its graph is shown below:
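Since the plots are images, here is a small matplotlib sketch (the library choice and plotting ranges are my own) that reproduces both activation curves, ReLU above and sigmoid here:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-6, 6, 200)
relu = np.maximum(0, x)          # ReLU(x) = max(0, x)
sigmoid = 1 / (1 + np.exp(-x))   # sigmoid(x), output in (0, 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(x, relu)
ax1.set_title("ReLU")
ax2.plot(x, sigmoid)
ax2.set_title("sigmoid")
plt.show()
```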
