Pytoch official fast r-cnn source code analysis (I) -- feature extraction
2022-06-12 12:59:00 【hhhcbw】
This series digs into the official PyTorch Faster R-CNN source code. I will explain the code in as much detail as possible; if it helps you, please follow and like, and if you have questions, leave them in the comments and I will try my best to answer.
Faster R-CNN paper link
Official PyTorch Faster R-CNN code documentation link
The official PyTorch example code is as follows:
import torch
import torchvision
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {'boxes': boxes[i], 'labels': labels[i]}
    targets.append(d)
output = model(images, targets)
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)
# optionally, if you want to export the model to ONNX:
torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version = 11)
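Before walking through the internals, it helps to note what the two calls above return. In training mode the forward pass returns a dict of losses; in eval mode it returns one prediction dict per image. A quick check (the keys below are the ones torchvision documents for this model):

print(output.keys())
# dict_keys(['loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg'])
print(predictions[0].keys())
# dict_keys(['boxes', 'labels', 'scores'])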
The sample code is explained in detail below.
First, initialize the Faster R-CNN model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
As you can see, this builds a Faster R-CNN with a ResNet-50-FPN backbone. Next, step into the internal code with the debugger.
def fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91,
                            pretrained_backbone=True, trainable_backbone_layers=3, **kwargs):
""" Constructs a Faster R-CNN model with a ResNet-50-FPN backbone. Building a backbone network is ResNet-50-FPN Of Faster R-CNN Model . The input to the model is expected to be a list of tensors, each of shape ``[C, H, W]``, one for each image, and should be in ``0-1`` range. Different images can have different sizes. The input to the model should be an input from tensors A list of components , Every tensor The shape of is [C,H,W], For each image, the element value should be in [0,1] Within the scope of , Different images have different sizes . The behavior of the model changes depending if it is in training or evaluation mode. The model has two modes: training and evaluation , The performance of the model depends on the model . During training, the model expects both the input tensors, as well as a targets (list of dictionary), containing: - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x`` between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H`` - labels (``Int64Tensor[N]``): the class label for each ground-truth box In the process of training , The model needs to input the tensor, And the goal ( A list of dictionaries ), It contains : - Frame (FloatTensor[N,4]): The real box is [x1,y1,x2,y2] In the form of ,x The value of the 0~W Between ,y The value of the 0-H Between . - label (Int64Tensor[N]): Category labels for each real box . The model returns a ``Dict[Tensor]`` during training, containing the classification and regression losses for both the RPN and the R-CNN. During training , The model returns a ”Dict[Tensor]“, contain RPN And R-CNN Classification of stages and regression losses . During inference, the model requires only the input tensors, and returns the post-processed predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as follows: - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x`` between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H`` - labels (``Int64Tensor[N]``): the predicted labels for each image - scores (``Tensor[N]``): the scores or each prediction In the process of reasoning , The model only needs to input the tensor, Then return the post-processing prediction results to "List[Dict[Tensor]]" In the form of , For each input image , Its "Dict" The fields are as follows : - Frame (FloatTensor[N,4]): The prediction box is [x1,y1,x2,y2] In the form of ,x The value of the 0~W Between ,y The value of the 0~H Between . - label (Int64Tensor[N]): Prediction labels for each image . - fraction (Tensor[N]): Score per forecast . Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size. Faster R—CNN Can be exported as a fixed batch size field fixed size input image ONNX Format . Arguments: pretrained (bool): If True, returns a model pre-trained on COCO train2017 progress (bool): If True, displays a progress bar of the download to stderr pretrained_backbone (bool): If True, returns a model with backbone pre-trained on Imagenet num_classes (int): number of output classes of the model (including the background) trainable_backbone_layers (int): number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. Parameters : pretrianed(bool): If it is true , Return to a COCO train2017 The pre training model on . 
progress(bool): If it is true , Display the download progress bar on the screen . pretrained_backbone(bool): If it is true , Return to a Imagenet Backbone network pre training model on . num_classes(int): Number of types of model output ( Including background ). trainable_backbone_layers(int): Start with the last block ResNet Number of layers ( Not frozen ). The legal value is 0~5 Between ,5 This means that all layers of the backbone network are trainable . """
    # use assert to check that trainable_backbone_layers has a legal value
    assert trainable_backbone_layers <= 5 and trainable_backbone_layers >= 0
    # dont freeze any layers if pretrained model or backbone is not used
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    if pretrained:
        # no need to download the backbone if pretrained is set
        pretrained_backbone = False
    # build the ResNet-FPN backbone
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone, trainable_layers=trainable_backbone_layers)
    # build the Faster R-CNN model on top of the backbone
    model = FasterRCNN(backbone, num_classes, **kwargs)
    if pretrained:
        # if a pretrained model is requested, download its weights
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'],
                                              progress=progress)
        # load the weights into the model
        model.load_state_dict(state_dict)
    return model  # return the model
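A common use of this constructor is fine-tuning on a custom dataset. A minimal sketch, assuming 2 foreground classes plus background (note that with num_classes != 91 the COCO weights cannot be loaded, so only the backbone is pretrained here):

import torchvision

# hypothetical fine-tuning setup: ImageNet-pretrained backbone, fresh detection heads
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=True, num_classes=3)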
Step into the resnet_fpn_backbone code with the debugger to see how the ResNet-FPN backbone is built.
def resnet_fpn_backbone(
    backbone_name,
    pretrained,
    norm_layer=misc_nn_ops.FrozenBatchNorm2d,
    trainable_layers=3,
    returned_layers=None,
    extra_blocks=None
):
""" Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone. Build one that adds... At the top FPN Of ResNet Backbone network . Freeze the specified number of layers in the backbone network . Examples:: >>> from torchvision.models.detection.backbone_utils import resnet_fpn_backbone >>> backbone = resnet_fpn_backbone('resnet50', pretrained=True, trainable_layers=3) >>> # get some dummy image >>> x = torch.rand(1,3,64,64) >>> # compute the output >>> output = backbone(x) >>> print([(k, v.shape) for k, v in output.items()]) >>> # returns >>> [('0', torch.Size([1, 256, 16, 16])), >>> ('1', torch.Size([1, 256, 8, 8])), >>> ('2', torch.Size([1, 256, 4, 4])), >>> ('3', torch.Size([1, 256, 2, 2])), >>> ('pool', torch.Size([1, 256, 1, 1]))] Arguments: backbone_name (string): resnet architecture. Possible values are 'ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2' norm_layer (torchvision.ops): it is recommended to use the default value. For details visit: (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) pretrained (bool): If True, returns a model with backbone pre-trained on Imagenet trainable_layers (int): number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. Parameters : backbone_name (string):resnet framework . The possible value is 'ResNet','resnet18','resnet34','resnet50','resnet101','resnet152', 'resnext50_32x4d','resnet101_32x8d','wide_resnet50_2','wide_resnet101_2' norm_layer (torchivision.ops): It is recommended to use the default value . For details, please visit : (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) pretrained (bool): If it is true , Return to a Imagenet Pre training backbone network model on trainable_layers (int): Start with the last block ResNet Number of layers ( Not frozen ). The legal value is 0~5 Between ,5 This means that all layers of the backbone network are trainable . """
    backbone = resnet.__dict__[backbone_name](
        pretrained=pretrained,
        norm_layer=norm_layer)  # build the plain resnet-50 backbone
    # select layers that wont be frozen (frozen layers do not take part in training)
    assert trainable_layers <= 5 and trainable_layers >= 0
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    # freeze layers only if pretrained backbone is used
    for name, parameter in backbone.named_parameters():
        if all([not name.startswith(layer) for layer in layers_to_train]):
            parameter.requires_grad_(False)

    if extra_blocks is None:
        extra_blocks = LastLevelMaxPool()

    if returned_layers is None:
        returned_layers = [1, 2, 3, 4]
    assert min(returned_layers) > 0 and max(returned_layers) < 5
    return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)}

    in_channels_stage2 = backbone.inplanes // 8
    in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
    out_channels = 256
    return BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks)
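To make the channel arithmetic concrete: after construction, backbone.inplanes is 2048 for resnet50, so in_channels_stage2 is 256 and the FPN's four input channel counts match the ResNet-50 stage outputs. The same expressions, evaluated in isolation:

in_channels_stage2 = 2048 // 8  # backbone.inplanes // 8 for resnet50
in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in [1, 2, 3, 4]]
print(in_channels_list)  # [256, 512, 1024, 2048]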
Step into the ResNet-50 backbone code with the debugger.
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

    def forward(self, x):
        return self._forward_impl(x)
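To see what each stage produces, here is a minimal sketch that traces feature-map shapes through a plain torchvision ResNet-50, assuming a 224x224 input:

import torch
import torchvision

net = torchvision.models.resnet50(pretrained=False)
x = torch.rand(1, 3, 224, 224)
x = net.maxpool(net.relu(net.bn1(net.conv1(x))))  # stem: [1, 64, 56, 56]
for name in ['layer1', 'layer2', 'layer3', 'layer4']:
    x = getattr(net, name)(x)
    print(name, tuple(x.shape))
# layer1 (1, 256, 56, 56)
# layer2 (1, 512, 28, 28)
# layer3 (1, 1024, 14, 14)
# layer4 (1, 2048, 7, 7)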
The network structure of ResNet-50 is shown in the figure below.
ResNet-50 uses the Bottleneck block, which needs far fewer parameters than the BasicBlock used by the shallower ResNets (see the worked comparison after the class). The code is as follows:
class Bottleneck(nn.Module):
    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
    # while original implementation places the stride at the first 1x1 convolution(self.conv1)
    # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
    # This variant is also known as ResNet V1.5 and improves accuracy according to
    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.

    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
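To back up the parameter-saving claim, here is a rough weight count for a single residual block with 256 input/output channels (bias-free convolutions, BatchNorm parameters ignored; the two-3x3-conv layout is the BasicBlock used by ResNet-18/34):

bottleneck = 256 * 64 + 64 * 64 * 3 * 3 + 64 * 256  # 1x1 down, 3x3, 1x1 up
basic = 2 * (256 * 256 * 3 * 3)                     # two 3x3 convolutions
print(bottleneck, basic)  # 69632 1179648 -- roughly 17x fewer weights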
Putting it all together, we obtain the ResNet-50 structure:
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d(64)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d(256)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(512)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(1024)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(2048)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=1000, bias=True)
)
An FPN is then added on top of this backbone.
As shown in the figure above, the FPN builds a feature pyramid with strong semantics at every scale; see this blog for the underlying principle. Here the FPN takes the feature map extracted at each ResNet-50 stage and adds one more map obtained by max-pooling the last one, for five feature maps in total. The code is as follows:
def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """
    Computes the FPN for a set of feature maps.

    Arguments:
        x (OrderedDict[Tensor]): feature maps for each feature level.

    Returns:
        results (OrderedDict[Tensor]): feature maps after FPN layers.
            They are ordered from highest resolution first.
    """
    # unpack OrderedDict into two lists for easier handling
    names = list(x.keys())
    x = list(x.values())

    last_inner = self.get_result_from_inner_blocks(x[-1], -1)
    results = []
    results.append(self.get_result_from_layer_blocks(last_inner, -1))
    for idx in range(len(x) - 2, -1, -1):
        inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)  # 1x1 conv reduces the channels to 256
        feat_shape = inner_lateral.shape[-2:]
        inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")  # upsample
        last_inner = inner_lateral + inner_top_down  # lateral connection
        results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))  # 3x3 conv to reduce the aliasing effect
    if self.extra_blocks is not None:
        results, names = self.extra_blocks(results, x, names)  # max-pool to obtain the fifth feature map
    # make it back an OrderedDict
    out = OrderedDict([(k, v) for k, v in zip(names, results)])
    return out
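As a sanity check on the whole feature-extraction path, the following sketch runs the backbone of a freshly built (untrained) model on a dummy 224x224 image; all five maps have 256 channels, and 'pool' is the extra max-pooled level:

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
model.eval()
with torch.no_grad():
    feats = model.backbone(torch.rand(1, 3, 224, 224))
print([(k, tuple(v.shape)) for k, v in feats.items()])
# [('0', (1, 256, 56, 56)), ('1', (1, 256, 28, 28)), ('2', (1, 256, 14, 14)),
#  ('3', (1, 256, 7, 7)), ('pool', (1, 256, 4, 4))]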
With that, feature map extraction is complete. The next post moves on to the RPN (generation of region proposals).