We recommend running this tutorial as a notebook, not a script; you can download the notebook file to follow along.

PyTorch provides the elegantly designed modules and classes torch.nn, torch.optim, Dataset, and DataLoader to help you create and train neural networks. To make full use of their power and customize them for your problem, you need to understand exactly what they do. To build that understanding, we will first train a basic neural net on MNIST without using any features from these modules, relying only on the most basic PyTorch tensor functionality. Then we will incrementally add one feature at a time from torch.nn, torch.optim, Dataset, and DataLoader, showing exactly what each piece does and how it makes the code more concise or more flexible.

This tutorial assumes you already have PyTorch installed and are familiar with the basics of tensor operations.

MNIST data setup

We will use the classic MNIST dataset, which consists of black-and-white images of handwritten digits (0-9).

We will use pathlib (part of the Python 3 standard library) for dealing with paths, and will download the dataset using requests. We only import modules when we use them, so you can see exactly what is being used at each point.

from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"
PATH.mkdir(parents=True, exist_ok=True)

URL = "https://github.com/pytorch/tutorials/raw/master/_static/"
FILENAME = "mnist.pkl.gz"
if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)

The dataset is in NumPy array format, and has been stored using pickle, a Python-specific format for serializing data.

import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

Each image is 28*28, and is stored as a flattened row of length 784. Let's take a look at one; we need to reshape it to 2D first.

from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)

Output:

(50000, 784)

PyTorch uses torch.tensor rather than NumPy arrays, so we need to convert our data.

import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

Output:

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]) tensor([5, 0, 4, ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)

Neural network from scratch (without torch.nn)

Let's first create a model using nothing but PyTorch tensor operations. We assume you are already familiar with the basics of neural networks.

PyTorch provides methods to create random or zero-filled tensors, which we will use to build the weights and bias for a simple linear model. These are just regular tensors, with one very special addition: we tell PyTorch that they require a gradient. This makes PyTorch record all operations performed on the tensor, so that it can calculate the gradient during back-propagation automatically.

For the weights, we set requires_grad after the initialization, since we don't want that step included in the gradient. (Note that in PyTorch a trailing _ signifies that the operation is performed in-place.)

Note: we initialize the weights here with Xavier initialisation (by multiplying by 1/sqrt(n)).

import math

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

Thanks to PyTorch's ability to calculate gradients automatically, we can use any standard Python function (or callable object) as a model. So here is a simple linear model built from plain matrix multiplication and broadcasted addition. We also need an activation function, so we will write log_softmax and use it. Remember: although PyTorch provides lots of pre-written loss functions, activation functions, and so on, you can easily write your own in plain Python. PyTorch will even create fast GPU or vectorized CPU code for your function automatically.

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

Here, @ stands for matrix multiplication. We will call our function on one batch of data (in this case, 64 images). This is one forward pass. Note that our predictions won't be any better than random at this stage, since we start with random weights.

bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)   # predictions
print(preds[0], preds.shape)

Output:

tensor([-1.8235, -2.3674, -2.6933, -2.0418, -2.2708, -2.3946, -2.1448, -2.5031,
-2.7917, -2.3786], grad_fn=<SelectBackward0>) torch.Size([64, 10])

As you can see, the preds tensor contains not only the tensor values, but also a gradient function. We will use this later for back-propagation.
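You can inspect the gradient function directly; a quick check (not part of the original tutorial, and the exact grad_fn name may differ):

print(preds.grad_fn)  # e.g. <SubBackward0 object at 0x...>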

Let's now implement the negative log-likelihood as our loss function (again, using standard Python):

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

Let's check our loss with our random model, so we can see whether we improve after a back-propagation pass later.

yb = y_train[0:bs]
print(loss_func(preds, yb))

Output:

tensor(2.2979, grad_fn=<NegBackward0>)

Let's also implement a function to compute the accuracy of our model. For each prediction, if the index of the largest value matches the target value, the prediction is correct.

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

Let's check the accuracy of our random model, so we can see whether the accuracy improves as the loss improves.

print(accuracy(preds, yb))

Output:

tensor(0.1094)

We can now run a training loop. For each iteration, we will:

  • select a mini-batch of data (of size bs)
  • use the model to make predictions
  • calculate the loss
  • use loss.backward() to update the gradients of the model, in this case weights and bias

Now we use these gradients to update the weights and bias. We do this inside the torch.no_grad() context manager, because we do not want these actions to be recorded for our next calculation of the gradient. You can read more here about how PyTorch's Autograd records operations.

We then set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).
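As a tiny illustration of this accumulation behaviour (a quick aside, not part of the training loop), calling backward() twice without zeroing in between adds the gradients together:

t = torch.ones(1, requires_grad=True)
(t * 2).sum().backward()
(t * 2).sum().backward()
print(t.grad)  # tensor([4.]) -- the two gradients of 2 were added, not replaced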

TIP: You can use the standard Python debugger to step through PyTorch code, allowing you to check the values of the various variables at each step. Uncomment set_trace() below to try it out.

from IPython.core.debugger import set_trace

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

That's it: we have created and trained a minimal neural network entirely from scratch (in this case, a logistic regression, since we have no hidden layers).

Let's check the loss and accuracy and compare them to what we got earlier. We expect the loss to have decreased and the accuracy to have increased, and indeed they have.

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Output:

tensor(0.0803, grad_fn=<NegBackward0>) tensor(1.)

Using torch.nn.functional

We will now refactor our code using PyTorch's nn classes, so that it does the same thing as before but is more concise and flexible. At each step from here, we should be making our code shorter, more understandable, and/or more flexible.

The first and easiest step is to make our code shorter by replacing our hand-written activation and loss functions with those from torch.nn.functional (which is usually imported as F). This module contains all the functions in the torch.nn library (whereas other parts of the library contain classes). As well as a wide range of loss and activation functions, you will also find here some convenient functions for creating neural nets, such as pooling functions. (There are also functions for convolutions, linear layers, etc, but as we will see, these are usually better handled using other parts of the library.)

If you are using the negative log-likelihood loss and the log softmax activation, then PyTorch provides the single function F.cross_entropy that combines the two. So we can even remove the activation function from our model.

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

Note that we no longer call log_softmax in the model function. Let's confirm that the loss and accuracy are the same as before:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Output:

tensor(0.0803, grad_fn=<NllLossBackward0>) tensor(1.)

Refactor using nn.Module

Next, we will use nn.Module and nn.Parameter for a clearer and more concise training loop. We subclass nn.Module (which is itself a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. nn.Module has a number of attributes and methods (such as .parameters() and .zero_grad()) that we will be using.

Note: nn.Module (uppercase M) is a PyTorch-specific concept, and is a class we will be using a lot. Don't confuse nn.Module with the Python concept of a (lowercase m) module, which is a file of Python code that can be imported.

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

Since we are now using an object instead of just a function, we first have to instantiate our model:

model = Mnist_Logistic()

Now we can calculate the loss in the same way as before. Note that nn.Module objects are used as if they were functions (i.e. they are callable), but behind the scenes PyTorch calls our forward method automatically.

print(loss_func(model(xb), yb))

Output:

tensor(2.2899, grad_fn=<NllLossBackward0>)
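Being callable simply means that model(xb) is routed to the forward method we defined. As a quick illustration (not part of the original tutorial), calling the module and calling forward directly give the same result:

print(torch.equal(model(xb), model.forward(xb)))  # True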

Previously, for our training loop we had to update the values for each parameter by name and manually zero out the gradient of each parameter separately:

with torch.no_grad():
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()
    bias.grad.zero_()

Now we can take advantage of model.parameters() and model.zero_grad() (both defined by PyTorch for nn.Module) to make those steps more concise and less prone to forgetting some parameters, particularly for more complicated models:

with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()

We will wrap our little training loop in a fit function so we can run it again later.

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

Let's double-check that our loss has gone down:

print(loss_func(model(xb), yb))

Output:

tensor(0.0835, grad_fn=<NllLossBackward0>)

Refactor using nn.Linear

Let's continue refactoring our code. Instead of manually defining and initializing self.weights and self.bias and computing xb @ self.weights + self.bias, we will use the PyTorch class nn.Linear for a linear layer, which does all of that for us. PyTorch has many kinds of predefined layers that can greatly simplify our code, and often make it faster too.

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

We instantiate the model and calculate the loss as before:

model = Mnist_Logistic()
print(loss_func(model(xb), yb))

Output:

tensor(2.3694, grad_fn=<NllLossBackward0>)

We can still use the same fit method to train:

fit()

print(loss_func(model(xb), yb))

Output:

tensor(0.0820, grad_fn=<NllLossBackward0>)

Refactor using optim

PyTorch also has a package with various optimization algorithms, torch.optim. We can use the step method of an optimizer to take a training step, instead of manually updating each parameter.

This will let us replace our previous manually coded optimization step:

with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()

with just:

opt.step()
opt.zero_grad()

(optim.zero_grad() resets the gradients to 0; we need to call it before computing the gradients for the next minibatch.)

from torch import optim

We will define a little function to create our model and optimizer, so we can reuse it later:

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(2.3185, grad_fn=<NllLossBackward0>)
tensor(0.0827, grad_fn=<NllLossBackward0>)

Refactor using Dataset

PyTorch has an abstract Dataset class. A Dataset can be anything that has a __len__ function (called by Python's standard len function) and a __getitem__ function as a way of indexing into it. This tutorial walks through a complete example of creating a custom FacialLandmarkDataset class as a subclass of Dataset.
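To make that interface concrete, here is a minimal sketch of what such a subclass could look like for our existing tensors (the class name is made up for illustration; below we will simply use the TensorDataset that PyTorch already provides):

from torch.utils.data import Dataset

class MnistArrayDataset(Dataset):
    # Hypothetical example: wraps our existing x/y tensors directly.
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]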

PyTorch's TensorDataset is a Dataset wrapping tensors. By defining a length and a way of indexing, it also gives us a way to iterate, index, and slice along the first dimension of a tensor. This will make it easier to access both the independent and dependent variables in the same line as we train.

from torch.utils.data import TensorDataset

Both x_train and y_train can be combined in a single TensorDataset, which will be easier to iterate over and slice:

train_ds = TensorDataset(x_train, y_train)

Previously, we had to iterate through the minibatches of x and y values separately:

xb = x_train[start_i:end_i]
yb = y_train[start_i:end_i]

Now these two steps can be done together:

xb, yb = train_ds[i * bs: i * bs + bs]

model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(0.0812, grad_fn=<NllLossBackward0>)

Refactor using DataLoader

PyTorch's DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. A DataLoader makes it easier to iterate over batches: rather than having to use train_ds[i * bs: i * bs + bs], it gives us each minibatch automatically.

from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

Previously, our loop iterated over batches (xb, yb) like this:

for i in range((n - 1) // bs + 1):
    xb, yb = train_ds[i * bs: i * bs + bs]
    pred = model(xb)

Now, (xb, yb) are loaded automatically from the data loader, so the loop is much cleaner:

for xb, yb in train_dl:
    pred = model(xb)

model, opt = get_model()

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(0.0824, grad_fn=<NllLossBackward0>)

Thanks to PyTorch's nn.Module, nn.Parameter, Dataset, and DataLoader, our training loop is now dramatically smaller and easier to understand. Let's now try to add the basic features necessary to create effective models in practice.

Add validation

In section 1, we were just trying to set up a reasonable training loop for our training data. In reality, you always need a validation set as well, in order to identify whether you are overfitting.

Shuffling the training data is important to prevent correlation between batches and overfitting. On the other hand, the validation loss is identical whether or not we shuffle the validation set, so shuffling it would be pointless.

We use a batch size for the validation set that is twice as large as that for the training set. This is because the validation set does not need back-propagation and thus takes less memory (it doesn't need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

We will calculate and print the validation loss at the end of each epoch.

(Note that we always call model.train() before training and model.eval() before inference, because these are used by layers such as nn.BatchNorm2d and nn.Dropout, which should behave differently during validation.)

model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

Output:

0 tensor(0.3260)
1 tensor(0.2793)

Create fit() and get_data()

Now let's do a little refactoring of our own. Since we go through a similar process twice of calculating the loss for both the training set and the validation set, let's turn it into its own function, loss_batch, which computes the loss for one batch.

We pass an optimizer in for the training set, and use it to perform back-propagation. For the validation set, we don't pass an optimizer, so the function doesn't perform back-propagation.

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

fit runs the necessary operations to train our model and computes the training and validation losses for each epoch.

import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)

get_data returns dataloaders for the training and validation sets.

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

Now, our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.37930963896512987
1 0.28746109589338303

You can use these basic 3 lines of code to train a wide variety of models. Let's see if we can use them to train a convolutional neural network (CNN).

Switch to CNN

We are now going to build our neural network with three convolutional layers. Because none of the functions in the previous section assumed anything about the model's form, we can use them to train a CNN without any modification.

We will use PyTorch's predefined Conv2d class as our convolutional layer. We define a CNN with 3 convolutional layers. Each convolution is followed by a ReLU. At the end, we perform an average pooling. (Note that view is PyTorch's version of NumPy's reshape.)

class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

lr = 0.1

Momentum is a variation of stochastic gradient descent that takes previous updates into account as well, and generally makes training faster.
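For intuition, the core update that optim.SGD performs when momentum is enabled looks roughly like the sketch below (simplified; the real implementation also handles details such as dampening, weight decay, and Nesterov momentum):

def sgd_momentum_step(params, bufs, lr, momentum=0.9):
    # bufs holds one velocity tensor per parameter, initialised to zeros
    with torch.no_grad():
        for p, buf in zip(params, bufs):
            buf.mul_(momentum).add_(p.grad)  # buf = momentum * buf + grad
            p -= lr * buf                    # step along the smoothed direction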

model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.36561058859825135
1 0.2279014788389206

nn.Sequential

torch.nn has another handy class we can use to simplify our code: Sequential. A Sequential object runs each of the modules contained within it, in order. This is a simpler way of writing our neural network.

To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn't have a view layer, and we need to create one for our network. Lambda will create a layer that we can then use when defining a network with Sequential.

class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

def preprocess(x):
    return x.view(-1, 1, 28, 28)

The model created with Sequential is simpler:

model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.3266831987142563
1 0.24143515722751618

Wrapping DataLoader

Our CNN is fairly concise, but it only works with MNIST, because:

  • It assumes the input is a 28*28 long vector
  • It assumes the final CNN grid size is 4*4 (since that's the average pooling kernel size we used)

Let's get rid of these two assumptions, so our model works with any 2D single-channel image. First, we can remove the initial Lambda layer by moving the data preprocessing into a generator:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y

class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for b in self.dl:
            yield (self.func(*b))

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Next, we replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the size of the input tensor we have. As a result, the model will work with input of any size. (A quick sanity check of this appears after the training output below.)

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

Let's try it out:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.3653434091329575
1 0.31836258018016816
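As a quick sanity check that the model now works with other input sizes (not part of the original tutorial), the same model accepts a different spatial resolution:

with torch.no_grad():
    dummy = torch.randn(2, 1, 64, 64)  # two hypothetical 64x64 single-channel images
    print(model(dummy).shape)          # torch.Size([2, 10])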

Use GPU

If you have access to a CUDA-capable GPU (you can rent one from most cloud providers for about $0.50/hour), you can use it to speed up your code. First check that your GPU is working in PyTorch:

print(torch.cuda.is_available())

Output:

True

And then create a device object for it:

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

Let's update preprocess to move batches to the GPU:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Finally, we can move our model to the GPU:

model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

You should find it runs faster now:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.22935870126485824
1 0.21556809792518616

Closing thoughts

We now have a general data pipeline and training loop which you can use to train many types of models using PyTorch. To see how simple training a model can now be, take a look at the mnist_sample example notebook.

Of course, there are many things you will want to add, such as data augmentation, hyperparameter tuning, monitoring training, transfer learning, and so forth. These features are available in the fastai library, which was developed using the same design approach shown in this tutorial, providing a natural next step for practitioners looking to take their models further.

We promised at the start of this tutorial that we would explain, through example, torch.nn, torch.optim, Dataset, and DataLoader. So let's summarize what we have seen:

  • torch.nn

    • Module: creates a callable which behaves like a function, but can also contain state (such as neural net layer weights). It knows what Parameter(s) it contains and can zero all their gradients, loop through them for weight updates, etc.

    • Parameter: a wrapper for a tensor that tells a Module that it has weights that need updating during backprop. Only tensors with the requires_grad attribute set are updated.

    • functional: a module (usually imported into the F namespace by convention) which contains activation functions, loss functions, etc, as well as non-stateful versions of layers such as convolutional and linear layers.

  • torch.optim: contains optimizers such as SGD, which update the weights of Parameter during the backward step.

  • Dataset: an abstract interface of objects with a __len__ and a __getitem__, including classes provided with PyTorch such as TensorDataset.

  • DataLoader: takes any Dataset and creates an iterator which returns batches of data.
