A detailed explanation of vector derivative and matrix derivative
2022-07-03 10:40:00 【serity】
Vector and matrix derivatives are part of the mathematical foundations of machine learning. Read this article carefully and I believe you will gain a lot ~
**When it comes to vectors, unless otherwise specified, we default to column vectors.**
1. Numerator layout and denominator layout
We know that scalars (Scalar), vectors (Vector), and matrices (Matrix) satisfy the following relationship:
$$\text{Scalar} \subset \text{Vector} \subset \text{Matrix}$$
That is, a vector can be understood as a special matrix (a matrix with $1$ column), and a scalar can be understood as a special vector (a vector of dimension $1$), or as a $1\times 1$ matrix. So the vector derivatives and matrix derivatives we discuss today can be collectively referred to as "matrix derivatives".
There are six common matrix derivatives:

| | Scalar $y$ | Vector $\boldsymbol y$ | Matrix $\bf Y$ |
| --- | --- | --- | --- |
| **Scalar $x$** | $\partial y/\partial x$ | $\partial \boldsymbol y/\partial x$ | $\partial {\bf Y}/\partial x$ |
| **Vector $\boldsymbol x$** | $\partial y/\partial \boldsymbol x$ | $\partial \boldsymbol y/\partial \boldsymbol x$ | |
| **Matrix $\bf X$** | $\partial y/\partial {\bf X}$ | | |
Scalar-by-scalar differentiation is familiar to everyone ($f'(x)$ is a typical example), so we won't discuss it here. In principle we could also define derivatives between matrices and vectors, or between two matrices, i.e. the cells left empty in the table above, but those results involve tensors of order greater than $2$, which can no longer be written as matrices, so we won't discuss them either.
Next we focus on the remaining five matrix derivatives, namely:
- derivative of a vector with respect to a scalar
- derivative of a scalar with respect to a vector
- derivative of a vector with respect to a vector
- derivative of a matrix with respect to a scalar
- derivative of a scalar with respect to a matrix
Suppose we have two vectors $\boldsymbol{x}=(x_1,\cdots,x_n)^{\mathrm T}$ and $\boldsymbol{y}=(y_1,\cdots,y_m)^{\mathrm T}$. Then $\dfrac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ has $mn$ components:
$$\frac{\partial y_i}{\partial x_j},\quad i=1,\cdots,m,\quad j=1,\cdots,n$$
How should we arrange these $mn$ components? That is where the **numerator layout** (Numerator Layout) and the **denominator layout** (Denominator Layout) come in. A layout is nothing more than a convention for arranging the results above; if no arrangement is specified, errors easily creep into later calculations (for example, matrices whose dimensions don't match cannot be multiplied).
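For instance (a small worked example of mine, not from the original post): with $m=2$ and $n=3$, the numerator layout indexes rows by the numerator $\boldsymbol y$, giving a $2\times 3$ matrix, while the denominator layout gives its $3\times 2$ transpose:
$$\underbrace{\begin{pmatrix}\dfrac{\partial y_1}{\partial x_1}&\dfrac{\partial y_1}{\partial x_2}&\dfrac{\partial y_1}{\partial x_3}\\[2mm]\dfrac{\partial y_2}{\partial x_1}&\dfrac{\partial y_2}{\partial x_2}&\dfrac{\partial y_2}{\partial x_3}\end{pmatrix}}_{\text{numerator layout}},\qquad\underbrace{\begin{pmatrix}\dfrac{\partial y_1}{\partial x_1}&\dfrac{\partial y_2}{\partial x_1}\\[2mm]\dfrac{\partial y_1}{\partial x_2}&\dfrac{\partial y_2}{\partial x_2}\\[2mm]\dfrac{\partial y_1}{\partial x_3}&\dfrac{\partial y_2}{\partial x_3}\end{pmatrix}}_{\text{denominator layout}}$$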
When talking about vector derivatives, there are two very important premises:
① Both the numerator and the denominator are vectors, one of them a **row vector** and the other a **column vector**
② One of the numerator and denominator is a **scalar**, and the other is a row/column vector
Only when ① or ② is satisfied is the following discussion meaningful.
Let's look at ① first:
- If the denominator is the column vector and the numerator is the row vector, we call it the denominator layout
- If the numerator is the column vector and the denominator is the row vector, we call it the numerator layout
In one sentence: whichever side is the column vector gives the layout its name.
For ②, we can still judge by "whichever side is the column vector gives the layout its name". But what if neither the numerator nor the denominator is a column vector?
This situation can also be summarized in one sentence: whichever side is the scalar gives the layout its name.
We can summarize these discussions in the following table:

[Table image: which layout applies in each of the cases above]
For **matrix derivatives**, things are a little different:

[Table image: layout conventions for matrix derivatives]
Besides, we also have the following important identity:
$$\text{numerator-layout result}=\left(\text{denominator-layout result}\right)^{\mathrm T},\qquad\text{denominator-layout result}=\left(\text{numerator-layout result}\right)^{\mathrm T}$$
All of the above results can be summarized in the following three figures:

[Three figure images: summary of the layout conventions]

More intuitively:

[Figure image: intuitive overview of the layout conventions]
Our discussion below is based on the **numerator layout**.
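Before moving on, here is a small NumPy sketch (my own illustration, not part of the original post) that builds the numerator-layout derivative of $\boldsymbol y={\bf A}\boldsymbol x$ by central differences and confirms the transpose identity between the two layouts:

```python
import numpy as np

# Numerator-layout Jacobian by central differences:
# J[i, j] = d f_i / d x_j  (rows follow the numerator y).
def numerator_layout_jacobian(f, x, eps=1e-6):
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
m, n = 2, 3
A = rng.normal(size=(m, n))
x = rng.normal(size=n)

J_num = numerator_layout_jacobian(lambda v: A @ v, x)  # numerator layout
J_den = J_num.T                                        # denominator layout

assert np.allclose(J_num, A, atol=1e-5)    # numerator layout of Ax is A itself
assert np.allclose(J_den, A.T, atol=1e-5)  # denominator layout is the transpose
print("transpose identity between the two layouts confirmed")
```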
2. Vector derivatives
2.1 Derivative of a vector with respect to a scalar
Some rules for differentiating a vector with respect to a scalar:

[Table image: differentiation rules]
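As a concrete example (mine, not from the original post): in the numerator layout the result keeps the column shape of the numerator, e.g.
$$\boldsymbol y(x)=\begin{pmatrix}x^2\\ \sin x\end{pmatrix}\quad\Longrightarrow\quad\frac{\partial \boldsymbol y}{\partial x}=\begin{pmatrix}2x\\ \cos x\end{pmatrix}$$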
2.2 Derivative of a scalar with respect to a vector
Some rules for differentiating a scalar with respect to a vector:

[Table image: differentiation rules]
Some **important conclusions** about differentiating a scalar with respect to a vector:
$$\frac{\partial a}{\partial \boldsymbol x}={\bf 0}^{\mathrm T}\tag{2.2.A}$$
$$\frac{\partial \boldsymbol a^{\mathrm T}\boldsymbol x}{\partial \boldsymbol x}=\frac{\partial \boldsymbol x^{\mathrm T}\boldsymbol a}{\partial \boldsymbol x}=\boldsymbol a^{\mathrm T}\tag{2.2.B}$$
$$\frac{\partial \boldsymbol x^{\mathrm T}\boldsymbol x}{\partial \boldsymbol x}=2\boldsymbol x^{\mathrm T}\tag{2.2.C}$$
$$\frac{\partial \boldsymbol x^{\mathrm T}{\bf A}\boldsymbol x}{\partial \boldsymbol x}=\boldsymbol x^{\mathrm T}({\bf A}+{\bf A}^{\mathrm T})\tag{2.2.D}$$
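These conclusions are easy to spot-check numerically. Below is a hedged NumPy sketch of mine (not from the original post); under the numerator layout the derivative of a scalar by a column vector is a row vector, so the helper returns the gradient reshaped to $1\times n$:

```python
import numpy as np

def row_gradient(f, x, eps=1e-6):
    """Numerator-layout derivative of scalar f at x: a 1 x n row vector."""
    g = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                  for e in np.eye(x.size)])
    return g.reshape(1, -1)

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

# (2.2.B): d(a^T x)/dx = a^T
assert np.allclose(row_gradient(lambda v: a @ v, x), a.reshape(1, -1), atol=1e-5)
# (2.2.C): d(x^T x)/dx = 2 x^T
assert np.allclose(row_gradient(lambda v: v @ v, x), 2 * x.reshape(1, -1), atol=1e-5)
# (2.2.D): d(x^T A x)/dx = x^T (A + A^T)
assert np.allclose(row_gradient(lambda v: v @ A @ v, x),
                   (x @ (A + A.T)).reshape(1, -1), atol=1e-5)
print("(2.2.B)-(2.2.D) verified numerically")
```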
2.3 Derivative of a vector with respect to a vector
Some rules for differentiating a vector with respect to a vector:

[Table image: differentiation rules]
Some **important conclusions** about differentiating a vector with respect to a vector:
$$\frac{\partial \boldsymbol a}{\partial \boldsymbol x}={\bf O}\tag{2.3.A}$$
$$\frac{\partial \boldsymbol x}{\partial \boldsymbol x}={\bf I}\tag{2.3.B}$$
$$\frac{\partial {\bf A}\boldsymbol x}{\partial \boldsymbol x}={\bf A}\tag{2.3.C}$$
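Again a quick numerical sanity check (my sketch, not from the original post) of (2.3.B) and (2.3.C):

```python
import numpy as np

# Numerator-layout Jacobian by central differences: J[i, j] = d f_i / d x_j.
def jacobian(f, x, eps=1e-6):
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(5, n))
x = rng.normal(size=n)

assert np.allclose(jacobian(lambda v: v, x), np.eye(n), atol=1e-5)   # (2.3.B)
assert np.allclose(jacobian(lambda v: A @ v, x), A, atol=1e-5)       # (2.3.C)
print("(2.3.B) and (2.3.C) verified numerically")
```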
3. Matrix derivatives
3.1 Derivative of a matrix with respect to a scalar
Some rules for differentiating a matrix with respect to a scalar:

[Table image: differentiation rules]
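As a concrete example (mine, not from the original post), this derivative is simply taken elementwise:
$${\bf Y}(x)=\begin{pmatrix}x & x^2\\ e^x & 1\end{pmatrix}\quad\Longrightarrow\quad\frac{\partial {\bf Y}}{\partial x}=\begin{pmatrix}1 & 2x\\ e^x & 0\end{pmatrix}$$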
3.2 Derivative of a scalar with respect to a matrix
Some rules for differentiating a scalar with respect to a matrix:

[Table image: differentiation rules]
Some **important conclusions** about differentiating a scalar with respect to a matrix:
$$\frac{\partial a}{\partial {\bf X}}={\bf O}\tag{3.2.A}$$
$$\frac{\partial \boldsymbol a^{\mathrm T}{\bf X}\boldsymbol b}{\partial {\bf X}}=\boldsymbol{ba}^{\mathrm T}\tag{3.2.B}$$
$$\frac{\partial \boldsymbol a^{\mathrm T}{\bf X}^{\mathrm T}\boldsymbol b}{\partial {\bf X}}=\boldsymbol{ab}^{\mathrm T}\tag{3.2.C}$$
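Consistent with the numerator layout used throughout, the derivative of a scalar by an $m\times n$ matrix is here the $n\times m$ transpose of the elementwise gradient. A hedged NumPy check of mine (not from the original post) for (3.2.B) and (3.2.C):

```python
import numpy as np

# Central-difference gradient of a scalar f with respect to a matrix X,
# returned in numerator layout (transpose of the elementwise gradient).
def matrix_grad_numerator(f, X, eps=1e-6):
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        dX = np.zeros_like(X)
        dX[idx] = eps
        G[idx] = (f(X + dX) - f(X - dX)) / (2 * eps)
    return G.T

rng = np.random.default_rng(3)
m, n = 3, 4
X = rng.normal(size=(m, n))
a, b = rng.normal(size=m), rng.normal(size=n)  # for a^T X b
u, v = rng.normal(size=n), rng.normal(size=m)  # for u^T X^T v

assert np.allclose(matrix_grad_numerator(lambda X: a @ X @ b, X),
                   np.outer(b, a), atol=1e-5)  # (3.2.B): b a^T
assert np.allclose(matrix_grad_numerator(lambda X: u @ X.T @ v, X),
                   np.outer(u, v), atol=1e-5)  # (3.2.C): a b^T (here u v^T)
print("(3.2.B) and (3.2.C) verified numerically")
```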
Besides, we often encounter derivatives of traces with respect to matrices. The relevant conclusions are as follows:
$$\frac{\partial\,\mathrm{tr}({\bf X})}{\partial {\bf X}}={\bf I}\tag{3.2.D}$$
$$\frac{\partial\,\mathrm{tr}({\bf X}^{k})}{\partial {\bf X}}=k{\bf X}^{k-1}\tag{3.2.E}$$
$$\frac{\partial\,\mathrm{tr}({\bf AX})}{\partial {\bf X}}=\frac{\partial\,\mathrm{tr}({\bf XA})}{\partial {\bf X}}={\bf A}\tag{3.2.F}$$
$$\frac{\partial\,\mathrm{tr}({\bf AX}^{\mathrm T})}{\partial {\bf X}}=\frac{\partial\,\mathrm{tr}({\bf X}^{\mathrm T}{\bf A})}{\partial {\bf X}}={\bf A}^{\mathrm T}\tag{3.2.G}$$
$$\frac{\partial\,\mathrm{tr}({\bf X}^{\mathrm T}{\bf AX})}{\partial {\bf X}}={\bf X}^{\mathrm T}({\bf A}+{\bf A}^{\mathrm T})\tag{3.2.H}$$
$$\frac{\partial\,\mathrm{tr}({\bf X}^{-1}{\bf A})}{\partial {\bf X}}=-{\bf X}^{-1}{\bf AX}^{-1}\tag{3.2.I}$$
$$\frac{\partial\,\mathrm{tr}({\bf AXB})}{\partial {\bf X}}=\frac{\partial\,\mathrm{tr}({\bf BAX})}{\partial {\bf X}}={\bf BA}\tag{3.2.J}$$
$$\frac{\partial\,\mathrm{tr}({\bf AXBX}^{\mathrm T}{\bf C})}{\partial {\bf X}}={\bf BX}^{\mathrm T}{\bf CA}+{\bf B}^{\mathrm T}{\bf X}^{\mathrm T}{\bf A}^{\mathrm T}{\bf C}^{\mathrm T}\tag{3.2.K}$$
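Finally, one more hedged numerical spot-check of mine (not from the original post) for two of the trace rules, (3.2.F) and (3.2.I), using the same numerator-layout convention:

```python
import numpy as np

# Central-difference gradient w.r.t. a matrix, in numerator layout
# (transpose of the elementwise gradient), as in the previous sketch.
def matrix_grad_numerator(f, X, eps=1e-6):
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        dX = np.zeros_like(X)
        dX[idx] = eps
        G[idx] = (f(X + dX) - f(X - dX)) / (2 * eps)
    return G.T

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
X = rng.normal(size=(n, n)) + n * np.eye(n)  # shift to keep X invertible

assert np.allclose(matrix_grad_numerator(lambda X: np.trace(A @ X), X),
                   A, atol=1e-5)                              # (3.2.F)
Xinv = np.linalg.inv(X)
assert np.allclose(matrix_grad_numerator(lambda X: np.trace(np.linalg.inv(X) @ A), X),
                   -Xinv @ A @ Xinv, atol=1e-5)               # (3.2.I)
print("(3.2.F) and (3.2.I) verified numerically")
```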
References
[1] https://zhuanlan.zhihu.com/p/263777564
[2] https://www.zhihu.com/question/352174717
[3] https://cloud.tencent.com/developer/article/1551901
[4] https://en.wikipedia.org/wiki/Matrix_calculus
[5] https://www.comp.nus.edu.sg/~cs5240/lecture/matrix-diff.pdf