
General process of machine learning training and parameter optimization (discussion)

2022-07-06 02:13:00 Min fan

Abstract: In practical machine learning applications, one must not only train the model but also control the input parameters. This paper describes the general process, for reference only.

1. Training machine learning models

For an input of $m$ features and an output that serves as a decision indicator, a machine learning model can be built as
$$f: \mathbb{R}^m \to \mathbb{R} \tag{1}$$
where $\mathbb{R}$ is the set of real numbers. If different features have their own value ranges, the model can instead be expressed as
$$f: \prod_{i=1}^m \mathbf{V}_i \to \mathbb{R} \tag{2}$$
where $\mathbf{V}_i$ is the value range of the $i$-th feature.
For simplicity, only the model of Eq. (1) is discussed below.
Given a feature matrix of $n$ instances $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n]^{\mathrm{T}} \in \mathbb{R}^{n \times m}$ and the corresponding label vector $\mathbf{Y} \in \mathbb{R}^n$, the optimization objective of machine learning can generally be expressed as
$$\min_f \mathcal{L}(f(\mathbf{X}), \mathbf{Y}) + R(f) \tag{3}$$
where $f(\mathbf{X}) = [f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)]$ is the vector of predicted labels and $R(f)$ is the regularization term on the parameters of $f$. If the optimization objective is a convex function, gradient descent can quickly find the optimal solution (a minimal training sketch is given after the list below). For the regularization term:

  • If $f$ is a linear model, the regularizer can be the 1-norm, the 2-norm, the nuclear norm, etc. Its role is to prevent overfitting.
  • If $f$ is a neural network model, techniques such as dropout can be used to prevent overfitting.
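As a concrete illustration of Eq. (3), the following minimal sketch trains a linear model $f(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x}$ with squared loss and a 2-norm regularizer by gradient descent. The toy data, learning rate, and regularization weight are assumptions made for the example, not values from the text.

```python
import numpy as np

def train_ridge(X, Y, lam=0.1, lr=0.01, epochs=1000):
    """Minimize ||Xw - Y||^2 / n + lam * ||w||^2 by gradient descent.

    One instance of Eq. (3): squared loss plus a 2-norm regularizer
    on the parameters of a linear model f(x) = w^T x.
    """
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(epochs):
        residual = X @ w - Y                         # f(X) - Y
        grad = 2 * X.T @ residual / n + 2 * lam * w  # gradient of the objective
        w -= lr * grad                               # convex objective: descent converges
    return w

# Toy data (assumed for illustration): n = 100 instances, m = 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
Y = X @ w_true + 0.1 * rng.normal(size=100)

w = train_ridge(X, Y)
print("learned weights:", np.round(w, 2))
```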

2. Parameter optimization methods

In some practical problems, part of the input features are objective while the others are controllable. Without loss of generality, let the first $m_1$ features be objective and the last $m_2$ features be controllable (hence also called parameters), with $m_1 + m_2 = m$. Suppose a reliable machine learning model $f$ has been trained on a large amount of data, and we want to maximize the decision indicator. Given the objective feature vector $\mathbf{x}_b \in \mathbb{R}^{m_1}$, the objective of parameter optimization is
$$\argmax_{\mathbf{x}_u \in \mathbb{R}^{m_2}} f(\mathbf{x}_b \| \mathbf{x}_u) \tag{4}$$
where $\|$ denotes the vector concatenation operation.

  • If $f$ is concave in each controllable feature (equivalently, $-f$ is convex), the optimum of Eq. (4) can be found by gradient ascent and similar methods (first sketch below).
  • If $f$ is not concave in the controllable features, bio-inspired (metaheuristic) algorithms can be used to search for good parameters (second sketch below).
  • If the controllable features are enumerable and the cardinality of the domain is not large, the optimal parameters can be obtained directly by exhaustive search (third sketch below). Example: with 5 controllable features, each taking one of 10 possible values, the optimal parameter vector can be found among the $10^5$ parameter combinations in only a few seconds.
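For the first case, here is a minimal gradient-ascent sketch over the controllable part $\mathbf{x}_u$. The model $f$, the starting point, and the finite-difference gradient estimate are illustrative assumptions; any black-box model can be plugged in.

```python
import numpy as np

def optimize_params_gd(f, x_b, m2, lr=0.05, steps=500, eps=1e-5):
    """Gradient ascent on x_u for Eq. (4), assuming f is concave in x_u.

    The gradient is estimated with central finite differences, so the
    model f only needs to be evaluable, not analytically differentiable.
    """
    x_u = np.zeros(m2)  # assumed starting point
    for _ in range(steps):
        grad = np.empty(m2)
        for i in range(m2):
            d = np.zeros(m2)
            d[i] = eps
            grad[i] = (f(np.concatenate([x_b, x_u + d]))
                       - f(np.concatenate([x_b, x_u - d]))) / (2 * eps)
        x_u += lr * grad  # ascend, since we maximize f
    return x_u

# Hypothetical model, concave in x_u: m1 = 2 objective, m2 = 2 controllable.
f = lambda x: -np.sum((x[2:] - x[:2]) ** 2)
x_b = np.array([1.0, -0.5])
print(optimize_params_gd(f, x_b, m2=2))  # converges toward x_b itself
```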
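For the non-concave case, the text only mentions bio-inspired algorithms in general; as one readily available representative (my choice, not named in the original), the sketch below uses SciPy's differential evolution, maximizing $f$ by minimizing $-f$. The multimodal model and the bounds are assumptions for the example.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical model, multimodal in the controllable features x[2:].
x_b = np.array([1.0, -0.5])
f = lambda x: np.sin(3 * x[2]) * np.cos(2 * x[3]) - 0.1 * (x[2]**2 + x[3]**2)

# Maximize f over x_u by minimizing -f; one bound per controllable feature.
result = differential_evolution(
    lambda x_u: -f(np.concatenate([x_b, x_u])),
    bounds=[(-3, 3), (-3, 3)],
    seed=0,
)
print("best x_u:", result.x, "best f:", -result.fun)
```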
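For the enumerable case, the sketch below reproduces the text's own example: 5 controllable features with 10 candidate values each, i.e. $10^5$ combinations checked exhaustively. The model and the candidate value grids are assumed for illustration.

```python
import itertools
import numpy as np

# Exhaustive search: 5 controllable features, 10 candidate values each,
# so 10^5 parameter combinations in total.
x_b = np.array([1.0, -0.5])              # objective features (assumed)
f = lambda x: -np.sum((x[2:] - 0.3)**2)  # hypothetical model, m2 = 5
values = [np.linspace(0, 1, 10)] * 5     # 10 candidate values per feature

best_xu, best_score = None, -np.inf
for combo in itertools.product(*values):  # enumerate all 10^5 combinations
    score = f(np.concatenate([x_b, combo]))
    if score > best_score:
        best_xu, best_score = np.array(combo), score
print("optimal parameters:", best_xu, "score:", best_score)
```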