Hyperparameter Optimization - Excerpt
2022-07-31 06:14:00 【Young_win】
Introduction to Hyperparameter Optimization
When building a deep learning model, you must make many decisions: How many layers should be stacked? How many units or filters should each layer contain? Should the activation be relu or some other function? Should BatchNormalization be used after a given layer? What dropout rate should be used? These architecture-level parameters are called hyperparameters. Model parameters (the weights), by contrast, are trained by backpropagation.
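To make these decisions concrete, here is a minimal sketch of a Keras model builder that exposes the hyperparameters named above as arguments. The input shape, layer sizes, and output dimension are assumptions chosen purely for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_layers=2, units=64, activation="relu",
                use_batchnorm=False, dropout_rate=0.0):
    """Build a small classifier whose architecture is driven by hyperparameters."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))  # assumed input: e.g. flattened 28x28 images
    for _ in range(num_layers):
        model.add(layers.Dense(units, activation=activation))
        if use_batchnorm:
            model.add(layers.BatchNormalization())
        if dropout_rate > 0:
            model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(10, activation="softmax"))  # assumed 10-class output
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```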
Hyperparameters are generally tuned by following a principle of systematically and automatically exploring the space of possible decisions: search the architecture space and empirically find the best-performing architecture. The process of hyperparameter optimization goes as follows (a code sketch of the full loop appears after the list):
1.) Choose a set of hyperparameters (automatically);
2.) Build the corresponding model;
3.) Fit the model on the training data and measure its final performance on the validation data;
4.) Choose the next set of hyperparameters to try (automatically);
5.) Repeat the above process;
6.) Finally, measure the model's performance on the test data.
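Below is a minimal random-search sketch of this loop, reusing the hypothetical build_model helper from above. The trial count, the search-space values, and the arrays x_train/y_train, x_val/y_val, and x_test/y_test are all assumptions for illustration.

```python
import random

# Assumed search space; the specific values are illustrative only.
space = {
    "num_layers": [1, 2, 3],
    "units": [32, 64, 128],
    "use_batchnorm": [False, True],
    "dropout_rate": [0.0, 0.25, 0.5],
}

best_acc, best_params = 0.0, None
for _ in range(20):                                            # 5.) repeat
    params = {k: random.choice(v) for k, v in space.items()}   # 1.) / 4.) choose
    model = build_model(**params)                              # 2.) build
    model.fit(x_train, y_train, epochs=5, verbose=0)           # 3.) fit ...
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)       # ... and validate
    if val_acc > best_acc:
        best_acc, best_params = val_acc, params

# 6.) retrain the best configuration and measure once on the test data
final_model = build_model(**best_params)
final_model.fit(x_train, y_train, epochs=5, verbose=0)
test_loss, test_acc = final_model.evaluate(x_test, y_test, verbose=0)
```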
The key to this process is the algorithm that, given the history of validation performance over many sets of hyperparameters, chooses the next set of hyperparameters to evaluate. Many such algorithms exist, for example: Bayesian optimization, genetic algorithms, simple random search, etc.
Hyperparameter optimization vs. model parameter optimization
Training the model's weights is relatively simple: compute the loss function on a mini-batch of data, then use the backpropagation algorithm to move the weights in the right direction.
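For contrast, this is roughly what a single weight-update step looks like in TensorFlow; the toy model and the mini-batch tensors x_batch and y_batch are assumptions for the sketch.

```python
import tensorflow as tf

# Assumed toy model, loss, and optimizer; only the update step matters here.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)       # forward pass on the mini-batch
    loss = loss_fn(y_batch, predictions)              # loss on the mini-batch
grads = tape.gradient(loss, model.trainable_weights)  # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_weights))  # move the weights
```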
Hyperparameter optimization is much harder: (1.) Computing the feedback signal (does this set of hyperparameters lead to a high-performing model on this task?) can be extremely expensive: it requires creating a new model and training it from scratch. (2.) The hyperparameter space usually consists of many discrete decisions and is therefore neither continuous nor differentiable. As a result, you generally cannot do gradient descent in hyperparameter space. Instead, you must rely on gradient-free optimization methods, which are far less efficient than gradient descent.
In general, random search (choosing hyperparameters to evaluate at random, over and over) is the best solution. a.) The Python library Hyperopt is a hyperparameter-optimization tool that internally uses Tree-structured Parzen Estimators (TPE) to predict which sets of hyperparameters are likely to yield good results. b.) The Hyperas library integrates Hyperopt with Keras models.
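As a rough sketch of how Hyperopt's TPE search could drive the hypothetical build_model helper above (the search-space values, epoch count, and data arrays are assumptions):

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

# Assumed search space over the same knobs as before.
space = {
    "num_layers": hp.choice("num_layers", [1, 2, 3]),
    "units": hp.choice("units", [32, 64, 128]),
    "dropout_rate": hp.uniform("dropout_rate", 0.0, 0.5),
}

def objective(params):
    model = build_model(num_layers=params["num_layers"],
                        units=params["units"],
                        dropout_rate=params["dropout_rate"])
    model.fit(x_train, y_train, epochs=5, verbose=0)
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)
    # Hyperopt minimizes the objective, so return the negated validation accuracy.
    return {"loss": -val_acc, "status": STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)  # the best hyperparameter settings found by the search
```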
An important issue to keep in mind when doing hyperparameter optimization at scale is validation-set overfitting. Because you use the validation data to compute a feedback signal and then update the hyperparameters based on that signal, you are effectively training the hyperparameters on the validation data, and they will quickly overfit it.
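This is why the test set must stay untouched until the search is over. A minimal sketch of such a three-way split, assuming arrays x and y of examples and labels:

```python
from sklearn.model_selection import train_test_split

# Hold out a test set that the hyperparameter search never sees.
x_tmp, x_test, y_tmp, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Split the remainder into training data (fits the weights) and
# validation data (guides the hyperparameter search).
x_train, x_val, y_train, y_val = train_test_split(x_tmp, y_tmp,
                                                  test_size=0.25, random_state=0)
# Evaluate on (x_test, y_test) only once, after the search has finished.
```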